[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
tru wrote: @david-arm Should this be merged? https://github.com/llvm/llvm-project/pull/117154 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] release/19.x: [compiler-rt] [test] Remove an unintended grep parameter (PR #116774)
https://github.com/tru updated https://github.com/llvm/llvm-project/pull/116774 >From fb6b195cae03ba6e5b50870031d710ca6886c5bb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Martin=20Storsj=C3=B6?= Date: Sun, 20 Oct 2024 13:51:50 +0300 Subject: [PATCH] [compiler-rt] [test] Remove an unintended grep parameter This parameter seems unintentional here; we're trying to grep the input on stdin, from the earlier stage in the pipeline. Since a recent update on Github Actions runners, the previous form (grepping a file, while piping in data on stdin) would fail running the test, with the test runner Python script throwing an exception when evaluating it: File "D:\a\llvm-mingw\llvm-mingw\llvm-project\llvm\utils\lit\lit\TestRunner.py", line 935, in _executeShCmd out = procs[i].stdout.read() ^^ File "C:\hostedtoolcache\windows\Python\3.12.7\x64\Lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] ^^^ TypeError: a bytes-like object is required, not 'NoneType' (cherry picked from commit c2717a89b8437d041d532c7b2c535ca4f4b35872) --- compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp b/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp index 9277fe0b235160..38e99cf6859451 100644 --- a/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp +++ b/compiler-rt/test/asan/TestCases/Windows/delay_dbghelp.cpp @@ -9,7 +9,7 @@ // static build, there won't be any clang_rt DLLs. // RUN: not grep cl""ang_rt %t || \ // RUN: grep cl""ang_rt %t | xargs which | \ -// RUN: xargs llvm-readobj --coff-imports | not grep dbghelp.dll %t +// RUN: xargs llvm-readobj --coff-imports | not grep dbghelp.dll extern "C" int puts(const char *); ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) (PR #117154)
david-arm wrote: > @david-arm Should this be merged? Hi, yes, I think it should be merged. It's a fairly serious bug fix. https://github.com/llvm/llvm-project/pull/117154
[llvm-branch-commits] [lld] [llvm] release/19.x: [MC][LoongArch] Change default cpu in `MCSubtargetInfo`. (#114922) (PR #117105)
heiher wrote: > Can you squash this PR so it's just one commit? Sure, it's done now. https://github.com/llvm/llvm-project/pull/117105
[llvm-branch-commits] [llvm] 3d12f45 - [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075)
Author: Yingwei Zheng Date: 2024-11-25T09:36:43+01:00 New Revision: 3d12f45e50b68ac908ef05571e5cc52f4b966d94 URL: https://github.com/llvm/llvm-project/commit/3d12f45e50b68ac908ef05571e5cc52f4b966d94 DIFF: https://github.com/llvm/llvm-project/commit/3d12f45e50b68ac908ef05571e5cc52f4b966d94.diff LOG: [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075) On loongarch64 with lsx extension, we select `VBITREV_W` for `v4i32 (xor X, (shl splat(1), Y))`: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L1583-L1584 And `vsplat_imm_eq_1` is defined as: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td#L77-L87 For the `(bitconvert (v4i32 (build_vector)))` case, the pattern is expected to be: ``` PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (bitconvert:{ *:[v4i32] } (build_vector:{ *:[v4i32] }))<>, v4i32:{ *:[v4i32] }:$vk)) RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk) ``` However, `simplifyTree` drops the `bitconvert` node and its predicates: https://github.com/llvm/llvm-project/blob/8e6630391699116641cf390a10476295b7d4b95c/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp#L3036-L3062 Then llvm will match `vsplat_imm_eq_1` for any v4i32 splats and cause a miscompilation: ``` PATTERN: (xor:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, (shl:{ *:[v4i32] } (build_vector:{ *:[v4i32] }), v4i32:{ *:[v4i32] }:$vk)) RESULT: (VBITREV_W:{ *:[v4i32] } v4i32:{ *:[v4i32] }:$vj, v4i32:{ *:[v4i32] }:$vk) ``` This patch adds additional checks for predicates associated with the trivial bitconvert node. Unused patterns in the LoongArch target are also removed. Fixes https://github.com/llvm/llvm-project/issues/116008. 
(cherry picked from commit c727b48287cc96888f9e262f23d53cf635cf3b3d) Added: llvm/test/CodeGen/LoongArch/lsx/pr116008.ll Modified: llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp Removed: diff --git a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td index 0580683c3ce303..0233baecf6dd9c 100644 --- a/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td +++ b/llvm/lib/Target/LoongArch/LoongArchLSXInstrInfo.td @@ -67,8 +67,7 @@ class VecCondgetValueType(0).getVectorElementType(); @@ -109,8 +108,7 @@ def vsplati32_imm_eq_31 : PatFrags<(ops), [(build_vector)], [{ return selectVSplat(N, Imm, EltTy.getSizeInBits()) && Imm.getBitWidth() == EltTy.getSizeInBits() && Imm == 31; }]>; -def vsplati64_imm_eq_63 : PatFrags<(ops), [(build_vector), - (bitconvert (v4i32 (build_vector)))], [{ +def vsplati64_imm_eq_63 : PatFrags<(ops), [(build_vector)], [{ APInt Imm; EVT EltTy = N->getValueType(0).getVectorElementType(); diff --git a/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll b/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll new file mode 100644 index 00..ba8ffc34931893 --- /dev/null +++ b/llvm/test/CodeGen/LoongArch/lsx/pr116008.ll @@ -0,0 +1,17 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc --mtriple=loongarch64 --mattr=+lsx < %s | FileCheck %s + +define <4 x i32> @xor_shl_splat_vec_one(i32 %x, <4 x i32> %y) nounwind { +; CHECK-LABEL: xor_shl_splat_vec_one: +; CHECK: # %bb.0: # %entry +; CHECK-NEXT:vreplgr2vr.w $vr1, $a0 +; CHECK-NEXT:vsll.w $vr0, $vr1, $vr0 +; CHECK-NEXT:vbitrevi.w $vr0, $vr0, 0 +; CHECK-NEXT:ret +entry: + %ins = insertelement <4 x i32> poison, i32 %x, i64 0 + %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer + %shl = shl <4 x i32> %splat, %y + %xor = xor <4 x i32> %shl, splat (i32 1) + ret <4 x i32> %xor +} diff --git 
a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp index a8cecca0d4a54f..ca71569008d5ec 100644 --- a/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp +++ b/llvm/utils/TableGen/Common/CodeGenDAGPatterns.cpp @@ -3042,6 +3042,14 @@ static bool SimplifyTree(TreePatternNodePtr &N) { !N->getExtType(0).empty() && N->getExtType(0) == N->getChild(0).getExtType(0) && N->getName().empty()) { +if (!N->getPredicateCalls().empty()) { + std::string Str; + raw_string_ostream OS(Str); + OS << *N + << "\n trivial bitconvert node should not have predicate calls\n"; + PrintFatalError(Str); + return false; +} N = N->getChildShared(0); SimplifyTree(N); return true; ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm
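The failure mode described in the log message can be sketched outside TableGen. The following is a minimal model, not the actual `SimplifyTree` implementation: `Node` is a hypothetical stand-in for `TreePatternNode`, types and predicates are plain strings, and a thrown exception stands in for `PrintFatalError`. It only illustrates why folding a type-preserving `bitconvert` must be refused when that node carries predicate calls.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for TableGen's TreePatternNode.
struct Node {
  std::string op;                               // e.g. "bitconvert"
  std::string type;                             // e.g. "v4i32"
  std::vector<std::string> predicates;          // attached predicate calls
  std::vector<std::shared_ptr<Node>> children;  // operands
};

// Folds a type-preserving bitconvert into its child. If the bitconvert
// carries predicates, folding would silently discard them, so an error is
// raised instead (the real patch reports a fatal TableGen error).
bool simplifyTree(std::shared_ptr<Node> &n) {
  if (n->op == "bitconvert" && n->children.size() == 1 &&
      n->type == n->children[0]->type) {
    if (!n->predicates.empty())
      throw std::runtime_error(
          "trivial bitconvert node should not have predicate calls");
    std::shared_ptr<Node> child = n->children[0];  // keep child alive
    n = child;
    simplifyTree(n);
    return true;
  }
  bool changed = false;
  for (auto &c : n->children)
    changed |= simplifyTree(c);
  return changed;
}
```

In this model, a predicate-free `bitconvert` over a `build_vector` of the same type folds away as before, while one tagged with a predicate such as `vsplat_imm_eq_1` is rejected rather than matched as an unconstrained splat.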
[llvm-branch-commits] [llvm] release/19.x: [SDAG][ISel][TableGen][LoongArch] Report error for trivial bitcasts when there are predicate calls (#116075) (PR #116797)
github-actions[bot] wrote: @llvmbot Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our [build bots](https://lab.llvm.org/buildbot/). If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail [here](https://llvm.org/docs/MyFirstTypoFix.html#myfirsttypofix-issues-after-landing-your-pr). If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of [LLVM development](https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy). You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! https://github.com/llvm/llvm-project/pull/116797
[llvm-branch-commits] [llvm] f9ae37c - [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794)
Author: Yingwei Zheng Date: 2024-11-25T09:37:30+01:00 New Revision: f9ae37c670d4bcf4713278ac94d2c8991a326f9e URL: https://github.com/llvm/llvm-project/commit/f9ae37c670d4bcf4713278ac94d2c8991a326f9e DIFF: https://github.com/llvm/llvm-project/commit/f9ae37c670d4bcf4713278ac94d2c8991a326f9e.diff LOG: [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794) Closes https://github.com/llvm/llvm-project/issues/116775. (cherry picked from commit 03d8831fa8ef5b7e32172c718b550a454645faea) Added: Modified: llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp llvm/test/Transforms/InstCombine/ptrmask.ll Removed: diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp index 8a6ec3076ac621..b9d06b59368508 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp @@ -1004,7 +1004,7 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I, uint64_t MaskedGEPIndex = HighBitsGEPIndex | MaskedLowBitsGEPIndex; if (MaskedGEPIndex != GEPIndex) { - auto *GEP = cast(II->getArgOperand(0)); + auto *GEP = cast(II->getArgOperand(0)); Builder.SetInsertPoint(I); Type *GEPIndexType = DL.getIndexType(GEP->getPointerOperand()->getType()); diff --git a/llvm/test/Transforms/InstCombine/ptrmask.ll b/llvm/test/Transforms/InstCombine/ptrmask.ll index 4631b81cd1ce1f..cd998bac3f9f0d 100644 --- a/llvm/test/Transforms/InstCombine/ptrmask.ll +++ b/llvm/test/Transforms/InstCombine/ptrmask.ll @@ -578,3 +578,16 @@ define ptr @ptrmask_is_useless_fail1(i64 %i, i64 %m) { %r = call ptr @llvm.ptrmask.p0.i64(ptr %p0, i64 %m0) ret ptr %r } + +@GC_arrays = external global { i8, i8, i64 } + +define ptr @ptrmask_demandedbits_constantexpr() { +; CHECK-LABEL: define ptr @ptrmask_demandedbits_constantexpr() { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[ALIGNED_RESULT:%.*]] = call align 8 ptr 
@llvm.ptrmask.p0.i64(ptr nonnull @GC_arrays, i64 -8) +; CHECK-NEXT:ret ptr [[ALIGNED_RESULT]] +; +entry: + %aligned_result = call ptr @llvm.ptrmask.p0.i64(ptr getelementptr inbounds (i8, ptr @GC_arrays, i64 1), i64 -8) + ret ptr %aligned_result +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
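The new test folds `getelementptr inbounds (i8, ptr @GC_arrays, i64 1)` under a `-8` mask down to `@GC_arrays` itself. One way to read the `HighBitsGEPIndex | MaskedLowBitsGEPIndex` line in the hunk is sketched below. This is an illustration of the arithmetic only, under the assumption that the base pointer is known to be 8-byte aligned; `baseAlignLog2` and the function name are hypothetical and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Keeps the GEP index bits above the base alignment and applies the ptrmask
// immediate only to the bits below it. baseAlignLog2 (assumed < 64) is a
// hypothetical parameter standing in for the known alignment of the base.
uint64_t maskedGEPIndex(uint64_t gepIndex, uint64_t ptrMask,
                        unsigned baseAlignLog2) {
  const uint64_t alignBits = (uint64_t{1} << baseAlignLog2) - 1;
  const uint64_t highBitsGEPIndex = gepIndex & ~alignBits;
  const uint64_t maskedLowBitsGEPIndex = gepIndex & alignBits & ptrMask;
  return highBitsGEPIndex | maskedLowBitsGEPIndex;
}
```

With index 1, mask `-8`, and an 8-byte-aligned base, the masked index is 0, which differs from the original index and so triggers the rewrite guarded by `MaskedGEPIndex != GEPIndex`. In the test the GEP being rewritten is a constant expression rather than an instruction, which is the case the PR title says was previously unhandled.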
[llvm-branch-commits] [llvm] release/19.x: [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794) (PR #116814)
https://github.com/tru updated https://github.com/llvm/llvm-project/pull/116814 >From f9ae37c670d4bcf4713278ac94d2c8991a326f9e Mon Sep 17 00:00:00 2001 From: Yingwei Zheng Date: Tue, 19 Nov 2024 22:17:24 +0800 Subject: [PATCH] [InstCombine] Handle constant GEP expr in `SimplifyDemandedUseBits` (#116794) Closes https://github.com/llvm/llvm-project/issues/116775. (cherry picked from commit 03d8831fa8ef5b7e32172c718b550a454645faea) --- .../InstCombine/InstCombineSimplifyDemanded.cpp | 2 +- llvm/test/Transforms/InstCombine/ptrmask.ll | 13 + 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp index 8a6ec3076ac621..b9d06b59368508 100644 --- a/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp +++ b/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp @@ -1004,7 +1004,7 @@ Value *InstCombinerImpl::SimplifyDemandedUseBits(Instruction *I, uint64_t MaskedGEPIndex = HighBitsGEPIndex | MaskedLowBitsGEPIndex; if (MaskedGEPIndex != GEPIndex) { - auto *GEP = cast(II->getArgOperand(0)); + auto *GEP = cast(II->getArgOperand(0)); Builder.SetInsertPoint(I); Type *GEPIndexType = DL.getIndexType(GEP->getPointerOperand()->getType()); diff --git a/llvm/test/Transforms/InstCombine/ptrmask.ll b/llvm/test/Transforms/InstCombine/ptrmask.ll index 4631b81cd1ce1f..cd998bac3f9f0d 100644 --- a/llvm/test/Transforms/InstCombine/ptrmask.ll +++ b/llvm/test/Transforms/InstCombine/ptrmask.ll @@ -578,3 +578,16 @@ define ptr @ptrmask_is_useless_fail1(i64 %i, i64 %m) { %r = call ptr @llvm.ptrmask.p0.i64(ptr %p0, i64 %m0) ret ptr %r } + +@GC_arrays = external global { i8, i8, i64 } + +define ptr @ptrmask_demandedbits_constantexpr() { +; CHECK-LABEL: define ptr @ptrmask_demandedbits_constantexpr() { +; CHECK-NEXT: entry: +; CHECK-NEXT:[[ALIGNED_RESULT:%.*]] = call align 8 ptr @llvm.ptrmask.p0.i64(ptr nonnull @GC_arrays, i64 -8) +; CHECK-NEXT:ret 
ptr [[ALIGNED_RESULT]] +; +entry: + %aligned_result = call ptr @llvm.ptrmask.p0.i64(ptr getelementptr inbounds (i8, ptr @GC_arrays, i64 1), i64 -8) + ret ptr %aligned_result +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [LICM] allow MemoryAccess creation failure (#116813) (PR #117082)
github-actions[bot] wrote: @DianQK (or anyone else): if you would like to add a note about this fix in the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/117082
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/tru closed https://github.com/llvm/llvm-project/pull/117134
[llvm-branch-commits] [llvm] 336f877 - [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model
Author: wanglei Date: 2024-11-25T09:45:06+01:00 New Revision: 336f87753b510aed840daf87f8d3a4996e6c8f15 URL: https://github.com/llvm/llvm-project/commit/336f87753b510aed840daf87f8d3a4996e6c8f15 DIFF: https://github.com/llvm/llvm-project/commit/336f87753b510aed840daf87f8d3a4996e6c8f15.diff LOG: [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Cherry-picked from #117099 to fix linker errors when building shared libraries with the large code model. Added: Modified: llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp llvm/test/CodeGen/LoongArch/code-models.ll llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll llvm/test/CodeGen/LoongArch/psabi-restricted-scheduling.ll llvm/test/CodeGen/LoongArch/tls-models.ll Removed: diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp index c136f5b3e515d7..e680dda7374d07 100644 --- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp @@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL( IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL; Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1; -bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal(); +bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT; unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : LoongArchII::MO_PCREL_LO; unsigned LAOpcode = UseGOT ? 
LoongArch::LDX_D : LoongArch::ADD_D; expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg, diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll b/llvm/test/CodeGen/LoongArch/code-models.ll index 4b2b72afaee171..4eb1e5e596fd3f 100644 --- a/llvm/test/CodeGen/LoongArch/code-models.ll +++ b/llvm/test/CodeGen/LoongArch/code-models.ll @@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) { ; LARGE-NEXT:.cfi_offset 1, -8 ; LARGE-NEXT:ori $a2, $zero, 1000 ; LARGE-NEXT:move $a1, $zero -; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset) -; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset) -; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset) -; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset) -; LARGE-NEXT:add.d $ra, $t8, $ra +; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset) +; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset) +; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset) +; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset) +; LARGE-NEXT:ldx.d $ra, $t8, $ra ; LARGE-NEXT:jirl $ra, $ra, 0 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload ; LARGE-NEXT:addi.d $sp, $sp, 16 diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll index ed1a24e82b4e46..29348fe0d641ed 100644 --- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll +++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll @@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) { ; LA64LARGE-NEXT: .LBB3_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr) +; 
LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr) +; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr) +; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra ; LA64LARGE-NEXT:jirl $ra, $ra, 0 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1 @@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind { ; LA64LARGE-NEXT: .LBB5_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-N
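The one-line change to `UseGOT` in this commit can be modelled in isolation. The sketch below is a toy, not the LoongArch backend: `CallTarget`, `OperandKind`, and `TargetFlag` are hypothetical stand-ins for a call operand and for `LoongArchII::MO_CALL_PLT`, and it assumes that an earlier stage has already tagged non-dso_local callees with the PLT flag. It shows why an `ExternalSymbol` callee such as `memset` took the PC-relative path before the fix and the GOT path after it.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class OperandKind { Global, ExternalSymbol };
enum class TargetFlag { Call, CallPLT };  // hypothetical stand-ins for target flags

struct CallTarget {
  OperandKind kind;
  std::optional<bool> dsoLocal;  // unknown (nullopt) for external symbols
  TargetFlag flag;
};

// Old rule: only a global value that is not dso_local goes through the GOT,
// so external symbols never do.
bool useGOTBefore(const CallTarget &f) {
  return f.kind == OperandKind::Global && f.dsoLocal.has_value() &&
         !*f.dsoLocal;
}

// New rule: trust the target flag attached to the operand.
bool useGOTAfter(const CallTarget &f) { return f.flag == TargetFlag::CallPLT; }

// Final instruction of the large-code-model address sequence, matching the
// LDX_D / ADD_D choice in the diff.
std::string finalOpcode(bool useGOT) { return useGOT ? "ldx.d" : "add.d"; }
```

This lines up with the updated CHECK lines: the `memset` and `__tls_get_addr` sequences switch from `%pc_hi20` relocations ending in `add.d` to `%got_pc_hi20` relocations ending in `ldx.d`.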
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
https://github.com/tru updated https://github.com/llvm/llvm-project/pull/117134 >From 336f87753b510aed840daf87f8d3a4996e6c8f15 Mon Sep 17 00:00:00 2001 From: wanglei Date: Thu, 21 Nov 2024 09:31:12 +0800 Subject: [PATCH] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model This commit fixes an issue in the large code model where non-dso_local function calls did not use the GOT as expected in PIC mode. Instead, direct PC-relative access was incorrectly applied, leading to linker errors when building shared libraries. For `ExternalSymbol`, it is not possible to determine whether it is dso_local during pseudo-instruction expansion. We use target flags to differentiate whether GOT should be used. Cherry-picked from #117099 to fix linker errors when building shared libraries with the large code model. --- .../LoongArch/LoongArchExpandPseudoInsts.cpp | 2 +- llvm/test/CodeGen/LoongArch/code-models.ll| 10 ++--- .../LoongArch/machinelicm-address-pseudos.ll | 20 +- .../LoongArch/psabi-restricted-scheduling.ll | 40 +-- llvm/test/CodeGen/LoongArch/tls-models.ll | 20 +- 5 files changed, 46 insertions(+), 46 deletions(-) diff --git a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp index c136f5b3e515d7..e680dda7374d07 100644 --- a/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp +++ b/llvm/lib/Target/LoongArch/LoongArchExpandPseudoInsts.cpp @@ -721,7 +721,7 @@ bool LoongArchExpandPseudo::expandFunctionCALL( IsTailCall ? LoongArch::PseudoJIRL_TAIL : LoongArch::PseudoJIRL_CALL; Register AddrReg = IsTailCall ? LoongArch::R19 : LoongArch::R1; -bool UseGOT = Func.isGlobal() && !Func.getGlobal()->isDSOLocal(); +bool UseGOT = Func.getTargetFlags() == LoongArchII::MO_CALL_PLT; unsigned MO = UseGOT ? LoongArchII::MO_GOT_PC_HI : LoongArchII::MO_PCREL_LO; unsigned LAOpcode = UseGOT ? 
LoongArch::LDX_D : LoongArch::ADD_D; expandLargeAddressLoad(MBB, MBBI, NextMBBI, LAOpcode, MO, Func, AddrReg, diff --git a/llvm/test/CodeGen/LoongArch/code-models.ll b/llvm/test/CodeGen/LoongArch/code-models.ll index 4b2b72afaee171..4eb1e5e596fd3f 100644 --- a/llvm/test/CodeGen/LoongArch/code-models.ll +++ b/llvm/test/CodeGen/LoongArch/code-models.ll @@ -82,11 +82,11 @@ define void @call_external_sym(ptr %dst) { ; LARGE-NEXT:.cfi_offset 1, -8 ; LARGE-NEXT:ori $a2, $zero, 1000 ; LARGE-NEXT:move $a1, $zero -; LARGE-NEXT:pcalau12i $ra, %pc_hi20(memset) -; LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(memset) -; LARGE-NEXT:lu32i.d $t8, %pc64_lo20(memset) -; LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(memset) -; LARGE-NEXT:add.d $ra, $t8, $ra +; LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(memset) +; LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(memset) +; LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(memset) +; LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(memset) +; LARGE-NEXT:ldx.d $ra, $t8, $ra ; LARGE-NEXT:jirl $ra, $ra, 0 ; LARGE-NEXT:ld.d $ra, $sp, 8 # 8-byte Folded Reload ; LARGE-NEXT:addi.d $sp, $sp, 16 diff --git a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll index ed1a24e82b4e46..29348fe0d641ed 100644 --- a/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll +++ b/llvm/test/CodeGen/LoongArch/machinelicm-address-pseudos.ll @@ -282,11 +282,11 @@ define void @test_la_tls_ld(i32 signext %n) { ; LA64LARGE-NEXT: .LBB3_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:addi.d $t8, $zero, %got_pc_lo12(__tls_get_addr) +; 
LA64LARGE-NEXT:lu32i.d $t8, %got64_pc_lo20(__tls_get_addr) +; LA64LARGE-NEXT:lu52i.d $t8, $t8, %got64_pc_hi12(__tls_get_addr) +; LA64LARGE-NEXT:ldx.d $ra, $t8, $ra ; LA64LARGE-NEXT:jirl $ra, $ra, 0 ; LA64LARGE-NEXT:ld.w $zero, $a0, 0 ; LA64LARGE-NEXT:addi.w $s1, $s1, 1 @@ -448,11 +448,11 @@ define void @test_la_tls_gd(i32 signext %n) nounwind { ; LA64LARGE-NEXT: .LBB5_1: # %loop ; LA64LARGE-NEXT:# =>This Inner Loop Header: Depth=1 ; LA64LARGE-NEXT:move $a0, $s0 -; LA64LARGE-NEXT:pcalau12i $ra, %pc_hi20(__tls_get_addr) -; LA64LARGE-NEXT:addi.d $t8, $zero, %pc_lo12(__tls_get_addr) -; LA64LARGE-NEXT:lu32i.d $t8, %pc64_lo20(__tls_get_addr) -; LA64LARGE-NEXT:lu52i.d $t8, $t8, %pc64_hi12(__tls_get_addr) -; LA64LARGE-NEXT:add.d $ra, $t8, $ra +; LA64LARGE-NEXT:pcalau12i $ra, %got_pc_hi20(__tls_get_addr) +; LA64LARGE-NEXT:a
[llvm-branch-commits] [llvm] [LoongArch] Fix GOT usage for `non-dso_local` function calls in large code model (PR #117134)
github-actions[bot] wrote: @wangleiat (or anyone else): if you would like to add a note about this fix in the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/117134
[llvm-branch-commits] [llvm] release/19.x: [SCEV] Fix sext handling for `getConstantMultiple` (#117093) (PR #117136)
https://github.com/tru closed https://github.com/llvm/llvm-project/pull/117136
[llvm-branch-commits] [compiler-rt] release/19.x: [compiler-rt] [test] Remove an unintended grep parameter (PR #116774)
https://github.com/tru closed https://github.com/llvm/llvm-project/pull/116774
[llvm-branch-commits] [flang] [flang] handle fir.call in AliasAnalysis::getModRef (PR #117164)
@@ -329,14 +341,92 @@ AliasResult AliasAnalysis::alias(Source lhsSrc, Source rhsSrc, mlir::Value lhs, // AliasAnalysis: getModRef //===--===// +static bool isSavedLocal(const fir::AliasAnalysis::Source &src) { + if (auto symRef = llvm::dyn_cast(src.origin.u)) { +auto [nameKind, deconstruct] = +fir::NameUniquer::deconstruct(symRef.getLeafReference().getValue()); +return nameKind == fir::NameUniquer::NameKind::VARIABLE && + !deconstruct.procs.empty(); + } + return false; +} + +static bool isCallToFortranUserProcedure(fir::CallOp call) { + // TODO: indirect calls are excluded by these checks. Maybe some attribute is + // needed to flag user calls in this case. + if (fir::hasBindcAttr(call)) +return true; + if (std::optional callee = call.getCallee()) +return fir::NameUniquer::deconstruct(callee->getLeafReference().getValue()) + .first == fir::NameUniquer::NameKind::PROCEDURE; + return false; +} + +static ModRefResult getCallModRef(fir::CallOp call, mlir::Value var) { + // TODO: limit to Fortran functions?? + // 1. Detect variables that can be accessed indirectly. + fir::AliasAnalysis aliasAnalysis; + fir::AliasAnalysis::Source varSrc = aliasAnalysis.getSource(var); + // If the variable is not a user variable, we cannot safely assume that + // Fortran semantics apply (e.g., a bare alloca/allocmem result may very well + // be placed in an allocatable/pointer descriptor and escape). + + // All the logic bellows are based on Fortran semantics and only holds if this + // is a call to a procedure form the Fortran source and this is a variable + // from the Fortran source. Compiler generated temporaries or functions may + // not adhere to this semantic. + // TODO: add some opt-in or op-out mechanism for compiler generated temps. + // An example of something currently problematic is the allocmem generated for + // ALLOCATE of allocatable target. It currently does not have the target + // attribute, which would lead this analysis to believe it cannot escape. 
+ if (!varSrc.isFortranUserVariable() || !isCallToFortranUserProcedure(call)) +return ModRefResult::getModAndRef(); + // Pointer and target may have been captured. + if (varSrc.isTargetOrPointer()) +return ModRefResult::getModAndRef(); + // Host associated variables may be addressed indirectly via an internal + // function call, whether the call is in the parent or an internal procedure. + // Note that the host associated/internal procedure may be referenced + // indirectly inside calls to non internal procedure. This is because internal + // procedures may be captured or passed. As this is tricky to analyze, always + // consider such variables may be accessed in any calls. + if (varSrc.kind == fir::AliasAnalysis::SourceKind::HostAssoc || + varSrc.isCapturedInInternalProcedure) +return ModRefResult::getModAndRef(); + // At that stage, it has been ruled out that local (including the saved ones) + // and dummy cannot be indirectly accessed in the call. + if (varSrc.kind != fir::AliasAnalysis::SourceKind::Allocate && + !varSrc.isDummyArgument()) { +if (varSrc.kind != fir::AliasAnalysis::SourceKind::Global || +!isSavedLocal(varSrc)) + return ModRefResult::getModAndRef(); + } + // 2. Check if the variable is passed via the arguments. + for (auto arg : call.getArgs()) { +if (fir::conformsWithPassByRef(arg.getType()) && +!aliasAnalysis.alias(arg, var).isNo()) { + // TODO: intent(in) would allow returning Ref here. This can be obtained + // in the func.func attributes for direct calls, but the module lookup is + // linear with the number of MLIR symbols, which would introduce a pseudo + // quadratic behavior num_calls * num_func. tblah wrote: That sounds great! https://github.com/llvm/llvm-project/pull/117164 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
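The chain of early returns in the quoted `getCallModRef` hunk reads more easily as a standalone decision function. The sketch below mirrors the conditions visible in the hunk using plain booleans; `VarInfo`, `SourceKind`, and `ModRef` are hypothetical stand-ins for the flang types. The hunk is cut off inside the argument loop, so two points are assumptions rather than quotes: that an aliasing by-reference argument yields ModAndRef, and that the fall-through result is NoModRef.

```cpp
#include <cassert>

enum class SourceKind { Allocate, Global, HostAssoc, Other };
enum class ModRef { NoModRef, ModAndRef };

// Hypothetical flattened view of fir::AliasAnalysis::Source.
struct VarInfo {
  SourceKind kind;
  bool isFortranUserVariable;
  bool isTargetOrPointer;
  bool isCapturedInInternalProcedure;
  bool isDummyArgument;
  bool isSavedLocal;
};

ModRef callModRef(const VarInfo &v, bool callIsFortranUserProcedure,
                  bool passedByRefArgMayAlias) {
  // Fortran semantics only apply to user variables and user procedures.
  if (!v.isFortranUserVariable || !callIsFortranUserProcedure)
    return ModRef::ModAndRef;
  // Pointers and targets may have been captured.
  if (v.isTargetOrPointer)
    return ModRef::ModAndRef;
  // Host-associated variables may be reached through internal procedures.
  if (v.kind == SourceKind::HostAssoc || v.isCapturedInInternalProcedure)
    return ModRef::ModAndRef;
  // Only locals (including saved ones) and dummies pass this point.
  if (v.kind != SourceKind::Allocate && !v.isDummyArgument) {
    if (v.kind != SourceKind::Global || !v.isSavedLocal)
      return ModRef::ModAndRef;
  }
  // Assumption: an aliasing by-reference argument is treated as ModAndRef.
  if (passedByRefArgMayAlias)
    return ModRef::ModAndRef;
  // Assumption: the truncated remainder of the hunk returns NoModRef.
  return ModRef::NoModRef;
}
```

Under these assumptions, a plain local that is not passed to the call is the only kind of variable reported as untouched; everything else stays conservative.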
[llvm-branch-commits] [flang] [MLIR][OpenMP] Add Lowering support for OpenMP Declare Mapper directive (PR #117046)
@@ -2701,7 +2701,42 @@ static void genOMP(lower::AbstractConverter &converter, lower::SymMap &symTable, semantics::SemanticsContext &semaCtx, lower::pft::Evaluation &eval, const parser::OpenMPDeclareMapperConstruct &declareMapperConstruct) { - TODO(converter.getCurrentLocation(), "OpenMPDeclareMapperConstruct"); + fir::FirOpBuilder &firOpBuilder = converter.getFirOpBuilder(); + lower::StatementContext stmtCtx; + const auto &spec = + std::get(declareMapperConstruct.t); + const auto &mapperName{std::get>(spec.t)}; + const auto &varType{std::get(spec.t)}; + const auto &varName{std::get(spec.t)}; + assert(varType.declTypeSpec->category() == + semantics::DeclTypeSpec::Category::TypeDerived && + "Expected derived type"); + + std::string mapperNameStr; + if (mapperName.has_value()) +mapperNameStr = mapperName->ToString(); + else +mapperNameStr = +"default_" + varType.declTypeSpec->derivedTypeSpec().name().ToString(); + + mlir::OpBuilder::InsertPoint insPt = firOpBuilder.saveInsertionPoint(); + firOpBuilder.setInsertionPointToStart(converter.getModuleOp().getBody()); + auto mlirType = converter.genType(varType.declTypeSpec->derivedTypeSpec()); + auto varVal = firOpBuilder.createTemporaryAlloc( + converter.getCurrentLocation(), mlirType, varName.ToString()); tblah wrote: Sorry I didn't notice this before. As far as I understand, this will create the `fir.alloca` and `hlfir.declare` at the beginning of the MLIR module, not nested in any intermediate operation. How do you intend to lower this to LLVM IR? We would normally nest these in some kind of "function-like" wrapper operation, e.g. `func.func`, `fir.global`, `omp.private`, etc. I wonder if the declare mapper operation needs a nested region for this allocation (like we do for `omp.private` and `omp.declare_reduction`). https://github.com/llvm/llvm-project/pull/117046
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_f32_[fp|bf]8 of gfx950. (PR #117383)
arsenm wrote: ### Merge activity * **Nov 25, 12:19 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117383). https://github.com/llvm/llvm-project/pull/117383
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117287
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117287
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117378
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117378
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)
@@ -2551,8 +2551,34 @@ int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) {
     return isVCmpXWritesExec(*TII, *TRI, MI);
   };
-  const int NumWaitStates = 4;
-  return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, NumWaitStates);
+  auto IsVALUFn = [](const MachineInstr &MI) {
+    return SIInstrInfo::isVALU(MI);
+  };
+
+  const int VCmpXWritesExecWaitStates = 4;
+  const int VALUWritesVDstWaitStates = 2;
+  int WaitStatesNeeded = 0;
+
+  for (const MachineOperand &Op : MI->explicit_uses()) {
+    if (!Op.isReg() || !TRI->isVGPR(MF.getRegInfo(), Op.getReg()))
+      continue;
+    Register Reg = Op.getReg();
+
+    int WaitStatesSinceDef =
+        VALUWritesVDstWaitStates -
+        getWaitStatesSinceDef(Reg, IsVALUFn,
+                              /*MaxWaitStates=*/VALUWritesVDstWaitStates);

arsenm wrote:

The usage doesn't exactly map to the definition name though

https://github.com/llvm/llvm-project/pull/117287
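To make the arithmetic in this hunk concrete, here is a minimal C++ sketch of how "wait states required" and "wait states already elapsed" can be turned into a stall count. It is not the LLVM implementation: `Hazard` and `waitStatesNeeded` are hypothetical names, and taking the maximum over all hazards is an assumption, since the quoted hunk is cut off before the individual results are combined.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model, not the LLVM API: one entry per hazard that must be
// separated from the current instruction by `Required` wait states, of which
// `Elapsed` have already passed since the hazardous instruction.
struct Hazard {
  int Required;
  int Elapsed;
};

// Returns how many additional wait states are still needed so that every
// hazard is satisfied. A hazard whose difference is zero or negative is
// already covered, so the result never drops below zero.
// Assumption: the per-hazard results are combined with max().
int waitStatesNeeded(const std::vector<Hazard> &Hazards) {
  int Needed = 0;
  for (const Hazard &H : Hazards)
    Needed = std::max(Needed, H.Required - H.Elapsed);
  return Needed;
}
```

With the constants from the diff, a `v_cmpx` write one wait state back (4 required, 1 elapsed) and a VALU def immediately before the permlane (2 required, 0 elapsed) would give `waitStatesNeeded({{4, 1}, {2, 0}})`, which evaluates to 3 under these assumptions.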
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scale_[f16|f32]_fp8 of gfx950. (PR #117380)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117380
[llvm-branch-commits] [llvm] AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (PR #117379)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117379
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scale_[f16|f32]_fp8 of gfx950. (PR #117380)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117380
[llvm-branch-commits] [llvm] AMDGPU: Add MC support for gfx950 V_BITOP3_B32/B16 (PR #117379)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117379
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{fp8|bf8}_f32 of gfx950. (PR #117382)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117382
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for load transpose instructions for gfx950 (PR #117378)
arsenm wrote: ### Merge activity * **Nov 25, 12:19 PM EST**: A user started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/117378). https://github.com/llvm/llvm-project/pull/117378
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

Co-authored-by: Sirish Pande

---

Patch is 23.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117599.diff

10 Files Affected:

- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+3-3)
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+2-2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+1)
- (modified) llvm/lib/Target/AMDGPU/BUFInstructions.td (+2)
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+2)
- (added) llvm/test/CodeGen/AMDGPU/fp-atomics-gfx950.ll (+92)
- (modified) llvm/test/MC/AMDGPU/gfx12_asm_vbuffer_mubuf.s (+4-4)
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+60)
- (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+12)
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950.txt (+45)

diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 1e2921160d28f2..f739872685e780 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-buffer-pk-add-bf16-inst,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot13-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
 // GFX1011: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
 // GFX1012: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx10-insts,+gfx8-insts,+gfx9-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize32"
@@ -109,8 +109,8 @@
 // GFX1151: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1152: "target-features"="+16-bit-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot10-insts,+dot12-insts,+dot5-insts,+dot7-insts,+dot8-insts,+dot9-insts,+dpp,+gfx10-3-insts,+gfx10-insts,+gfx11-insts,+gfx8-insts,+gfx9-insts,+wavefrontsize32"
 // GFX1153: "target-features"="+16-bit-insts,+at
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117591
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)
@@ -1552,7 +1558,9 @@ def FeatureISAVersion9_5_Common : FeatureSet<
   FeatureBitOp3Insts,
   FeatureFP8ConversionScaleInsts,
   FeatureBF8ConversionScaleInsts,
-  FeatureFP4ConversionScaleInsts
+  FeatureFP4ConversionScaleInsts,
+  FeatureFP6BF6ConversionScaleInsts,
+  FeatureFP8Insts

shiltian wrote:

Why is `FeatureFP8Insts` added here?

https://github.com/llvm/llvm-project/pull/117590
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117591
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
@@ -408,11 +408,23 @@ def FeatureFP6BF6ConversionScaleInsts : SubtargetFeature<"fp6bf6-cvt-scale-insts
   "Has fp6 and bf6 conversion scale instructions"
 >;
+def FeatureF16BF16ToFP6BF6ConversionScaleInsts : SubtargetFeature<"f16bf16-to-fp6bf6-cvt-scale-insts",
+  "HasF16BF16ToFP6BF6ConversionScaleInsts",
+  "true",
+  "Has f16bf16 to fp6bf6 conversion scale instructions"
+>;
+
 def FeatureGFX950Insts : SubtargetFeature<"gfx950-insts",
   "GFX950Insts",
   "true",
   "Additional instructions for GFX950+",
-  [FeaturePermlane16Swap, FeaturePermlane32Swap, FeatureFP8ConversionScaleInsts, FeatureBF8ConversionScaleInsts, FeatureFP4ConversionScaleInsts, FeatureFP6BF6ConversionScaleInsts]
+  [FeaturePermlane16Swap,
+  FeaturePermlane32Swap,

shiltian wrote:

The alignment is off here, but that can be fixed later.

https://github.com/llvm/llvm-project/pull/117592
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117592
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117592
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117594
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117593
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117594
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117593
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117596
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117596
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117597
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117597
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117598
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117595
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117595
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117599
[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117601
[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117600
[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117600
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117599
[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117601
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/erichkeane commented: A pair of minor changes requested, else this looks about right? Not sure who the right person to approve this is, though. https://github.com/llvm/llvm-project/pull/76260
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
@@ -5740,7 +5740,8 @@ void CodeGenModule::EmitGlobalVarDefinition(const VarDecl *D,
   if (NeedsGlobalCtor || NeedsGlobalDtor)
     EmitCXXGlobalVarDeclInitFunc(D, GV, NeedsGlobalCtor);
-  SanitizerMD->reportGlobal(GV, *D, NeedsGlobalCtor);
+  SanitizerMD->reportGlobalToASan(GV, *D, NeedsGlobalCtor);

erichkeane wrote:

This has happened a few times. I would suggest keeping `reportGlobal` and documenting that it does BOTH of these things, and in the few cases where you only need ASan, calling `reportGlobalToASan`.

https://github.com/llvm/llvm-project/pull/76260
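The API shape suggested in this review comment could look roughly like the C++ sketch below. Only the names `reportGlobal` and `reportGlobalToASan` come from the thread. The class `SanitizerMetadataSketch`, the `reportGlobalToTySan` counterpart, the string-based signature, and the log are all hypothetical stand-ins, and reading "BOTH" as "ASan and TySan" is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the sanitizer metadata helper. It records which
// reports were issued so the behaviour can be inspected.
class SanitizerMetadataSketch {
public:
  // ASan-only path, for the few call sites that need just that.
  void reportGlobalToASan(const std::string &Name) {
    Log.push_back("asan:" + Name);
  }

  // Hypothetical second reporting path (assumed here to be TySan).
  void reportGlobalToTySan(const std::string &Name) {
    Log.push_back("tysan:" + Name);
  }

  // Documented to do BOTH: existing callers keep calling this one.
  void reportGlobal(const std::string &Name) {
    reportGlobalToASan(Name);
    reportGlobalToTySan(Name);
  }

  const std::vector<std::string> &log() const { return Log; }

private:
  std::vector<std::string> Log;
};
```

The design point is that most call sites stay unchanged and only the exceptional ones opt into the narrower function.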
[llvm-branch-commits] [clang] [TySan] A Type Sanitizer (Clang) (PR #76260)
https://github.com/erichkeane edited https://github.com/llvm/llvm-project/pull/76260
[llvm-branch-commits] [compiler-rt] 1a6525e - Revert "[profile] Use base+vaddr for `__llvm_write_binary_ids` note pointers …"
Author: Petr Hosek
Date: 2024-11-25T11:53:16-08:00
New Revision: 1a6525e438abfe54708f14b9ceec27c0e337f336

URL: https://github.com/llvm/llvm-project/commit/1a6525e438abfe54708f14b9ceec27c0e337f336
DIFF: https://github.com/llvm/llvm-project/commit/1a6525e438abfe54708f14b9ceec27c0e337f336.diff

LOG: Revert "[profile] Use base+vaddr for `__llvm_write_binary_ids` note pointers …"

This reverts commit 667e1fadcf4376ce41f5cae7cabab9f5ccc77b15.

Added:

Modified:
    compiler-rt/lib/profile/InstrProfilingPlatformLinux.c

Removed:
    compiler-rt/test/profile/Linux/binary-id-offset.c

diff --git a/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c b/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
index 5b230c1b200623..613cfb60857cf3 100644
--- a/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
+++ b/compiler-rt/lib/profile/InstrProfilingPlatformLinux.c
@@ -194,33 +194,41 @@ static int WriteBinaryIds(ProfDataWriter *Writer, const ElfW(Nhdr) * Note,
  */
 COMPILER_RT_VISIBILITY int __llvm_write_binary_ids(ProfDataWriter *Writer) {
   extern const ElfW(Ehdr) __ehdr_start __attribute__((visibility("hidden")));
-  extern ElfW(Dyn) _DYNAMIC[] __attribute__((weak, visibility("hidden")));
-
   const ElfW(Ehdr) *ElfHeader = &__ehdr_start;
   const ElfW(Phdr) *ProgramHeader =
       (const ElfW(Phdr) *)((uintptr_t)ElfHeader + ElfHeader->e_phoff);
-  /* Compute the added base address in case of position-independent code. */
-  uintptr_t Base = 0;
-  for (uint32_t I = 0; I < ElfHeader->e_phnum; I++) {
-    if (ProgramHeader[I].p_type == PT_PHDR)
-      Base = (uintptr_t)ProgramHeader - ProgramHeader[I].p_vaddr;
-    if (ProgramHeader[I].p_type == PT_DYNAMIC && _DYNAMIC)
-      Base = (uintptr_t)_DYNAMIC - ProgramHeader[I].p_vaddr;
-  }
-
   int TotalBinaryIdsSize = 0;
+  uint32_t I;
   /* Iterate through entries in the program header. */
-  for (uint32_t I = 0; I < ElfHeader->e_phnum; I++) {
+  for (I = 0; I < ElfHeader->e_phnum; I++) {
     /* Look for the notes segment in program header entries. */
     if (ProgramHeader[I].p_type != PT_NOTE)
       continue;
     /* There can be multiple notes segment, and examine each of them. */
-    const ElfW(Nhdr) *Note =
-        (const ElfW(Nhdr) *)(Base + ProgramHeader[I].p_vaddr);
-    const ElfW(Nhdr) *NotesEnd =
-        (const ElfW(Nhdr) *)((const char *)(Note) + ProgramHeader[I].p_memsz);
+    const ElfW(Nhdr) * Note;
+    const ElfW(Nhdr) * NotesEnd;
+    /*
+     * When examining notes in file, use p_offset, which is the offset within
+     * the elf file, to find the start of notes.
+     */
+    if (ProgramHeader[I].p_memsz == 0 ||
+        ProgramHeader[I].p_memsz == ProgramHeader[I].p_filesz) {
+      Note = (const ElfW(Nhdr) *)((uintptr_t)ElfHeader +
+                                  ProgramHeader[I].p_offset);
+      NotesEnd = (const ElfW(Nhdr) *)((const char *)(Note) +
+                                      ProgramHeader[I].p_filesz);
+    } else {
+      /*
+       * When examining notes in memory, use p_vaddr, which is the address of
+       * section after loaded to memory, to find the start of notes.
+       */
+      Note =
+          (const ElfW(Nhdr) *)((uintptr_t)ElfHeader + ProgramHeader[I].p_vaddr);
+      NotesEnd =
+          (const ElfW(Nhdr) *)((const char *)(Note) + ProgramHeader[I].p_memsz);
+    }
     int BinaryIdsSize = WriteBinaryIds(Writer, Note, NotesEnd);
     if (TotalBinaryIdsSize == -1)
diff --git a/compiler-rt/test/profile/Linux/binary-id-offset.c b/compiler-rt/test/profile/Linux/binary-id-offset.c
deleted file mode 100644
index c66fe82d714ce9..00
--- a/compiler-rt/test/profile/Linux/binary-id-offset.c
+++ /dev/null
@@ -1,33 +0,0 @@
-// REQUIRES: linux
-//
-// Make sure the build-id can be found in both EXEC and DYN (PIE) files,
-// even when the note's section-start is forced to a weird address.
-// (The DYN case would also apply to libraries, not explicitly tested here.)
-
-// DEFINE: %{cflags} =
-// DEFINE: %{check} = ( \
-// DEFINE: %clang_profgen -Wl,--build-id -o %t %s %{cflags} && \
-// DEFINE: env LLVM_PROFILE_FILE=%t.profraw %run %t && \
-// DEFINE: llvm-readelf --notes %t && \
-// DEFINE: llvm-profdata show --binary-ids %t.profraw \
-// DEFINE: ) | FileCheck %s
-
-// REDEFINE: %{cflags} = -no-pie
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -pie -fPIE
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -no-pie -Wl,--section-start=.note.gnu.build-id=0x100
-// RUN: %{check}
-
-// REDEFINE: %{cflags} = -pie -fPIE -Wl,--section-start=.note.gnu.build-id=0x100
-// RUN: %{check}
-
-// CHECK-LABEL{LITERAL}: .note.gnu.build-id
-// CHECK: Build ID: [[ID:[0-9a-f]+]]
-
-// CHECK-LABEL{LITERAL}: Binary IDs:
-// CHECK-NEXT: [[ID]]
-
-int main() { return 0; }
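The code restored by this revert chooses between two ways of locating a `PT_NOTE` segment relative to the ELF header. The C++ sketch below isolates just that choice so it can be read on its own. It is a simplified model under stated assumptions: `SegmentSketch`, `NoteRange`, and `noteRange` are hypothetical names, plain integers stand in for the `ElfW(Phdr)` fields and pointers, and no real ELF headers are parsed.

```cpp
#include <cstdint>

// Hypothetical stand-in for the four program-header fields the revert uses
// (p_offset, p_vaddr, p_filesz, p_memsz).
struct SegmentSketch {
  std::uintptr_t Offset;
  std::uintptr_t VAddr;
  std::uintptr_t FileSize;
  std::uintptr_t MemSize;
};

// Half-open address range [Begin, End) covering the notes.
struct NoteRange {
  std::uintptr_t Begin;
  std::uintptr_t End;
};

// Mirrors the restored branch: if the segment has no in-memory size, or the
// in-memory size equals the file size, use the file offset and file size;
// otherwise use the virtual address and memory size. Both cases are computed
// relative to the ELF header address, as in the quoted diff.
NoteRange noteRange(std::uintptr_t ElfHeader, const SegmentSketch &Seg) {
  if (Seg.MemSize == 0 || Seg.MemSize == Seg.FileSize) {
    std::uintptr_t Begin = ElfHeader + Seg.Offset;
    return {Begin, Begin + Seg.FileSize};
  }
  std::uintptr_t Begin = ElfHeader + Seg.VAddr;
  return {Begin, Begin + Seg.MemSize};
}
```

The reverted change had instead computed a load base from `PT_PHDR` or `PT_DYNAMIC` and always added `p_vaddr`, which is the part removed in the diff above.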
[llvm-branch-commits] [llvm] release/19.x: [InstCombine] Drop noundef attributes in `foldCttzCtlz` (#116718) (PR #116865)
https://github.com/nikic milestoned https://github.com/llvm/llvm-project/pull/116865
[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)
https://github.com/shiltian edited https://github.com/llvm/llvm-project/pull/117286
[llvm-branch-commits] [llvm] AMDGPU: Handle vcmpx+permalane gfx950 hazard (PR #117286)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117286
[llvm-branch-commits] [llvm] AMDGPU: Handle gfx950 valu write vdst + permlane read hazard (PR #117287)
@@ -2551,8 +2551,34 @@ int GCNHazardRecognizer::checkPermlaneHazards(MachineInstr *MI) {
     return isVCmpXWritesExec(*TII, *TRI, MI);
   };
-  const int NumWaitStates = 4;
-  return NumWaitStates - getWaitStatesSince(IsVCmpXWritesExecFn, NumWaitStates);
+  auto IsVALUFn = [](const MachineInstr &MI) {
+    return SIInstrInfo::isVALU(MI);
+  };
+
+  const int VCmpXWritesExecWaitStates = 4;
+  const int VALUWritesVDstWaitStates = 2;
+  int WaitStatesNeeded = 0;
+
+  for (const MachineOperand &Op : MI->explicit_uses()) {
+    if (!Op.isReg() || !TRI->isVGPR(MF.getRegInfo(), Op.getReg()))
+      continue;
+    Register Reg = Op.getReg();
+
+    int WaitStatesSinceDef =
+        VALUWritesVDstWaitStates -
+        getWaitStatesSinceDef(Reg, IsVALUFn,
+                              /*MaxWaitStates=*/VALUWritesVDstWaitStates);

shiltian wrote:

`/*MaxWaitStates=*/` is not needed here, as `VALUWritesVDstWaitStates` is a variable.

https://github.com/llvm/llvm-project/pull/117287
[llvm-branch-commits] [clang] Add documentation for Multilib custom flags (PR #114998)
https://github.com/vhscampos updated https://github.com/llvm/llvm-project/pull/114998

>From be0d5d6ee15e22b78a6fe671dc4f665680fd2aa5 Mon Sep 17 00:00:00 2001
From: Victor Campos
Date: Tue, 5 Nov 2024 14:22:06 +
Subject: [PATCH 1/2] Add documentation for Multilib custom flags

---
 clang/docs/Multilib.rst | 90 +
 1 file changed, 90 insertions(+)

diff --git a/clang/docs/Multilib.rst b/clang/docs/Multilib.rst
index 7637d0db9565b8..85cb789b9847ac 100644
--- a/clang/docs/Multilib.rst
+++ b/clang/docs/Multilib.rst
@@ -122,6 +122,78 @@ subclass and a suitable base multilib variant is present then the
 It is the responsibility of layered multilib authors to ensure that headers and
 libraries in each layer are complete enough to mask any incompatibilities.
+Multilib custom flags
+=====================
+
+Introduction
+------------
+
+The multilib mechanism supports library variants that correspond to target,
+code generation or language command-line flags. Examples include ``--target``,
+``-mcpu``, ``-mfpu``, ``-mbranch-protection``, ``-fno-rtti``. However, some library
+variants are particular to features that do not correspond to any command-line
+option. Multithreading and semihosting, for instance, have no associated
+compiler option.
+
+In order to support the selection of variants for which no compiler option
+exists, the multilib specification includes the concept of *custom flags*.
+These flags have no impact on code generation and are only used in the multilib
+processing.
+
+Multilib custom flags follow this format in the driver invocation:
+
+::
+
+  -fmultilib-flag=
+
+They are fed into the multilib system alongside the remaining flags.
+
+Custom flag declarations
+------------------------
+
+Custom flags can be declared in the YAML file under the *Flags* section.
+
+.. code-block:: yaml
+
+  Flags:
+  - Name: multithreaded
+    Values:
+    - Name: no-multithreaded
+      DriverArgs: [-D__SINGLE_THREAD__]
+    - Name: multithreaded
+    Default: no-multithreaded
+
+* Name: the name to categorize a flag.
+* Values: a list of flag *Value*s (defined below).
+* Default: it specifies the name of the value this flag should take if not
+  specified in the command-line invocation. It must be one value from the Values
+  field.
+
+A Default value is useful to save users from specifying custom flags that have a
+most commonly used value.
+
+Each flag *Value* is defined as:
+
+* Name: name of the value. This is the string to be used in
+  ``-fmultilib-flag=``.
+* DriverArgs: a list of strings corresponding to the extra driver arguments
+  used to build a library variant that's in accordance to this specific custom
+  flag value. These arguments are fed back into the driver if this flag *Value*
+  is enabled.
+
+The namespace of flag values is common across all flags. This means that flag
+value names must be unique.
+
+Usage of custom flags in the *Variants* specifications
+------------------------------------------------------
+
+Library variants should list their requirement on one or more custom flags like
+they do for any other flag. Each requirement must be listed as
+``-fmultilib-flag=``.
+
+A variant that does not specify a requirement on one particular flag can be
+matched against any value of that flag.
+
 Stability
 =========
@@ -222,6 +294,24 @@ For a more comprehensive example see
     # Flags is a list of one or more strings.
     Flags: [--target=thumbv7m-none-eabi]
+  # Custom flag declarations. Each item is a different declaration.
+  Flags:
+    # Name of the flag
+  - Name: multithreaded
+    # List of custom flag values
+    Values:
+      # Name of the custom flag value. To be used in -fmultilib-flag=.
+    - Name: no-multithreaded
+      # Extra driver arguments to be printed with -print-multi-lib. Useful for
+      # specifying extra arguments for building the the associated library
+      # variant(s).
+      DriverArgs: [-D__SINGLE_THREAD__]
+    - Name: multithreaded
+    # Default flag value. If no value for this flag declaration is used in the
+    # command-line, the multilib system will use this one. Must be equal to one
+    # of the flag value names from this flag declaration.
+    Default: no-multithreaded
+
 Design principles
 =================

>From a940ccd9eec0f683df9f41f2a9e218df76357364 Mon Sep 17 00:00:00 2001
From: Victor Campos
Date: Mon, 25 Nov 2024 15:07:57 +
Subject: [PATCH 2/2] Fix doc build warning

---
 clang/docs/Multilib.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/docs/Multilib.rst b/clang/docs/Multilib.rst
index 85cb789b9847ac..48d84087dda01c 100644
--- a/clang/docs/Multilib.rst
+++ b/clang/docs/Multilib.rst
@@ -164,7 +164,7 @@ Custom flags can be declared in the YAML file under the *Flags* section.
     Default: no-multithreaded
 * Name: the name to categorize a flag.
-* Values: a list of flag *Value*s (defined below).
+* Values: a list of flag Values (defined b
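The `Default` rule described in this documentation patch can be illustrated with a small C++ sketch. It is not Clang's implementation: `FlagDeclSketch` and `resolveCustomFlags` are hypothetical names, the requested values are passed as bare value names rather than parsed from a command line, and letting the last matching request win when several are given is an arbitrary choice the patch text does not specify.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of one entry of the YAML "Flags" section: a flag name,
// the names of its allowed values, and the default value name.
struct FlagDeclSketch {
  std::string Name;
  std::vector<std::string> Values;
  std::string Default;
};

// For every declared flag, pick the requested value that belongs to it, or
// fall back to its Default when none was requested. The result is spelled in
// the -fmultilib-flag= form used by the documentation.
std::vector<std::string>
resolveCustomFlags(const std::vector<FlagDeclSketch> &Decls,
                   const std::vector<std::string> &Requested) {
  std::vector<std::string> Result;
  for (const FlagDeclSketch &Decl : Decls) {
    std::string Chosen = Decl.Default;
    for (const std::string &Value : Requested) {
      // Value names are unique across all flags, so membership in this
      // declaration's Values list identifies the flag being set.
      if (std::find(Decl.Values.begin(), Decl.Values.end(), Value) !=
          Decl.Values.end())
        Chosen = Value; // assumption: the last matching request wins
    }
    Result.push_back("-fmultilib-flag=" + Chosen);
  }
  return Result;
}
```

With the `multithreaded` declaration from the patch, an invocation that requests nothing would resolve to `no-multithreaded` under this sketch, while one that requests `multithreaded` would keep that value.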
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_f32_[fp|bf]8 of gfx950. (PR #117383)
https://github.com/shiltian approved this pull request. https://github.com/llvm/llvm-project/pull/117383 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117592 Co-authored-by: Pravin Jagtap >From 3ba5c37284ce7df02470662c790cc5280e0a62a2 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Mon, 8 Apr 2024 04:56:56 -0400 Subject: [PATCH] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 Co-authored-by: Pravin Jagtap --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 4 + clang/test/CodeGenOpenCL/amdgpu-features.cl | 2 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 43 ++ llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 9 + llvm/lib/Target/AMDGPU/AMDGPU.td | 17 +- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 4 + llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h | 3 + .../Disassembler/AMDGPUDisassembler.cpp | 1 + llvm/lib/Target/AMDGPU/SIInstrInfo.td | 7 +- llvm/lib/Target/AMDGPU/SIRegisterInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3Instructions.td| 14 + llvm/lib/TargetParser/TargetParser.cpp| 1 + .../AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll | 474 ++ llvm/test/MC/AMDGPU/gfx950_asm_features.s | 16 + llvm/test/MC/AMDGPU/gfx950_err.s | 48 ++ .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 12 + 16 files changed, 653 insertions(+), 3 deletions(-) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index a42ad56ce4f998..e09dc0e1107a82 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -559,6 +559,10 @@ TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_fp8_w64, "V4fiV2iV4fs", TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_bf8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64") TARGET_BUILTIN(__builtin_amdgcn_prng_b32, "UiUi", "nc", "prng-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_f16, "V6UiV32hf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_f16, "V6UiV32hf", "nc", 
"f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index f9e07fbc6b0480..56013dad9b6651 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -89,7 +89,7 @@ // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" +// GFX950: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117593 OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap >From b4657178189eac34b30147a2e9343616ee5ea8b7 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Mon, 8 Apr 2024 07:44:32 -0400 Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap --- llvm/lib/Target/AMDGPU/VOP3Instructions.td| 8 ++ llvm/test/MC/AMDGPU/gfx950_asm_features.s | 96 +++ llvm/test/MC/AMDGPU/gfx950_err.s | 50 +- .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 72 ++ 4 files changed, 225 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index 764a2275205665..fdffb2c36dcccf 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -945,6 +945,8 @@ let SubtargetPredicate = HasFP8ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3Inst<"v_cvt_scalef32_pk_f32_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_f16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; defm V_CVT_SCALEF32_PK_FP8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; + defm V_CVT_SCALEF32_PK_F16_FP8: VOP3Inst<"v_cvt_scalef32_pk_f16_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; } let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -954,6 +956,8 @@ let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3Inst<"v_cvt_scalef32_pk_f32_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_f16", 
VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; defm V_CVT_SCALEF32_PK_BF8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; + defm V_CVT_SCALEF32_PK_F16_BF8: VOP3Inst<"v_cvt_scalef32_pk_f16_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3Inst<"v_cvt_scalef32_pk_bf16_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; } let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -1908,6 +1912,8 @@ defm V_CVT_SCALEF32_PK_FP8_F32 : VOP3OpSel_Real_gfx9 <0x235>; defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3OpSel_Real_gfx9 <0x239>; defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3OpSel_Real_gfx9 <0x240>; defm V_CVT_SCALEF32_PK_FP8_BF16: VOP3OpSel_Real_gfx9 <0x244>; +defm V_CVT_SCALEF32_PK_F16_FP8 : VOP3OpSel_Real_gfx9<0x248>; +defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3OpSel_Real_gfx9<0x269>; } let OtherPredicates = [HasBF8ConversionScaleInsts] in { defm V_CVT_SCALEF32_F16_BF8 : VOP3OpSel_Real_gfx9 <0x24b>; @@ -1916,6 +1922,8 @@ defm V_CVT_SCALEF32_PK_BF8_F32 : VOP3OpSel_Real_gfx9 <0x236>; defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3OpSel_Real_gfx9 <0x23a>; defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3OpSel_Real_gfx9 <0x241>; defm V_CVT_SCALEF32_PK_BF8_BF16: VOP3OpSel_Real_gfx9 <0x245>; +defm V_CVT_SCALEF32_PK_F16_BF8 : VOP3OpSel_Real_gfx9<0x249>; +defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3OpSel_Real_gfx9<0x26a>; } let OtherPredicates = [HasFP4ConversionScaleInsts] in { defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>; diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s index 1aef267537aa55..e505b6ff4ad58b 100644 --- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s @@ -929,3 +929,99 @@ v_cvt_scalef32_pk32_fp6_bf16 v[20:25], v[10:25], v8 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: // GFX950: v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 ; encoding: [0x14,0x00,0x58,0xd2,0x0a,0x11,0x02,0x00] 
v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x00,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 ; encoding: [0x01,0x00,0x48,0xd2,0x02,0x06,0x01,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] ; encoding: [0x01,0x08,0x48,0xd2,0x02,0x07,0x02,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] + +//
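The `encoding:` bytes in the checks above can be cross-checked by hand. Below is a minimal Python sketch, assuming the usual GFX9 VOP3 layout (vdst in bits 7:0, op_sel in bits 14:11, opcode in bits 25:16, src0 in bits 40:32, src1 in bits 49:41) and the usual operand numbering (0-101 SGPRs, 128-192 inline integers 0-64, 256-511 VGPRs). The layout, `decode_vop3`, and `operand_name` are assumptions for illustration and are not part of the patch.

```python
def decode_vop3(encoding):
    """Split an 8-byte VOP3 encoding (little-endian) into a few fields."""
    lo = int.from_bytes(bytes(encoding[:4]), "little")
    hi = int.from_bytes(bytes(encoding[4:8]), "little")
    return {
        "vdst": lo & 0xFF,            # destination VGPR index
        "op_sel": (lo >> 11) & 0xF,   # op_sel bits; bit 0 is OPSEL[0]
        "opcode": (lo >> 16) & 0x3FF,
        "src0": hi & 0x1FF,
        "src1": (hi >> 9) & 0x1FF,
    }


def operand_name(code):
    """Render a 9-bit source operand code under the assumed numbering."""
    if code >= 256:
        return f"v{code - 256}"
    if 128 <= code <= 192:
        return str(code - 128)        # inline integer constant
    if code <= 101:
        return f"s{code}"
    return f"<{code}>"                # not modelled in this sketch


# "v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0]" from the test above.
fields = decode_vop3([0x01, 0x08, 0x48, 0xD2, 0x02, 0x07, 0x02, 0x00])
print(hex(fields["opcode"]), fields["op_sel"],
      operand_name(fields["src0"]), operand_name(fields["src1"]))
```

Under these assumptions the opcode field comes out as 0x248, which matches the `VOP3OpSel_Real_gfx9<0x248>` definition of `V_CVT_SCALEF32_PK_F16_FP8` in the patch, and the only difference between the plain and `op_sel:[1,0,0]` forms is bit 0 of the op_sel field (byte 1 changes from 0x00 to 0x08).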
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117593?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117594**
* **#117593** 👈 (View in Graphite)
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117591?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117594**
* **#117593**
* **#117592**
* **#117591** 👈 (View in Graphite)
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117597 v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. Co-authored-by: Sirish Pande >From f221f63e40154aaf7f97acc3e48a8b7ba5659f8d Mon Sep 17 00:00:00 2001 From: Sirish Pande Date: Fri, 10 May 2024 17:33:59 -0500 Subject: [PATCH] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change. 
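The split described above amounts to changing which feature one builtin requires. A small Python sketch of that change follows, using the builtin and feature names from the `BuiltinsAMDGPU.def` hunk in this patch; the two feature sets at the bottom are hypothetical examples and do not describe any real GPU.

```python
# Feature required by each builtin before the patch: all three need dot9-insts.
REQUIRED_BEFORE = {
    "__builtin_amdgcn_fdot2_f16_f16": "dot9-insts",
    "__builtin_amdgcn_fdot2_bf16_bf16": "dot9-insts",
    "__builtin_amdgcn_fdot2_f32_bf16": "dot9-insts",
}

# After the patch only v_dot2_f32_bf16 moves to the new dot12-insts feature.
REQUIRED_AFTER = dict(REQUIRED_BEFORE)
REQUIRED_AFTER["__builtin_amdgcn_fdot2_f32_bf16"] = "dot12-insts"


def available_builtins(required, target_features):
    """List the builtins whose required feature is in the target's feature set."""
    return sorted(b for b, feat in required.items() if feat in target_features)


# Hypothetical feature sets, for illustration only.
only_dot12 = {"dot12-insts"}
dot9_and_dot12 = {"dot9-insts", "dot12-insts"}

print(available_builtins(REQUIRED_BEFORE, only_dot12))
print(available_builtins(REQUIRED_AFTER, only_dot12))
print(available_builtins(REQUIRED_AFTER, dot9_and_dot12))
```

In this model a target that advertises only `dot12-insts` gains `__builtin_amdgcn_fdot2_f32_bf16` without picking up the other two builtins, while a target with both features keeps all three.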
Co-authored-by: Sirish Pande --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 2 +- clang/test/CodeGenOpenCL/amdgpu-features.cl | 24 +++ .../builtins-amdgcn-dl-insts-err.cl | 4 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 25 +++ llvm/lib/Target/AMDGPU/AMDGPU.td | 14 +++- llvm/lib/Target/AMDGPU/GCNSubtarget.h | 5 ++ llvm/lib/Target/AMDGPU/VOP3PInstructions.td | 5 +- llvm/lib/TargetParser/TargetParser.cpp| 3 + .../AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll | 68 +++ llvm/test/MC/AMDGPU/gfx950_dlops.s| 61 + .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 60 11 files changed, 253 insertions(+), 18 deletions(-) create mode 100644 llvm/test/MC/AMDGPU/gfx950_dlops.s diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index fd449697e91216..7d0019eead96b6 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -263,7 +263,7 @@ TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "gfx940 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot10-insts") TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot9-insts") TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot9-insts") -TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot9-insts") +TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot12-insts") TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts") TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts") TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts") diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index db7fd76ec91189..0b698035ee54c7 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -89,7 +89,7 @@ // GFX941: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx
[llvm-branch-commits] [compiler-rt] [libcxx] [libcxxabi] [llvm] Reapply "[runtimes] Allow building against an installed LLVM tree" (PR #114307)
https://github.com/arichardson updated https://github.com/llvm/llvm-project/pull/114307 >From 6a6483cfe53ad33d3a5cd4432c33a5af93694668 Mon Sep 17 00:00:00 2001 From: Alexander Richardson Date: Wed, 30 Oct 2024 14:33:11 -0700 Subject: [PATCH 1/2] [𝘀𝗽𝗿] initial version MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Created using spr 1.3.6-beta.1 --- compiler-rt/cmake/Modules/AddCompilerRT.cmake | 1 + compiler-rt/test/hwasan/lit.cfg.py| 9 + compiler-rt/test/lit.common.configured.in | 1 + libcxx/CMakeLists.txt | 12 +++--- libcxxabi/CMakeLists.txt | 6 +-- runtimes/CMakeLists.txt | 40 +-- 6 files changed, 53 insertions(+), 16 deletions(-) diff --git a/compiler-rt/cmake/Modules/AddCompilerRT.cmake b/compiler-rt/cmake/Modules/AddCompilerRT.cmake index e3d81d241b1054..b2f33d1a961c74 100644 --- a/compiler-rt/cmake/Modules/AddCompilerRT.cmake +++ b/compiler-rt/cmake/Modules/AddCompilerRT.cmake @@ -773,6 +773,7 @@ function(configure_compiler_rt_lit_site_cfg input output) string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_TEST_COMPILER ${COMPILER_RT_TEST_COMPILER}) string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_OUTPUT_DIR ${COMPILER_RT_OUTPUT_DIR}) + string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_EXEC_OUTPUT_DIR ${COMPILER_RT_EXEC_OUTPUT_DIR}) string(REPLACE ${CMAKE_CFG_INTDIR} ${LLVM_BUILD_MODE} COMPILER_RT_RESOLVED_LIBRARY_OUTPUT_DIR ${output_dir}) configure_lit_site_cfg(${input} ${output}) diff --git a/compiler-rt/test/hwasan/lit.cfg.py b/compiler-rt/test/hwasan/lit.cfg.py index 594f3294a84ac1..bbf23e683240ac 100644 --- a/compiler-rt/test/hwasan/lit.cfg.py +++ b/compiler-rt/test/hwasan/lit.cfg.py @@ -2,6 +2,9 @@ import os +from lit.llvm import llvm_config +from lit.llvm.subst import ToolSubst, FindTool + # Setup config name. 
config.name = "HWAddressSanitizer" + getattr(config, "name_suffix", "default") @@ -74,6 +77,12 @@ def build_invocation(compile_flags): ("%env_hwasan_opts=", "env HWASAN_OPTIONS=" + default_hwasan_opts_str) ) +# Ensure that we can use hwasan_symbolize from the expected location +llvm_config.add_tool_substitutions( +[ToolSubst("hwasan_symbolize", unresolved="fatal")], +search_dirs=[config.compiler_rt_bindir], +) + # Default test suffixes. config.suffixes = [".c", ".cpp"] diff --git a/compiler-rt/test/lit.common.configured.in b/compiler-rt/test/lit.common.configured.in index 66935c358afedd..050792b6b26217 100644 --- a/compiler-rt/test/lit.common.configured.in +++ b/compiler-rt/test/lit.common.configured.in @@ -28,6 +28,7 @@ set_default("python_executable", "@Python3_EXECUTABLE@") set_default("compiler_rt_debug", @COMPILER_RT_DEBUG_PYBOOL@) set_default("compiler_rt_intercept_libdispatch", @COMPILER_RT_INTERCEPT_LIBDISPATCH_PYBOOL@) set_default("compiler_rt_output_dir", "@COMPILER_RT_RESOLVED_OUTPUT_DIR@") +set_default("compiler_rt_bindir", "@COMPILER_RT_RESOLVED_EXEC_OUTPUT_DIR@") set_default("compiler_rt_libdir", "@COMPILER_RT_RESOLVED_LIBRARY_OUTPUT_DIR@") set_default("emulator", "@COMPILER_RT_EMULATOR@") set_default("asan_shadow_scale", "@COMPILER_RT_ASAN_SHADOW_SCALE@") diff --git a/libcxx/CMakeLists.txt b/libcxx/CMakeLists.txt index 95a7d10f055ea7..7b3f032fd82126 100644 --- a/libcxx/CMakeLists.txt +++ b/libcxx/CMakeLists.txt @@ -413,9 +413,9 @@ if(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR AND NOT APPLE) string(APPEND LIBCXX_TARGET_SUBDIR /${LIBCXX_LIBDIR_SUBDIR}) endif() set(LIBCXX_LIBRARY_DIR ${LLVM_LIBRARY_OUTPUT_INTDIR}/${LIBCXX_TARGET_SUBDIR}) - set(LIBCXX_GENERATED_INCLUDE_DIR "${LLVM_BINARY_DIR}/include/c++/v1") - set(LIBCXX_GENERATED_MODULE_DIR "${LLVM_BINARY_DIR}/modules/c++/v1") - set(LIBCXX_GENERATED_INCLUDE_TARGET_DIR "${LLVM_BINARY_DIR}/include/${LIBCXX_TARGET_SUBDIR}/c++/v1") + set(LIBCXX_GENERATED_INCLUDE_DIR "${LIBCXX_BINARY_DIR}/include/c++/v1") + 
set(LIBCXX_GENERATED_MODULE_DIR "${LIBCXX_BINARY_DIR}/modules/c++/v1") + set(LIBCXX_GENERATED_INCLUDE_TARGET_DIR "${LIBCXX_BINARY_DIR}/include/${LIBCXX_TARGET_SUBDIR}/c++/v1") set(LIBCXX_INSTALL_LIBRARY_DIR lib${LLVM_LIBDIR_SUFFIX}/${LIBCXX_TARGET_SUBDIR} CACHE STRING "Path where built libc++ libraries should be installed.") set(LIBCXX_INSTALL_INCLUDE_TARGET_DIR "${CMAKE_INSTALL_INCLUDEDIR}/${LIBCXX_TARGET_SUBDIR}/c++/v1" CACHE STRING @@ -424,13 +424,11 @@ if(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR AND NOT APPLE) else() if(LLVM_LIBRARY_OUTPUT_INTDIR) set(LIBCXX_LIBRARY_DIR ${LLVM_LIBRARY_OUTPUT_INTDIR}) -set(LIBCXX_GENERATED_INCLUDE_DIR "${LLVM_BINARY_DIR}/include/c++/v1") -set(LIBCXX_GENERATED_MODULE_DIR "${LLVM_BINARY_DIR}/modules/c++/v1") else() set(LIBCXX_LIBRARY_DIR ${CMAKE_BINARY_DIR}/lib${LIBCXX_LIBDIR_SUFFIX}) -set(LIB
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117591 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117594**
* **#117593**
* **#117592** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117592?utm_source=stack-comment-view-in-graphite)
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117594** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117594?utm_source=stack-comment-view-in-graphite)
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117597**
* **#117596** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117596?utm_source=stack-comment-view-in-graphite)
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117596**
* **#117595** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117595?utm_source=stack-comment-view-in-graphite)
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117597** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117597?utm_source=stack-comment-view-in-graphite)
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117590?utm_source=stack-comment-view-in-graphite)
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
* **#117286**
* **#117285**
* **#117284**
* **#117283**
* **#117263**
* **#117262**
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117590
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

v_dot2_f32_bf16 was added in gfx11 along with v_dot2_f16_f16 and v_dot2_bf16_bf16. All three instructions were part of Dot9 instructions in the compiler. This patch will split existing dot9 (v_dot2_f16_f16, v_dot2_bf16_bf16, v_dot2_f32_bf16) into new dot9 (v_dot2_f16_f16 and v_dot2_bf16_bf16), and dot12 (v_dot2_f32_bf16). All necessary changes to gfx11 and gfx12 are updated to reflect this change.

Co-authored-by: Sirish Pande

---

Patch is 30.80 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117597.diff

11 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+1-1)
- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+12-12)
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-dl-insts-err.cl (+2-2)
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+25)
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+13-1)
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+5)
- (modified) llvm/lib/Target/AMDGPU/VOP3PInstructions.td (+3-2)
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+3)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f32.bf16.ll (+68)
- (added) llvm/test/MC/AMDGPU/gfx950_dlops.s (+61)
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+60)

```diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index fd449697e91216..7d0019eead96b6 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -263,7 +263,7 @@ TARGET_BUILTIN(__builtin_amdgcn_global_load_lds, "vv*1v*3IUiIiIUi", "t", "gfx940
 TARGET_BUILTIN(__builtin_amdgcn_fdot2, "fV2hV2hfIb", "nc", "dot10-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_f16_f16, "hV2hV2hh", "nc", "dot9-insts")
 TARGET_BUILTIN(__builtin_amdgcn_fdot2_bf16_bf16, "sV2sV2ss", "nc", "dot9-insts")
-TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot9-insts")
+TARGET_BUILTIN(__builtin_amdgcn_fdot2_f32_bf16, "fV2sV2sfIb", "nc", "dot12-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot2, "SiV2SsV2SsSiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_udot2, "UiV2UsV2UsUiIb", "nc", "dot2-insts")
 TARGET_BUILTIN(__builtin_amdgcn_sdot4, "SiSiSiSiIb", "nc", "dot1-insts")
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index db7fd76ec91189..0b698035ee54c7 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot12-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,
```
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)
llvmbot wrote:

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)

Changes

This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions.

Co-authored-by: Sirish Pande

---

Patch is 22.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117596.diff

13 Files Affected:

- (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+3)
- (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+1-1)
- (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+46)
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+10)
- (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+16-6)
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2)
- (modified) llvm/lib/Target/AMDGPU/GCNSubtarget.h (+3)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1)
- (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8)
- (modified) llvm/lib/Target/AMDGPU/VOPInstructions.td (+1)
- (modified) llvm/lib/TargetParser/TargetParser.cpp (+1)
- (modified) llvm/test/MC/AMDGPU/gfx950_asm_vop3.s (+72)
- (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+36)

```diff
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index dacbf5aa902f60..fd449697e91216 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts")
 
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_i8_i32, "UsUiUiUi", "nc", "ashr-pk-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_u8_i32, "UsUiUiUi", "nc", "ashr-pk-insts")
+
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
 TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts")
 
diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl
index 56013dad9b6651..db7fd76ec91189 100644
--- a/clang/test/CodeGenOpenCL/amdgpu-features.cl
+++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl
@@ -89,7 +89,7 @@
 // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts"
 // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
-// GFX950: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
+// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64"
 // GFX1010: "target-features"="+16-bit-insts,+ci-insts,+dl-insts,+dpp,+gfx10-insts,+g
```
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_f32_[fp|bf]6 of gfx950 (PR #117590)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 8f7e780a4014c19daa5e980d943a381a48e6152f 5801905fe13b783780dc09cb3ac4c177c92b10d5 --extensions h,cpp -- llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.h
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
index 1a09f55dfd..ea77cfe720 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -185,7 +185,9 @@ public:
 
   bool hasFP4ConversionScaleInsts() const { return HasFP4ConversionScaleInsts; }
 
-  bool hasFP6BF6ConversionScaleInsts() const { return HasFP6BF6ConversionScaleInsts; }
+  bool hasFP6BF6ConversionScaleInsts() const {
+    return HasFP6BF6ConversionScaleInsts;
+  }
 
   bool hasMadMacF32Insts() const {
     return HasMadMacF32Insts || !isGCN();
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index fa5f86b078..cb2c71bb0a 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1530,7 +1530,8 @@ unsigned AMDGPUDisassembler::getVgprClassId(const OpWidthTy Width) const {
   case OPWV232: return VReg_64RegClassID;
   case OPW96: return VReg_96RegClassID;
   case OPW128: return VReg_128RegClassID;
-  case OPW192: return VReg_192RegClassID;
+  case OPW192:
+    return VReg_192RegClassID;
   case OPW160: return VReg_160RegClassID;
   case OPW256: return VReg_256RegClassID;
   case OPW288: return VReg_288RegClassID;
```

https://github.com/llvm/llvm-project/pull/117590
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
github-actions[bot] wrote:

:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff 145c4c8611307f4039f390a1a69fad4fe4c14ee3 3ba5c37284ce7df02470662c790cc5280e0a62a2 --extensions h,cpp -- llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp llvm/lib/TargetParser/TargetParser.cpp
```

View the diff from clang-format here.

```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
index 742f4e6e80..79e8bb9146 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
@@ -188,7 +188,9 @@ public:
 
   bool hasFP6BF6ConversionScaleInsts() const { return HasFP6BF6ConversionScaleInsts; }
 
-  bool hasF16BF16ToFP6BF6ConversionScaleInsts() const { return HasF16BF16ToFP6BF6ConversionScaleInsts; }
+  bool hasF16BF16ToFP6BF6ConversionScaleInsts() const {
+    return HasF16BF16ToFP6BF6ConversionScaleInsts;
+  }
 
   bool hasMadMacF32Insts() const {
     return HasMadMacF32Insts || !isGCN();
```

https://github.com/llvm/llvm-project/pull/117592
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2_f32_bf16 instruction for gfx950 (PR #117597)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117597
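For readers unfamiliar with the instruction this PR covers, here is a minimal scalar sketch of what a two-element bf16 dot product with an f32 accumulator computes. It is an illustrative reference model only, not code from the patch: the function names `bf16_to_float` and `dot2_f32_bf16_ref` are hypothetical, the operand layout is assumed to be two bf16 bit patterns per input, and the clamp operand, rounding mode, and denormal handling of the real v_dot2_f32_bf16 are not modeled.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Widen a bfloat16 bit pattern to float. bf16 shares the sign, exponent, and
// top 7 mantissa bits of IEEE-754 binary32, so the conversion is a 16-bit shift.
float bf16_to_float(std::uint16_t bits) {
  std::uint32_t wide = static_cast<std::uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &wide, sizeof(out));
  return out;
}

// Hypothetical reference model: a[0]*b[0] + a[1]*b[1] + acc, evaluated in f32.
// Assumption: this ignores the builtin's clamp flag and any hardware-specific
// rounding or denormal behavior, so it is not guaranteed to be bit-exact.
float dot2_f32_bf16_ref(std::array<std::uint16_t, 2> a,
                        std::array<std::uint16_t, 2> b, float acc) {
  return bf16_to_float(a[0]) * bf16_to_float(b[0]) +
         bf16_to_float(a[1]) * bf16_to_float(b[1]) + acc;
}
```

For example, with a = (1.0, 2.0) encoded as 0x3F80 and 0x4000, b = (3.0, 4.0) encoded as 0x4040 and 0x4080, and an accumulator of 0.5, the model evaluates 1*3 + 2*4 + 0.5.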
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117598
[llvm-branch-commits] [llvm] AMDGPU: Add minimum3/maximum3 pkf16 for gfx950 encodings (PR #117601)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117601?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117601** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117601?utm_source=stack-comment-view-in-graphite)
* **#117600**
* **#117599**
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117599?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117600**
* **#117599** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117599?utm_source=stack-comment-view-in-graphite)
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support buffer_atomic_pk_add_bf16 for gfx950 (PR #117599)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117599
[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)
arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117600?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#117600** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/117600?utm_source=stack-comment-view-in-graphite)
* **#117599**
* **#117598**
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_dot2c_f32_bf16 instruction for gfx950 (PR #117598)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

* **#117600**
* **#117599**
* **#117598** 👈
* **#117597**
* **#117596**
* **#117595**
* **#117594**
* **#117593**
* **#117592**
* **#117591**
* **#117590**
* **#117418**
* **#117417**
* **#117384**
* **#117383**
* **#117382**
* **#117381**
* **#117380**
* **#117379**
* **#117378**
* **#117287**
[llvm-branch-commits] [llvm] AMDGPU: Add encodings for minimum3/maximum3 f32 for gfx950 (PR #117600)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117600
[llvm-branch-commits] [clang] [llvm] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 (PR #117596)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117596 This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande >From 75056a46ee4d7eb6543c2ce99a157a1627a54158 Mon Sep 17 00:00:00 2001 From: Sirish Pande Date: Tue, 13 Feb 2024 10:54:51 -0600 Subject: [PATCH] AMDGPU: Add support for v_ashr_pk_i8/u8_i32 instructions for gfx950 This patch adds assembly and builtin support for v_ashr_pk_i8/u8_i32 instructions. Co-authored-by: Sirish Pande --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 3 + clang/test/CodeGenOpenCL/amdgpu-features.cl | 2 +- .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 46 llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 10 +++ llvm/lib/Target/AMDGPU/AMDGPU.td | 22 -- .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 + llvm/lib/Target/AMDGPU/GCNSubtarget.h | 3 + llvm/lib/Target/AMDGPU/SIInstrInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3Instructions.td| 8 +++ llvm/lib/Target/AMDGPU/VOPInstructions.td | 1 + llvm/lib/TargetParser/TargetParser.cpp| 1 + llvm/test/MC/AMDGPU/gfx950_asm_vop3.s | 72 +++ .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 36 ++ 13 files changed, 200 insertions(+), 7 deletions(-) diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index dacbf5aa902f60..fd449697e91216 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_i8_i32, "UsUiUiUi", "nc", "ashr-pk-insts") +TARGET_BUILTIN(__builtin_amdgcn_ashr_pk_u8_i32, "UsUiUiUi", "nc", "ashr-pk-insts") + TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", 
"gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts") diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index 56013dad9b6651..db7fd76ec91189 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -89,7 +89,7 @@ // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" +// GFX950: "target-features"="+16-bit-insts,+ashr-pk-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memre
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117595 Scale packed 16-component single-precision float vectors from two source inputs using the exponent provided by the third single-precision float input, then convert the values to a packed 32-component FP6 float value. Co-authored-by: Pravin Jagtap >From a559035a27de3a7cde8e07f6438814b1cce79a14 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Mon, 8 Apr 2024 08:56:14 -0400 Subject: [PATCH] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. Scale packed 16-component single-precision float vectors from two source inputs using the exponent provided by the third single-precision float input, then convert the values to a packed 32-component FP6 float value. Co-authored-by: Pravin Jagtap --- clang/include/clang/Basic/BuiltinsAMDGPU.def | 3 + .../CodeGenOpenCL/builtins-amdgcn-gfx950.cl | 22 ++- llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 6 + .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 + llvm/lib/Target/AMDGPU/SIInstrInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3Instructions.td| 8 ++ .../llvm.amdgcn.cvt.scalef32.pk.gfx950.ll | 128 ++ llvm/test/MC/AMDGPU/gfx950_asm_features.s | 24 llvm/test/MC/AMDGPU/gfx950_err.s | 24 .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 18 +++ 10 files changed, 235 insertions(+), 1 deletion(-) create mode 100644 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e09dc0e1107a82..dacbf5aa902f60 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts") 
+TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts") + //===--===// // GFX12+ only builtins. //===--===// diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl index 779aadd96f3f41..6f3c81b26be0b8 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl @@ -7,6 +7,8 @@ typedef unsigned int __attribute__((ext_vector_type(2))) uint2; typedef unsigned int __attribute__((ext_vector_type(6))) uint6; typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32; typedef half __attribute__((ext_vector_type(32))) half32; +typedef short __attribute__((ext_vector_type(2))) short2; +typedef float __attribute__((ext_vector_type(16))) float16; // CHECK-LABEL: @test_prng_b32( // CHECK-NEXT: entry: @@ -115,10 +117,14 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) { // CHECK-NEXT:[[OUT6_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) // CHECK-NEXT:[[SRCBF32_ADDR:%.*]] = alloca <32 x bfloat>, align 64, addrspace(5) // CHECK-NEXT:[[SRCH32_ADDR:%.*]] = alloca <32 x half>, align 64, addrspace(5) +// CHECK-NEXT:[[SRC0F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5) +// CHECK-NEXT:[[SRC1F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5) // CHECK-NEXT:[[SCALE_ADDR:%.*]] = alloca float, align 4, addrspace(5) // CHECK-NEXT:store ptr addrspace(1) [[OUT6:%.*]], ptr addrspace(5) [[OUT6_ADDR]], align 8 // CHECK-NEXT:store <32 x bfloat> [[SRCBF32:%.*]], ptr addrspace(5) [[SRCBF32_ADDR]], align 64 // CHECK-NEXT:store <32 x half> [[SRCH32:%.*]], ptr addrspace(5) [[SRCH32_ADDR]], align 64 +// CHECK-NEXT:store <16 x float> [[SRC0F32:%.*]], ptr addrspace(5) [[SRC0F32_ADDR]], align 64 +// CHECK-NEXT:store <16 x float> [[SRC1F32:%.*]], ptr addrspace(5) [[SRC1F32_ADDR]], align 
64 // CHECK-NEXT:store float [[SCALE:%.*]], ptr addrspace(5) [[SCALE_ADDR]], align 4 // CHECK-NEXT:[[TMP0:%.*]] = load <32 x bfloat>, ptr addrspace(5) [[SRCBF32_ADDR]], align 64 // CHECK-NEXT:[[TMP1:%.*]] = load float, ptr addrspace(5) [[SCALE_ADDR]], align 4 @@ -140,12 +146,26 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) { // CHECK-NEXT:[[TMP14:%.*]] = call <6 x i32> @llvm.amdgcn.cvt.scalef32.pk32.fp6.f16(<32 x half> [[TMP12]], float [[TMP13]]) // CHECK-NEXT:[[TMP15:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT6_ADDR]], align 8 // CHECK-NEXT:store <6 x i32> [[TMP14]], ptr addrspace(1) [[TMP15]], align 32 +// CHECK-NEXT:[[TMP16:%.*]] = load <16 x flo
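Note on the builtin signature strings: the TARGET_BUILTIN entries above encode their prototypes as compact type strings such as "V6UiV16fV16ff". The sketch below decodes a small subset of that encoding. It is an illustration only: the letter table is an assumption based on the comments in clang/include/clang/Basic/Builtins.def, it covers just the letters that appear in this thread, and `decode_signature` is a hypothetical helper, not part of Clang.

```python
import re

# Assumed subset of Clang's builtin type letters (see Builtins.def).
BASE = {
    "v": "void", "c": "char", "s": "short", "i": "int",
    "h": "half", "f": "float", "y": "__bf16",
}

# One type: optional V<N> vector prefix, optional U (unsigned), a base
# letter, then an optional '*' pointer with an optional address space.
TOKEN = re.compile(r"(?:V(\d+))?(U?)([vcsihfy])(?:(\*)(\d*))?")


def decode_signature(sig):
    """Return (return_type, [argument_types]) as readable strings."""
    types = []
    pos = 0
    while pos < len(sig):
        m = TOKEN.match(sig, pos)
        if m is None:
            raise ValueError(f"unsupported type string at offset {pos}: {sig!r}")
        lanes, unsigned, base, ptr, aspace = m.groups()
        name = ("unsigned " if unsigned else "") + BASE[base]
        if lanes:
            name += f" x{lanes}"
        if ptr:
            name += " *"
            if aspace:
                name += f" addrspace({aspace})"
        types.append(name)
        pos = m.end()
    return types[0], types[1:]


print(decode_signature("V6UiV16fV16ff"))
print(decode_signature("UsUiUiUi"))
print(decode_signature("V3iV3i*3"))
```

Read this way, "V6UiV16fV16ff" returns six unsigned 32-bit words and takes two 16-float vectors plus one float scale. That matches the description of 32 packed FP6 values, since 32 x 6 bits = 192 bits = 6 x 32 bits.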
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. (PR #117591)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117591 Co-authored-by: Pravin Jagtap >From 145c4c8611307f4039f390a1a69fad4fe4c14ee3 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Mon, 8 Apr 2024 01:53:50 -0400 Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk32_{bf|f}16_{bf|fp}6 of gfx950. Co-authored-by: Pravin Jagtap --- llvm/lib/Target/AMDGPU/SIInstrInfo.td | 1 + llvm/lib/Target/AMDGPU/VOP3Instructions.td| 8 llvm/test/MC/AMDGPU/gfx950_asm_features.s | 22 - llvm/test/MC/AMDGPU/gfx950_err.s | 48 +++ .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 12 + 5 files changed, 90 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.td b/llvm/lib/Target/AMDGPU/SIInstrInfo.td index f20d6526e20b2c..ea36347423c57c 100644 --- a/llvm/lib/Target/AMDGPU/SIInstrInfo.td +++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.td @@ -1697,6 +1697,7 @@ class getVALUDstForVT { VOPDstOperand_t16Lo128), VOPDstOperand); RegisterOperand ret = !cond(!eq(VT.Size, 1024) : VOPDstOperand, + !eq(VT.Size, 512) : VOPDstOperand, !eq(VT.Size, 256) : VOPDstOperand, !eq(VT.Size, 128) : VOPDstOperand, !eq(VT.Size, 64) : VOPDstOperand, diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index 1009f2d9593609..554aff7082010a 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -966,6 +966,10 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 0 in { defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3Inst<"v_cvt_scalef32_pk32_f32_fp6", VOP3_CVT_SCALEF32_PK_F864_Profile>; defm V_CVT_SCALEF32_PK32_F32_BF6 : VOP3Inst<"v_cvt_scalef32_pk32_f32_bf6", VOP3_CVT_SCALEF32_PK_F864_Profile>; + defm V_CVT_SCALEF32_PK32_F16_FP6 : VOP3Inst<"v_cvt_scalef32_pk32_f16_fp6", VOP3_CVT_SCALEF32_PK_F864_Profile>; + defm V_CVT_SCALEF32_PK32_BF16_FP6 : 
VOP3Inst<"v_cvt_scalef32_pk32_bf16_fp6", VOP3_CVT_SCALEF32_PK_F864_Profile>; + defm V_CVT_SCALEF32_PK32_F16_BF6 : VOP3Inst<"v_cvt_scalef32_pk32_f16_bf6", VOP3_CVT_SCALEF32_PK_F864_Profile>; + defm V_CVT_SCALEF32_PK32_BF16_BF6 : VOP3Inst<"v_cvt_scalef32_pk32_bf16_bf6", VOP3_CVT_SCALEF32_PK_F864_Profile>; } let SubtargetPredicate = isGFX10Plus in { @@ -1915,4 +1919,8 @@ defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>; let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in { defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, "v_cvt_scalef32_pk32_f32_fp6">; defm V_CVT_SCALEF32_PK32_F32_BF6 : VOP3_Real_gfx9<0x257, "v_cvt_scalef32_pk32_f32_bf6">; +defm V_CVT_SCALEF32_PK32_F16_FP6 : VOP3_Real_gfx9<0x260, "v_cvt_scalef32_pk32_f16_fp6">; +defm V_CVT_SCALEF32_PK32_BF16_FP6 : VOP3_Real_gfx9<0x261, "v_cvt_scalef32_pk32_bf16_fp6">; +defm V_CVT_SCALEF32_PK32_F16_BF6 : VOP3_Real_gfx9<0x262, "v_cvt_scalef32_pk32_f16_bf6">; +defm V_CVT_SCALEF32_PK32_BF16_BF6 : VOP3_Real_gfx9<0x263, "v_cvt_scalef32_pk32_bf16_bf6">; } diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s index 95d31d2293075f..271ad4d62c3a43 100644 --- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s @@ -892,4 +892,24 @@ v_cvt_scalef32_pk32_f32_fp6 v[2:33], v[2:7], v6 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: // GFX950: v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6 ; encoding: [0x02,0x00,0x57,0xd2,0x02,0x0d,0x02,0x00] -v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6 \ No newline at end of file +v_cvt_scalef32_pk32_f32_bf6 v[2:33], v[2:7], v6 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x63,0xd2,0x14,0x11,0x02,0x00] +v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 ; encoding: 
[0x0a,0x00,0x63,0xd2,0x14,0x11,0x02,0x00] +v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk32_f16_bf6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x62,0xd2,0x14,0x11,0x02,0x00] +v_cvt_scalef32_pk32_f16_bf6 v[10:25], v[20:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk32_bf16_fp6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x61,0xd2,0x14,0x11,0x02,0x00] +v_cvt_scalef32_pk32_bf16_fp6 v[10:25], v[20:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk32_f16_fp6 v[10:25], v[20:25], v8 ; encoding: [0x0a,0x00,0x60,0xd2,0x14,0x11,0x02,0x00] +v_cvt_scalef32_pk32_f16_fp6 v[10:25
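Note on the encoding bytes: the `; encoding: [...]` comments in the MC tests above can be cross-checked against the opcodes in VOP3Instructions.td (for example 0x263 for v_cvt_scalef32_pk32_bf16_bf6). The sketch below splits the eight bytes into fields. The bit positions are an assumption about the GFX9 VOP3 layout, not something stated in the patch, although they do reproduce the operands shown in the test lines. `decode_vop3` and `operand_name` are hypothetical helpers.

```python
def decode_vop3(encoding):
    """Split an 8-byte VOP3 encoding (as printed by the tests) into fields."""
    if len(encoding) != 8:
        raise ValueError("expected 8 encoding bytes")
    # The byte list is little-endian: two 32-bit dwords, low dword first.
    lo = int.from_bytes(bytes(encoding[:4]), "little")
    hi = int.from_bytes(bytes(encoding[4:]), "little")
    return {
        "vdst": lo & 0xFF,             # assumed bits [7:0] of dword 0
        "opcode": (lo >> 16) & 0x3FF,  # assumed bits [25:16] of dword 0
        "src0": hi & 0x1FF,            # assumed bits [8:0] of dword 1
        "src1": (hi >> 9) & 0x1FF,     # assumed bits [17:9] of dword 1
    }


def operand_name(value):
    """Name a 9-bit source operand; only the ranges used here are handled."""
    if value >= 256:
        return f"v{value - 256}"   # VGPRs start at 256
    if value <= 101:
        return f"s{value}"         # SGPRs
    if 128 <= value <= 192:
        return str(value - 128)    # inline integer constants 0..64
    return f"raw:{value}"


# v_cvt_scalef32_pk32_bf16_bf6 v[10:25], v[20:25], v8
fields = decode_vop3([0x0a, 0x00, 0x63, 0xd2, 0x14, 0x11, 0x02, 0x00])
print(hex(fields["opcode"]), fields["vdst"],
      operand_name(fields["src0"]), operand_name(fields["src1"]))
# prints: 0x263 10 v20 v8
```

The same helper applied to the v_cvt_scalef32_pk32_f32_bf6 line gives opcode 0x257 with sources v2 and v6, which agrees with the VOP3_Real_gfx9<0x257, ...> definition.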
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
https://github.com/arsenm created https://github.com/llvm/llvm-project/pull/117594 These instructions have non-standard use of OPSEL bits to select dest write byte. The src2_modifiers operand is used without having its corresponding src2 operand by introducing dummy src2. Co-authored-by: Pravin Jagtap >From a87b139e074e856cd0c61ef61e8f092feff6bff6 Mon Sep 17 00:00:00 2001 From: Pravin Jagtap Date: Wed, 10 Apr 2024 05:47:54 -0400 Subject: [PATCH] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. These instructions have non-standard use of OPSEL bits to select dest write byte. The src2_modifiers operand is used without having its corresponding src2 operand by introducing dummy src2. Co-authored-by: Pravin Jagtap --- .../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 4 +- llvm/lib/Target/AMDGPU/VOP3Instructions.td| 26 llvm/test/MC/AMDGPU/gfx950_asm_features.s | 40 +++ llvm/test/MC/AMDGPU/gfx950_err.s | 24 +++ .../Disassembler/AMDGPU/gfx950_dasm_vop3.txt | 30 ++ 5 files changed, 123 insertions(+), 1 deletion(-) diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp index a1d45822837c5f..afd35842ba87f4 100644 --- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -8824,7 +8824,9 @@ void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands, const bool IsPacked = (Desc.TSFlags & SIInstrFlags::IsPacked) != 0; - if (Opc == AMDGPU::V_CVT_SR_BF8_F32_vi || + if (Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_F16_vi || + Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_BF16_vi || + Opc == AMDGPU::V_CVT_SR_BF8_F32_vi || Opc == AMDGPU::V_CVT_SR_FP8_F32_vi || Opc == AMDGPU::V_CVT_SR_BF8_F32_gfx12_e64_gfx12 || Opc == AMDGPU::V_CVT_SR_FP8_F32_gfx12_e64_gfx12) { diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index fdffb2c36dcccf..7776688156419a 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td 
+++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -899,6 +899,23 @@ def VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile : VOP3_Profile, + VOP3_OPSEL> { + let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0, + FP32InputMods:$src1_modifiers, Src1RC64:$src1, + FP32InputMods:$src2_modifiers, VGPR_32:$src2, + op_sel0:$op_sel); + let HasClamp = 0; + let HasSrc2 = 0; + let HasSrc2Mods = 1; + let HasOpSel = 1; + let AsmVOP3OpSel = !subst(", $src2_modifiers", "", +getAsmVOP3OpSel<3, HasClamp, HasOMod, +HasSrc0FloatMods, HasSrc1FloatMods, +HasSrc2FloatMods>.ret); + let HasExtVOP3DPP = 0; +} + class VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile : VOP3_Profile, VOP3_OPSEL> { let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0, @@ -965,6 +982,13 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f32", VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile>; defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_f16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + + // These instructions have non-standard use of op_sel. In particular they are + // using op_sel bits 2 and 3 while only having two sources. 
+ let Constraints = "$vdst = $src2", DisableEncoding = "$src2" in { +defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>; +defm V_CVT_SCALEF32_PK_FP4_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_bf16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>; + } } let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -1930,6 +1954,8 @@ defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>; defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3OpSel_Real_gfx9 <0x23d>; defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3OpSel_Real_gfx9 <0x250>; defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>; +defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3OpSel_Real_gfx9_forced_opsel2 <0x24c>; +defm V_CVT_SCALEF32_PK_FP4_BF16: VOP3OpSel_Real_gfx9_forced_opsel2 <0x24d>; } let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in { defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, "v_cvt_scalef32_pk32_f32_fp6">; diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s index e505b6ff4ad58b..12340dfaa78e91 100644 --- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes These instructions have non-standard use of OPSEL bits to select dest write byte. The src2_modifiers operand is used without having its corresponding src2 operand by introducing dummy src2. Co-authored-by: Pravin Jagtap--- Full diff: https://github.com/llvm/llvm-project/pull/117594.diff 5 Files Affected: - (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+3-1) - (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+26) - (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+40) - (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+24) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+30) ``diff diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp index a1d45822837c5f..afd35842ba87f4 100644 --- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp +++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp @@ -8824,7 +8824,9 @@ void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands, const bool IsPacked = (Desc.TSFlags & SIInstrFlags::IsPacked) != 0; - if (Opc == AMDGPU::V_CVT_SR_BF8_F32_vi || + if (Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_F16_vi || + Opc == AMDGPU::V_CVT_SCALEF32_PK_FP4_BF16_vi || + Opc == AMDGPU::V_CVT_SR_BF8_F32_vi || Opc == AMDGPU::V_CVT_SR_FP8_F32_vi || Opc == AMDGPU::V_CVT_SR_BF8_F32_gfx12_e64_gfx12 || Opc == AMDGPU::V_CVT_SR_FP8_F32_gfx12_e64_gfx12) { diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index fdffb2c36dcccf..7776688156419a 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -899,6 +899,23 @@ def VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile : VOP3_Profile, + VOP3_OPSEL> { + let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0, + FP32InputMods:$src1_modifiers, Src1RC64:$src1, + FP32InputMods:$src2_modifiers, 
VGPR_32:$src2, + op_sel0:$op_sel); + let HasClamp = 0; + let HasSrc2 = 0; + let HasSrc2Mods = 1; + let HasOpSel = 1; + let AsmVOP3OpSel = !subst(", $src2_modifiers", "", +getAsmVOP3OpSel<3, HasClamp, HasOMod, +HasSrc0FloatMods, HasSrc1FloatMods, +HasSrc2FloatMods>.ret); + let HasExtVOP3DPP = 0; +} + class VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile : VOP3_Profile, VOP3_OPSEL> { let InsVOP3OpSel = (ins FP32InputMods:$src0_modifiers, Src0RC64:$src0, @@ -965,6 +982,13 @@ let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f32", VOP3_CVT_SCALE_FP4FP8BF8_F32_Profile>; defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_f16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp4", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + + // These instructions have non-standard use of op_sel. In particular they are + // using op_sel bits 2 and 3 while only having two sources. 
+ let Constraints = "$vdst = $src2", DisableEncoding = "$src2" in { +defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_f16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>; +defm V_CVT_SCALEF32_PK_FP4_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp4_bf16", VOP3_CVT_SCALE_FP4_F16BF16_Profile>; + } } let SubtargetPredicate = HasFP6BF6ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -1930,6 +1954,8 @@ defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>; defm V_CVT_SCALEF32_PK_FP4_F32 : VOP3OpSel_Real_gfx9 <0x23d>; defm V_CVT_SCALEF32_PK_F16_FP4 : VOP3OpSel_Real_gfx9 <0x250>; defm V_CVT_SCALEF32_PK_BF16_FP4 : VOP3OpSel_Real_gfx9 <0x251>; +defm V_CVT_SCALEF32_PK_FP4_F16 : VOP3OpSel_Real_gfx9_forced_opsel2 <0x24c>; +defm V_CVT_SCALEF32_PK_FP4_BF16: VOP3OpSel_Real_gfx9_forced_opsel2 <0x24d>; } let OtherPredicates = [HasFP6BF6ConversionScaleInsts] in { defm V_CVT_SCALEF32_PK32_F32_FP6 : VOP3_Real_gfx9<0x256, "v_cvt_scalef32_pk32_f32_fp6">; diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s index e505b6ff4ad58b..12340dfaa78e91 100644 --- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s @@ -1025,3 +1025,43 @@ v_cvt_scalef32_pk_bf16_bf8 v1, v2, s3 op_sel:[1,0,0] // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: // GFX950: v_cvt_scalef32_pk_bf16_bf8 v1, s2, 3 op_sel:[1,0,0] ; encoding: [0x01,0x08,0x6a,0xd2,0x02,0x06,0x01,0x00] v_cvt_scalef32_pk_bf16_bf8 v1, s2, 3 op_sel:[1,0,0] + +// NOT-GFX950: error: instru
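Note on op_sel: the description above says these instructions use op_sel bits 2 and 3 although they only have two sources. As a rough illustration of where that field sits, the sketch below extracts the raw 4-bit op_sel value from an encoding. The position (bits [14:11] of the first dword) is an assumption about the GFX9 VOP3 layout, and `op_sel_field` is a hypothetical helper. It only reads the raw field and does not model how the assembler maps `op_sel:[...]` list entries onto those bits.

```python
OP_SEL_SHIFT = 11  # assumed position of op_sel in the first dword
OP_SEL_MASK = 0xF  # four bits


def op_sel_field(encoding):
    """Return the raw 4-bit op_sel field of an 8-byte VOP3 encoding."""
    if len(encoding) != 8:
        raise ValueError("expected 8 encoding bytes")
    lo = int.from_bytes(bytes(encoding[:4]), "little")
    return (lo >> OP_SEL_SHIFT) & OP_SEL_MASK


def set_bits(field):
    """List the indices of the bits that are set in the 4-bit field."""
    return [i for i in range(4) if (field >> i) & 1]


# v_cvt_scalef32_pk_bf16_bf8 v1, s2, 3 op_sel:[1,0,0]
field = op_sel_field([0x01, 0x08, 0x6a, 0xd2, 0x02, 0x06, 0x01, 0x00])
print(field, set_bits(field))
# prints: 1 [0]
```

With two sources, bits 0 and 1 of this field would normally be the only per-source selectors, which is why the patch calls the use of bits 2 and 3 non-standard.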
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_fp4_{f|bf}16 on gfx950. (PR #117594)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117594
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_pk32_{bf|f}6_{bf|fp}16 for gfx950 (PR #117592)
llvmbot wrote: @llvm/pr-subscribers-clang Author: Matt Arsenault (arsenm) Changes Co-authored-by: Pravin Jagtap--- Patch is 49.45 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117592.diff 16 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+4) - (modified) clang/test/CodeGenOpenCL/amdgpu-features.cl (+1-1) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+43) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+9) - (modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+16-1) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+4) - (modified) llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h (+3) - (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+1) - (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+6-1) - (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.td (+1) - (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+14) - (modified) llvm/lib/TargetParser/TargetParser.cpp (+1) - (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.ll (+474) - (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+16) - (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+48) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+12) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index a42ad56ce4f998..e09dc0e1107a82 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -559,6 +559,10 @@ TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_fp8_w64, "V4fiV2iV4fs", TARGET_BUILTIN(__builtin_amdgcn_swmmac_f32_16x16x32_bf8_bf8_w64, "V4fiV2iV4fs", "nc", "gfx12-insts,wavefrontsize64") TARGET_BUILTIN(__builtin_amdgcn_prng_b32, "UiUi", "nc", "prng-inst") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_f16, "V6UiV32hf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_f16, "V6UiV32hf", 
"nc", "f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_fp6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_pk32_bf6_bf16, "V6UiV32yf", "nc", "f16bf16-to-fp6bf6-cvt-scale-insts") #undef BUILTIN #undef TARGET_BUILTIN diff --git a/clang/test/CodeGenOpenCL/amdgpu-features.cl b/clang/test/CodeGenOpenCL/amdgpu-features.cl index f9e07fbc6b0480..56013dad9b6651 100644 --- a/clang/test/CodeGenOpenCL/amdgpu-features.cl +++ b/clang/test/CodeGenOpenCL/amdgpu-features.cl @@ -89,7 +89,7 @@ // GFX941: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX942: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xf32-insts" // GFX9_4_Generic: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" -// GFX950: 
"target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-insts,+gfx950-insts,+mai-insts,+permlane16-swap,+permlane32-swap,+prng-inst,+s-memrealtime,+s-memtime-inst,+wavefrontsize64" +// GFX950: "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-ds-pk-add-16-insts,+atomic-fadd-rtn-insts,+atomic-flat-pk-add-16-insts,+atomic-global-pk-add-bf16-inst,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+f16bf16-to-fp6bf6-cvt-scale-insts,+fp8-conversion-insts,+fp8-insts,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+gfx940-i
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes Scale packed 16-component single-precision float vectors from two source inputs using the exponent provided by the third single-precision float input, then convert the values to a packed 32-component FP6 float value. Co-authored-by: Pravin Jagtap --- Patch is 22.01 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/117595.diff 10 Files Affected: - (modified) clang/include/clang/Basic/BuiltinsAMDGPU.def (+3) - (modified) clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl (+21-1) - (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+6) - (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+2) - (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.td (+1) - (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8) - (added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.scalef32.pk.gfx950.ll (+128) - (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+24) - (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+24) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+18) ``diff diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def b/clang/include/clang/Basic/BuiltinsAMDGPU.def index e09dc0e1107a82..dacbf5aa902f60 100644 --- a/clang/include/clang/Basic/BuiltinsAMDGPU.def +++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def @@ -467,6 +467,9 @@ TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr6_b96_v3i32, "V3iV3i*3", "nc", "gfx950 TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr8_b64_v2i32, "V2iV2i*3", "nc", "gfx950-insts") TARGET_BUILTIN(__builtin_amdgcn_ds_read_tr16_b64_v4i16, "V4sV4s*3", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_fp6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts") +TARGET_BUILTIN(__builtin_amdgcn_cvt_scalef32_2xpk16_bf6_f32, "V6UiV16fV16ff", "nc", "gfx950-insts") + //===--===// // GFX12+ only builtins. 
//===--===// diff --git a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl index 779aadd96f3f41..6f3c81b26be0b8 100644 --- a/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl +++ b/clang/test/CodeGenOpenCL/builtins-amdgcn-gfx950.cl @@ -7,6 +7,8 @@ typedef unsigned int __attribute__((ext_vector_type(2))) uint2; typedef unsigned int __attribute__((ext_vector_type(6))) uint6; typedef __bf16 __attribute__((ext_vector_type(32))) bfloat32; typedef half __attribute__((ext_vector_type(32))) half32; +typedef short __attribute__((ext_vector_type(2))) short2; +typedef float __attribute__((ext_vector_type(16))) float16; // CHECK-LABEL: @test_prng_b32( // CHECK-NEXT: entry: @@ -115,10 +117,14 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) { // CHECK-NEXT:[[OUT6_ADDR:%.*]] = alloca ptr addrspace(1), align 8, addrspace(5) // CHECK-NEXT:[[SRCBF32_ADDR:%.*]] = alloca <32 x bfloat>, align 64, addrspace(5) // CHECK-NEXT:[[SRCH32_ADDR:%.*]] = alloca <32 x half>, align 64, addrspace(5) +// CHECK-NEXT:[[SRC0F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5) +// CHECK-NEXT:[[SRC1F32_ADDR:%.*]] = alloca <16 x float>, align 64, addrspace(5) // CHECK-NEXT:[[SCALE_ADDR:%.*]] = alloca float, align 4, addrspace(5) // CHECK-NEXT:store ptr addrspace(1) [[OUT6:%.*]], ptr addrspace(5) [[OUT6_ADDR]], align 8 // CHECK-NEXT:store <32 x bfloat> [[SRCBF32:%.*]], ptr addrspace(5) [[SRCBF32_ADDR]], align 64 // CHECK-NEXT:store <32 x half> [[SRCH32:%.*]], ptr addrspace(5) [[SRCH32_ADDR]], align 64 +// CHECK-NEXT:store <16 x float> [[SRC0F32:%.*]], ptr addrspace(5) [[SRC0F32_ADDR]], align 64 +// CHECK-NEXT:store <16 x float> [[SRC1F32:%.*]], ptr addrspace(5) [[SRC1F32_ADDR]], align 64 // CHECK-NEXT:store float [[SCALE:%.*]], ptr addrspace(5) [[SCALE_ADDR]], align 4 // CHECK-NEXT:[[TMP0:%.*]] = load <32 x bfloat>, ptr addrspace(5) [[SRCBF32_ADDR]], align 64 // CHECK-NEXT:[[TMP1:%.*]] = load float, ptr addrspace(5) 
[[SCALE_ADDR]], align 4 @@ -140,12 +146,26 @@ void test_permlane32_swap(global uint2* out, uint old, uint src) { // CHECK-NEXT:[[TMP14:%.*]] = call <6 x i32> @llvm.amdgcn.cvt.scalef32.pk32.fp6.f16(<32 x half> [[TMP12]], float [[TMP13]]) // CHECK-NEXT:[[TMP15:%.*]] = load ptr addrspace(1), ptr addrspace(5) [[OUT6_ADDR]], align 8 // CHECK-NEXT:store <6 x i32> [[TMP14]], ptr addrspace(1) [[TMP15]], align 32 +// CHECK-NEXT:[[TMP16:%.*]] = load <16 x float>, ptr addrspace(5) [[SRC0F32_ADDR]], align 64 +// CHECK-NEXT:[[TMP17:%.*]] = load <16 x float>, ptr addrspace(5) [[SRC1F32_ADDR]], align 64 +// CHECK-NEXT:[[TMP18:%.*]] = load float, ptr addrspace(5) [[SCALE_ADDR]], align 4 +// CHECK-NEXT:[[TMP19:%.*]] = call <6 x i32> @llvm.amdgcn.cvt.scalef32.2xpk16.bf6.
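The `"V6UiV16fV16ff"` signature in the builtin definition above takes two 16-float vectors plus a float scale and returns six unsigned ints, which is consistent with 32 values x 6 bits = 192 bits = 6 x 32-bit words (the `<6 x i32>` in the CHECK lines). The Python sketch below illustrates only that packing arithmetic. The bit order (element 0 in the lowest bits, src0 elements before src1) and the helper names `pack_fp6_codes` and `unpack_fp6_codes` are assumptions for illustration and are not taken from the patch; no float-to-FP6 conversion or scaling is modeled.

```python
FP6_BITS = 6     # width of one FP6/BF6 code
WORD_BITS = 32   # width of one result element (unsigned int)


def pack_fp6_codes(codes):
    """Pack 6-bit codes into 32-bit words.

    Assumption: element 0 lands in the lowest bits of word 0.
    """
    bits = 0
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << FP6_BITS):
            raise ValueError(f"code {code} does not fit in {FP6_BITS} bits")
        bits |= code << (i * FP6_BITS)
    total_bits = len(codes) * FP6_BITS
    n_words = (total_bits + WORD_BITS - 1) // WORD_BITS
    return [(bits >> (w * WORD_BITS)) & 0xFFFFFFFF for w in range(n_words)]


def unpack_fp6_codes(words, count):
    """Inverse of pack_fp6_codes under the same assumed bit order."""
    bits = 0
    for w, word in enumerate(words):
        bits |= word << (w * WORD_BITS)
    return [(bits >> (i * FP6_BITS)) & 0x3F for i in range(count)]


# Two 16-element sources, here filled with placeholder 6-bit codes.
src0_codes = list(range(16))
src1_codes = list(range(16, 32))
packed = pack_fp6_codes(src0_codes + src1_codes)
print(len(packed), [hex(w) for w in packed])
```

With 32 input codes the result always has six words, which is why the builtin's return type is a 6-element unsigned int vector rather than, say, eight.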
[llvm-branch-commits] [llvm] AMDGPU: MC support for v_cvt_scalef32_pk_{bf|f}16_{bf|fp}8 of gfx950. (PR #117593)
llvmbot wrote: @llvm/pr-subscribers-backend-amdgpu Author: Matt Arsenault (arsenm) Changes OPSEL[0] selects src_word to read. Co-authored-by: Pravin Jagtap --- Full diff: https://github.com/llvm/llvm-project/pull/117593.diff 4 Files Affected: - (modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+8) - (modified) llvm/test/MC/AMDGPU/gfx950_asm_features.s (+96) - (modified) llvm/test/MC/AMDGPU/gfx950_err.s (+49-1) - (modified) llvm/test/MC/Disassembler/AMDGPU/gfx950_dasm_vop3.txt (+72) ``diff diff --git a/llvm/lib/Target/AMDGPU/VOP3Instructions.td b/llvm/lib/Target/AMDGPU/VOP3Instructions.td index 764a2275205665..fdffb2c36dcccf 100644 --- a/llvm/lib/Target/AMDGPU/VOP3Instructions.td +++ b/llvm/lib/Target/AMDGPU/VOP3Instructions.td @@ -945,6 +945,8 @@ let SubtargetPredicate = HasFP8ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3Inst<"v_cvt_scalef32_pk_f32_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_f16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; defm V_CVT_SCALEF32_PK_FP8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_fp8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; + defm V_CVT_SCALEF32_PK_F16_FP8: VOP3Inst<"v_cvt_scalef32_pk_f16_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3Inst<"v_cvt_scalef32_pk_bf16_fp8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; } let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -954,6 +956,8 @@ let SubtargetPredicate = HasBF8ConversionScaleInsts, mayRaiseFPException = 0 in defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3Inst<"v_cvt_scalef32_pk_f32_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_f16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; defm V_CVT_SCALEF32_PK_BF8_BF16 : VOP3Inst<"v_cvt_scalef32_pk_bf8_bf16", VOP3_CVT_SCALE_PK_FP8BF8_F16BF16_Profile>; + defm 
V_CVT_SCALEF32_PK_F16_BF8: VOP3Inst<"v_cvt_scalef32_pk_f16_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; + defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3Inst<"v_cvt_scalef32_pk_bf16_bf8", VOP3_CVT_SCALE_PK_F16BF16F32_FP4FP8BF8_Profile>; } let SubtargetPredicate = HasFP4ConversionScaleInsts, mayRaiseFPException = 0 in { @@ -1908,6 +1912,8 @@ defm V_CVT_SCALEF32_PK_FP8_F32 : VOP3OpSel_Real_gfx9 <0x235>; defm V_CVT_SCALEF32_PK_F32_FP8 : VOP3OpSel_Real_gfx9 <0x239>; defm V_CVT_SCALEF32_PK_FP8_F16 : VOP3OpSel_Real_gfx9 <0x240>; defm V_CVT_SCALEF32_PK_FP8_BF16: VOP3OpSel_Real_gfx9 <0x244>; +defm V_CVT_SCALEF32_PK_F16_FP8 : VOP3OpSel_Real_gfx9<0x248>; +defm V_CVT_SCALEF32_PK_BF16_FP8 : VOP3OpSel_Real_gfx9<0x269>; } let OtherPredicates = [HasBF8ConversionScaleInsts] in { defm V_CVT_SCALEF32_F16_BF8 : VOP3OpSel_Real_gfx9 <0x24b>; @@ -1916,6 +1922,8 @@ defm V_CVT_SCALEF32_PK_BF8_F32 : VOP3OpSel_Real_gfx9 <0x236>; defm V_CVT_SCALEF32_PK_F32_BF8 : VOP3OpSel_Real_gfx9 <0x23a>; defm V_CVT_SCALEF32_PK_BF8_F16 : VOP3OpSel_Real_gfx9 <0x241>; defm V_CVT_SCALEF32_PK_BF8_BF16: VOP3OpSel_Real_gfx9 <0x245>; +defm V_CVT_SCALEF32_PK_F16_BF8 : VOP3OpSel_Real_gfx9<0x249>; +defm V_CVT_SCALEF32_PK_BF16_BF8 : VOP3OpSel_Real_gfx9<0x26a>; } let OtherPredicates = [HasFP4ConversionScaleInsts] in { defm V_CVT_SCALEF32_PK_F32_FP4 : VOP3OpSel_Real_gfx9 <0x23f>; diff --git a/llvm/test/MC/AMDGPU/gfx950_asm_features.s b/llvm/test/MC/AMDGPU/gfx950_asm_features.s index 1aef267537aa55..e505b6ff4ad58b 100644 --- a/llvm/test/MC/AMDGPU/gfx950_asm_features.s +++ b/llvm/test/MC/AMDGPU/gfx950_asm_features.s @@ -929,3 +929,99 @@ v_cvt_scalef32_pk32_fp6_bf16 v[20:25], v[10:25], v8 // NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: // GFX950: v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 ; encoding: [0x14,0x00,0x58,0xd2,0x0a,0x11,0x02,0x00] v_cvt_scalef32_pk32_fp6_f16 v[20:25], v[10:25], v8 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3; encoding: 
[0x01,0x00,0x48,0xd2,0x02,0x07,0x02,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3; encoding: [0x01,0x00,0x48,0xd2,0x02,0x07,0x00,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 ; encoding: [0x01,0x00,0x48,0xd2,0x02,0x06,0x01,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, s2, 3 + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] ; encoding: [0x01,0x08,0x48,0xd2,0x02,0x07,0x02,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, v3 op_sel:[1,0,0] + +// NOT-GFX950: :[[@LINE+2]]:{{[0-9]+}}: error: +// GFX950: v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 op_sel:[1,0,0] ; encoding: [0x01,0x08,0x48,0xd2,0x02,0x07,0x00,0x00] +v_cvt_scalef32_pk_f16_fp8 v1, v2, s3 o
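The expected encodings in the `gfx950_asm_features.s` checks above can be cross-checked against the opcode assigned in `VOP3Instructions.td` (`0x248` for `V_CVT_SCALEF32_PK_F16_FP8`) and against the `op_sel:[1,0,0]` variant that sets OPSEL[0]. The Python sketch below decodes three of those byte sequences. It assumes the usual GFX9 VOP3 field layout (first dword: vdst in bits 7:0, op_sel in bits 14:11, opcode in bits 25:16; second dword: src0 in bits 8:0, src1 in bits 17:9); the helper names `decode_vop3` and `operand_name` are hypothetical, and operand decoding is deliberately simplified to SGPRs, small inline integers, and VGPRs.

```python
import struct


def decode_vop3(encoding):
    """Split an 8-byte VOP3 encoding into fields (assumed GFX9 layout)."""
    lo, hi = struct.unpack("<II", bytes(encoding))  # two little-endian dwords
    return {
        "vdst": lo & 0xFF,
        "op_sel": (lo >> 11) & 0xF,
        "opcode": (lo >> 16) & 0x3FF,
        "src0": hi & 0x1FF,
        "src1": (hi >> 9) & 0x1FF,
    }


def operand_name(code):
    """Simplified operand naming: VGPR, inline integer 0..64, else SGPR."""
    if code >= 256:
        return f"v{code - 256}"
    if 128 <= code <= 192:
        return str(code - 128)
    return f"s{code}"


# Byte sequences copied from the GFX950 check lines in the diff above.
plain = decode_vop3([0x01, 0x00, 0x48, 0xD2, 0x02, 0x07, 0x02, 0x00])
opsel = decode_vop3([0x01, 0x08, 0x48, 0xD2, 0x02, 0x07, 0x02, 0x00])
inline = decode_vop3([0x01, 0x00, 0x48, 0xD2, 0x02, 0x06, 0x01, 0x00])

for fields in (plain, opsel, inline):
    print(hex(fields["opcode"]), f"v{fields['vdst']}",
          operand_name(fields["src0"]), operand_name(fields["src1"]),
          "op_sel bits:", bin(fields["op_sel"]))
```

Under this layout the only difference between the plain and `op_sel:[1,0,0]` encodings is bit 11 of the first dword (the `0x08` in the second byte), which is the OPSEL[0] bit the commit message refers to.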
[llvm-branch-commits] [clang] [llvm] AMDGPU: Support v_cvt_scalef32_2xpk16_{bf|fp}6_f32 for gfx950. (PR #117595)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/117595