[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)
cdevadas wrote: ### Merge activity * **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96162). https://github.com/llvm/llvm-project/pull/96162 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)
cdevadas wrote: ### Merge activity * **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this pull request via [Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96163). https://github.com/llvm/llvm-project/pull/96163
[llvm-branch-commits] [llvm] 32ec3b4 - Revert "[llvm-cgdata] Remove `GENERATE_DRIVER` option (#100066)"
Author: Petr Hosek Date: 2024-07-23T01:18:55-07:00 New Revision: 32ec3b4f547f9af8cd2af736cd7c00843ef69a93 URL: https://github.com/llvm/llvm-project/commit/32ec3b4f547f9af8cd2af736cd7c00843ef69a93 DIFF: https://github.com/llvm/llvm-project/commit/32ec3b4f547f9af8cd2af736cd7c00843ef69a93.diff LOG: Revert "[llvm-cgdata] Remove `GENERATE_DRIVER` option (#100066)" This reverts commit 96d412135395a251f2931b8fca4dd8150aeed9ba. Added: Modified: llvm/tools/llvm-cgdata/CMakeLists.txt Removed: diff --git a/llvm/tools/llvm-cgdata/CMakeLists.txt b/llvm/tools/llvm-cgdata/CMakeLists.txt index 966384278b9ab..4f1f7ff635bc3 100644 --- a/llvm/tools/llvm-cgdata/CMakeLists.txt +++ b/llvm/tools/llvm-cgdata/CMakeLists.txt @@ -11,4 +11,5 @@ add_llvm_tool(llvm-cgdata DEPENDS intrinsics_gen + GENERATE_DRIVER ) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libcxx] c2dbaeb - Bump version to 19.1.0git
Author: Tobias Hieta Date: 2024-07-23T11:06:16+02:00 New Revision: c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629 URL: https://github.com/llvm/llvm-project/commit/c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629 DIFF: https://github.com/llvm/llvm-project/commit/c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629.diff LOG: Bump version to 19.1.0git Added: Modified: cmake/Modules/LLVMVersion.cmake libcxx/include/__config llvm/utils/gn/secondary/llvm/version.gni llvm/utils/lit/lit/__init__.py Removed: diff --git a/cmake/Modules/LLVMVersion.cmake b/cmake/Modules/LLVMVersion.cmake index 5e28283fbc1c6..aea9b880180ab 100644 --- a/cmake/Modules/LLVMVersion.cmake +++ b/cmake/Modules/LLVMVersion.cmake @@ -4,7 +4,7 @@ if(NOT DEFINED LLVM_VERSION_MAJOR) set(LLVM_VERSION_MAJOR 19) endif() if(NOT DEFINED LLVM_VERSION_MINOR) - set(LLVM_VERSION_MINOR 0) + set(LLVM_VERSION_MINOR 1) endif() if(NOT DEFINED LLVM_VERSION_PATCH) set(LLVM_VERSION_PATCH 0) diff --git a/libcxx/include/__config b/libcxx/include/__config index 108f700823cbf..661af5be3c225 100644 --- a/libcxx/include/__config +++ b/libcxx/include/__config @@ -27,7 +27,7 @@ // _LIBCPP_VERSION represents the version of libc++, which matches the version of LLVM. // Given a LLVM release LLVM XX.YY.ZZ (e.g. LLVM 17.0.1 == 17.00.01), _LIBCPP_VERSION is // defined to XXYYZZ. 
-# define _LIBCPP_VERSION 19 +# define _LIBCPP_VERSION 190100 # define _LIBCPP_CONCAT_IMPL(_X, _Y) _X##_Y # define _LIBCPP_CONCAT(_X, _Y) _LIBCPP_CONCAT_IMPL(_X, _Y) diff --git a/llvm/utils/gn/secondary/llvm/version.gni b/llvm/utils/gn/secondary/llvm/version.gni index 7c02ed396db5f..3f44a4645acf6 100644 --- a/llvm/utils/gn/secondary/llvm/version.gni +++ b/llvm/utils/gn/secondary/llvm/version.gni @@ -1,4 +1,4 @@ llvm_version_major = 19 -llvm_version_minor = 0 +llvm_version_minor = 1 llvm_version_patch = 0 llvm_version = "$llvm_version_major.$llvm_version_minor.$llvm_version_patch" diff --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py index a5a1ff66bf417..03edfc3360972 100644 --- a/llvm/utils/lit/lit/__init__.py +++ b/llvm/utils/lit/lit/__init__.py @@ -2,7 +2,7 @@ __author__ = "Daniel Dunbar" __email__ = "dan...@minormatter.com" -__versioninfo__ = (19, 0, 0) +__versioninfo__ = (19, 1, 0) __version__ = ".".join(str(v) for v in __versioninfo__) + "dev" __all__ = [] ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
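For reference, the `__config` comment quoted in the diff above says that a release LLVM XX.YY.ZZ maps to `_LIBCPP_VERSION` XXYYZZ. A minimal sketch of that arithmetic follows; `encodeLibcxxVersion` is a hypothetical helper used only for illustration and is not part of libc++.

```cpp
#include <cassert>

// Hypothetical helper (not a libc++ API): encodes a release XX.YY.ZZ as the
// integer XXYYZZ, following the scheme described in the __config comment.
constexpr int encodeLibcxxVersion(int Major, int Minor, int Patch) {
  return Major * 10000 + Minor * 100 + Patch;
}

// 19.1.0 is the version this commit bumps to.
static_assert(encodeLibcxxVersion(19, 1, 0) == 190100, "19.1.0 -> 190100");
// 17.0.1 is the example given in the __config comment.
static_assert(encodeLibcxxVersion(17, 0, 1) == 170001, "17.0.1 -> 170001");
```

The two `static_assert` lines mirror the values that appear in the diff and in the `__config` comment.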
[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100094 Backport b48819dbcdb48fc737dc22304ac343e4fdbae9ff Requested by: @nikic >From 36301dee358a56dbf3b79ab748444e364d0cb382 Mon Sep 17 00:00:00 2001 From: Nikita Popov Date: Tue, 23 Jul 2024 12:00:53 +0200 Subject: [PATCH] Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" This reverts commit f2ccf80136a01ca69f766becafb329db6c54c0c8. The flag propagation code is incorrect. (cherry picked from commit b48819dbcdb48fc737dc22304ac343e4fdbae9ff) --- llvm/lib/Transforms/Scalar/LICM.cpp | 62 llvm/test/CodeGen/PowerPC/common-chain.ll | 315 +- llvm/test/CodeGen/PowerPC/p10-spill-crlt.ll | 16 +- llvm/test/Transforms/LICM/hoist-binop.ll | 99 -- llvm/test/Transforms/LICM/sink-foldable.ll| 4 +- .../LICM/update-scev-after-hoist.ll | 2 +- 6 files changed, 163 insertions(+), 335 deletions(-) delete mode 100644 llvm/test/Transforms/LICM/hoist-binop.ll diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp index fe264503dee9e..91ef2b4b7c183 100644 --- a/llvm/lib/Transforms/Scalar/LICM.cpp +++ b/llvm/lib/Transforms/Scalar/LICM.cpp @@ -113,8 +113,6 @@ STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions " STATISTIC(NumIntAssociationsHoisted, "Number of invariant int expressions " "reassociated and hoisted out of the loop"); -STATISTIC(NumBOAssociationsHoisted, "Number of invariant BinaryOp expressions " -"reassociated and hoisted out of the loop"); /// Memory promotion is enabled by default. static cl::opt @@ -2781,60 +2779,6 @@ static bool hoistMulAddAssociation(Instruction &I, Loop &L, return true; } -/// Reassociate general associative binary expressions of the form -/// -/// 1. "(LV op C1) op C2" ==> "LV op (C1 op C2)" -/// -/// where op is an associative binary op, LV is a loop variant, and C1 and C2 -/// are loop invariants that we want to hoist. -/// -/// TODO: This can be extended to more cases such as -/// 2. 
"C1 op (C2 op LV)" ==> "(C1 op C2) op LV" -/// 3. "(C1 op LV) op C2" ==> "LV op (C1 op C2)" if op is commutative -/// 4. "C1 op (LV op C2)" ==> "(C1 op C2) op LV" if op is commutative -static bool hoistBOAssociation(Instruction &I, Loop &L, - ICFLoopSafetyInfo &SafetyInfo, - MemorySSAUpdater &MSSAU, AssumptionCache *AC, - DominatorTree *DT) { - BinaryOperator *BO = dyn_cast(&I); - if (!BO || !BO->isAssociative()) -return false; - - Instruction::BinaryOps Opcode = BO->getOpcode(); - BinaryOperator *Op0 = dyn_cast(BO->getOperand(0)); - - // Transform: "(LV op C1) op C2" ==> "LV op (C1 op C2)" - if (Op0 && Op0->getOpcode() == Opcode) { -Value *LV = Op0->getOperand(0); -Value *C1 = Op0->getOperand(1); -Value *C2 = BO->getOperand(1); - -if (L.isLoopInvariant(LV) || !L.isLoopInvariant(C1) || -!L.isLoopInvariant(C2)) - return false; - -auto *Preheader = L.getLoopPreheader(); -assert(Preheader && "Loop is not in simplify form?"); -IRBuilder<> Builder(Preheader->getTerminator()); -Value *Inv = Builder.CreateBinOp(Opcode, C1, C2, "invariant.op"); - -auto *NewBO = -BinaryOperator::Create(Opcode, LV, Inv, BO->getName() + ".reass", BO); -NewBO->copyIRFlags(BO); -BO->replaceAllUsesWith(NewBO); -eraseInstruction(*BO, SafetyInfo, MSSAU); - -// Note: (LV op C1) might not be erased if it has more uses than the one we -// just replaced. 
-if (Op0->use_empty()) - eraseInstruction(*Op0, SafetyInfo, MSSAU); - -return true; - } - - return false; -} - static bool hoistArithmetics(Instruction &I, Loop &L, ICFLoopSafetyInfo &SafetyInfo, MemorySSAUpdater &MSSAU, AssumptionCache *AC, @@ -2872,12 +2816,6 @@ static bool hoistArithmetics(Instruction &I, Loop &L, return true; } - if (hoistBOAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) { -++NumHoisted; -++NumBOAssociationsHoisted; -return true; - } - return false; } diff --git a/llvm/test/CodeGen/PowerPC/common-chain.ll b/llvm/test/CodeGen/PowerPC/common-chain.ll index ccf0e4520f468..5f8c21e30f8fd 100644 --- a/llvm/test/CodeGen/PowerPC/common-chain.ll +++ b/llvm/test/CodeGen/PowerPC/common-chain.ll @@ -642,8 +642,8 @@ define i64 @two_chain_two_bases_succ(ptr %p, i64 %offset, i64 %base1, i64 %base2 ; CHECK-NEXT:cmpdi r7, 0 ; CHECK-NEXT:ble cr0, .LBB6_4 ; CHECK-NEXT: # %bb.1: # %for.body.preheader -; CHECK-NEXT:add r5, r5, r4 ; CHECK-NEXT:add r6, r6, r4 +; CHECK-NEXT:add r5, r5, r4 ; CHECK-NEXT:mtctr r7 ; CHECK-NEXT:sldi r4, r4, 1 ; CHECK-NEXT:add r5, r3, r5 @@ -743,219 +743,214 @@ define signext i32 @spill_reduce_succ(ptr %input1, ptr %input2, ptr %output, i6
[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100094
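The revert message for PR #100094 only states that "the flag propagation code is incorrect" and gives no further detail. The sketch below is an assumption about the kind of problem this can cause, not a reproduction of the reported bug: it models 8-bit signed adds and shows that "(LV op C1) op C2" can be overflow-free while the outer add of "LV op (C1 op C2)" overflows, so copying a no-signed-wrap flag onto the reassociated instruction would not be sound in general. All helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if A + B overflows as a signed 8-bit add, i.e. the case in which an
// add carrying a no-signed-wrap flag would yield poison.
constexpr bool addOverflowsI8(int8_t A, int8_t B) {
  int Wide = int(A) + int(B);
  return Wide < INT8_MIN || Wide > INT8_MAX;
}

// Wrapping 8-bit add, modelling an add without any flags.
constexpr int8_t addWrapI8(int8_t A, int8_t B) {
  int Low = (int(A) + int(B)) & 0xFF;  // keep the low 8 bits
  return static_cast<int8_t>(Low >= 128 ? Low - 256 : Low);
}

// Original form: (LV + C1) + C2, with neither add overflowing.
constexpr bool originalIsOverflowFree(int8_t LV, int8_t C1, int8_t C2) {
  return !addOverflowsI8(LV, C1) && !addOverflowsI8(addWrapI8(LV, C1), C2);
}

// Reassociated form: LV + (C1 + C2), where the hoisted inner add carries no
// flags (it wraps) and we ask whether the outer add overflows.
constexpr bool reassociatedOuterAddOverflows(int8_t LV, int8_t C1, int8_t C2) {
  return addOverflowsI8(LV, addWrapI8(C1, C2));
}

// LV = -100, C1 = 100, C2 = 100: the original computes 0, then 100, with no
// overflow; the reassociated form computes C1 + C2 = -56 (wrapped) and then
// -100 + -56, which overflows a signed 8-bit add.
static_assert(originalIsOverflowFree(-100, 100, 100), "original is fine");
static_assert(reassociatedOuterAddOverflows(-100, 100, 100), "outer add overflows");
```

Both forms produce the same wrapped result (100), so the reassociation itself is valid for a wrapping add; only the copied flag is questionable in this model.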
[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)
llvmbot wrote: @llvm/pr-subscribers-backend-powerpc Author: None (llvmbot) Changes Backport b48819dbcdb48fc737dc22304ac343e4fdbae9ff Requested by: @nikic --- Patch is 26.62 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100094.diff 6 Files Affected: - (modified) llvm/lib/Transforms/Scalar/LICM.cpp (-62) - (modified) llvm/test/CodeGen/PowerPC/common-chain.ll (+155-160) - (modified) llvm/test/CodeGen/PowerPC/p10-spill-crlt.ll (+5-11) - (removed) llvm/test/Transforms/LICM/hoist-binop.ll (-99) - (modified) llvm/test/Transforms/LICM/sink-foldable.ll (+2-2) - (modified) llvm/test/Transforms/LICM/update-scev-after-hoist.ll (+1-1) ``diff diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp index fe264503dee9e..91ef2b4b7c183 100644 --- a/llvm/lib/Transforms/Scalar/LICM.cpp +++ b/llvm/lib/Transforms/Scalar/LICM.cpp @@ -113,8 +113,6 @@ STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions " STATISTIC(NumIntAssociationsHoisted, "Number of invariant int expressions " "reassociated and hoisted out of the loop"); -STATISTIC(NumBOAssociationsHoisted, "Number of invariant BinaryOp expressions " -"reassociated and hoisted out of the loop"); /// Memory promotion is enabled by default. static cl::opt @@ -2781,60 +2779,6 @@ static bool hoistMulAddAssociation(Instruction &I, Loop &L, return true; } -/// Reassociate general associative binary expressions of the form -/// -/// 1. "(LV op C1) op C2" ==> "LV op (C1 op C2)" -/// -/// where op is an associative binary op, LV is a loop variant, and C1 and C2 -/// are loop invariants that we want to hoist. -/// -/// TODO: This can be extended to more cases such as -/// 2. "C1 op (C2 op LV)" ==> "(C1 op C2) op LV" -/// 3. "(C1 op LV) op C2" ==> "LV op (C1 op C2)" if op is commutative -/// 4. 
"C1 op (LV op C2)" ==> "(C1 op C2) op LV" if op is commutative -static bool hoistBOAssociation(Instruction &I, Loop &L, - ICFLoopSafetyInfo &SafetyInfo, - MemorySSAUpdater &MSSAU, AssumptionCache *AC, - DominatorTree *DT) { - BinaryOperator *BO = dyn_cast(&I); - if (!BO || !BO->isAssociative()) -return false; - - Instruction::BinaryOps Opcode = BO->getOpcode(); - BinaryOperator *Op0 = dyn_cast(BO->getOperand(0)); - - // Transform: "(LV op C1) op C2" ==> "LV op (C1 op C2)" - if (Op0 && Op0->getOpcode() == Opcode) { -Value *LV = Op0->getOperand(0); -Value *C1 = Op0->getOperand(1); -Value *C2 = BO->getOperand(1); - -if (L.isLoopInvariant(LV) || !L.isLoopInvariant(C1) || -!L.isLoopInvariant(C2)) - return false; - -auto *Preheader = L.getLoopPreheader(); -assert(Preheader && "Loop is not in simplify form?"); -IRBuilder<> Builder(Preheader->getTerminator()); -Value *Inv = Builder.CreateBinOp(Opcode, C1, C2, "invariant.op"); - -auto *NewBO = -BinaryOperator::Create(Opcode, LV, Inv, BO->getName() + ".reass", BO); -NewBO->copyIRFlags(BO); -BO->replaceAllUsesWith(NewBO); -eraseInstruction(*BO, SafetyInfo, MSSAU); - -// Note: (LV op C1) might not be erased if it has more uses than the one we -// just replaced. 
-if (Op0->use_empty()) - eraseInstruction(*Op0, SafetyInfo, MSSAU); - -return true; - } - - return false; -} - static bool hoistArithmetics(Instruction &I, Loop &L, ICFLoopSafetyInfo &SafetyInfo, MemorySSAUpdater &MSSAU, AssumptionCache *AC, @@ -2872,12 +2816,6 @@ static bool hoistArithmetics(Instruction &I, Loop &L, return true; } - if (hoistBOAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) { -++NumHoisted; -++NumBOAssociationsHoisted; -return true; - } - return false; } diff --git a/llvm/test/CodeGen/PowerPC/common-chain.ll b/llvm/test/CodeGen/PowerPC/common-chain.ll index ccf0e4520f468..5f8c21e30f8fd 100644 --- a/llvm/test/CodeGen/PowerPC/common-chain.ll +++ b/llvm/test/CodeGen/PowerPC/common-chain.ll @@ -642,8 +642,8 @@ define i64 @two_chain_two_bases_succ(ptr %p, i64 %offset, i64 %base1, i64 %base2 ; CHECK-NEXT:cmpdi r7, 0 ; CHECK-NEXT:ble cr0, .LBB6_4 ; CHECK-NEXT: # %bb.1: # %for.body.preheader -; CHECK-NEXT:add r5, r5, r4 ; CHECK-NEXT:add r6, r6, r4 +; CHECK-NEXT:add r5, r5, r4 ; CHECK-NEXT:mtctr r7 ; CHECK-NEXT:sldi r4, r4, 1 ; CHECK-NEXT:add r5, r3, r5 @@ -743,219 +743,214 @@ define signext i32 @spill_reduce_succ(ptr %input1, ptr %input2, ptr %output, i64 ; CHECK-NEXT:std r9, -184(r1) # 8-byte Folded Spill ; CHECK-NEXT:std r8, -176(r1) # 8-byte Folded Spill ; CHECK-NEXT:std r7, -168(r1) # 8-byte Folded Spill -; CHECK-NEXT:std r4, -160(r1) # 8-byte Folded Spill +; CHECK-NEXT:std r3, -160(r1) # 8-byte Folded Spill ; CHECK-NEXT:ble cr0, .LBB7_7 ; C
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn milestoned https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn created https://github.com/llvm/llvm-project/pull/100097 As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. >From a72a0bf44a8b259be3c62e79082d2fdc04fc2771 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Tue, 23 Jul 2024 11:15:26 +0100 Subject: [PATCH] [LV] Disable VPlan-based cost model for 19.x release. As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 2 +- .../test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6d28b8fabe42e..68363abdb817a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold( cl::desc("The maximum allowed number of runtime memory checks")); static cl::opt UseLegacyCostModel( -"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden, +"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden, cl::desc("Use the legacy cost model instead of the VPlan-based cost model. 
" "This option will be removed in the future.")); diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll index fc310f4163082..1a78eaf644723 100644 --- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll @@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF @@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
llvmbot wrote: @llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) Changes As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. --- Full diff: https://github.com/llvm/llvm-project/pull/100097.diff 2 Files Affected: - (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1) - (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (-2) ``diff diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6d28b8fabe42e..68363abdb817a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold( cl::desc("The maximum allowed number of runtime memory checks")); static cl::opt UseLegacyCostModel( -"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden, +"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden, cl::desc("Use the legacy cost model instead of the VPlan-based cost model. " "This option will be removed in the future.")); diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll index fc310f4163082..1a78eaf644723 100644 --- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll @@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. 
; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF @@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF `` https://github.com/llvm/llvm-project/pull/100097 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/nikic approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Create `LoopRelatedClause` (PR #99506)
https://github.com/tblah approved this pull request. LGTM, thanks! https://github.com/llvm/llvm-project/pull/99506
[llvm-branch-commits] [lld][ELF][LoongArch] Support R_LARCH_TLS_{LD, GD, DESC}_PCREL_S2 (PR #100105)
https://github.com/wangleiat created https://github.com/llvm/llvm-project/pull/100105 None
[llvm-branch-commits] [lld][ELF][LoongArch] Support R_LARCH_TLS_{LD, GD, DESC}_PCREL_S2 (PR #100105)
llvmbot wrote: @llvm/pr-subscribers-lld-elf Author: wanglei (wangleiat) Changes --- Full diff: https://github.com/llvm/llvm-project/pull/100105.diff 5 Files Affected: - (modified) lld/ELF/Arch/LoongArch.cpp (+10) - (modified) lld/ELF/Relocations.cpp (+2-1) - (added) lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s (+129) - (added) lld/test/ELF/loongarch-tls-ld-pcrel20-s2.s (+82) - (added) lld/test/ELF/loongarch-tlsdesc-pcrel20-s2.s (+142) ``diff diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp index 9466e8b1ce54d..db0bc6c760096 100644 --- a/lld/ELF/Arch/LoongArch.cpp +++ b/lld/ELF/Arch/LoongArch.cpp @@ -511,6 +511,12 @@ RelExpr LoongArch::getRelExpr(const RelType type, const Symbol &s, return R_TLSDESC; case R_LARCH_TLS_DESC_CALL: return R_TLSDESC_CALL; + case R_LARCH_TLS_LD_PCREL20_S2: +return R_TLSLD_PC; + case R_LARCH_TLS_GD_PCREL20_S2: +return R_TLSGD_PC; + case R_LARCH_TLS_DESC_PCREL20_S2: +return R_TLSDESC_PC; // Other known relocs that are explicitly unimplemented: // @@ -557,7 +563,11 @@ void LoongArch::relocate(uint8_t *loc, const Relocation &rel, write64le(loc, val); return; + // Relocs intended for `pcaddi`. case R_LARCH_PCREL20_S2: + case R_LARCH_TLS_LD_PCREL20_S2: + case R_LARCH_TLS_GD_PCREL20_S2: + case R_LARCH_TLS_DESC_PCREL20_S2: checkInt(loc, val, 22, rel); checkAlignment(loc, val, 4, rel); write32le(loc, setJ20(read32le(loc), val >> 2)); diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp index 36857d72c647e..6ad5c3bf8f6e9 100644 --- a/lld/ELF/Relocations.cpp +++ b/lld/ELF/Relocations.cpp @@ -1308,7 +1308,8 @@ static unsigned handleTlsRelocation(RelType type, Symbol &sym, // LoongArch does not yet implement transition from TLSDESC to LE/IE, so // generate TLSDESC dynamic relocation for the dynamic linker to handle. 
if (config->emachine == EM_LOONGARCH && - oneof(expr)) { + oneof(expr)) { if (expr != R_TLSDESC_CALL) { sym.setFlags(NEEDS_TLSDESC); c.addReloc({expr, type, offset, addend, &sym}); diff --git a/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s b/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s new file mode 100644 index 0..d4d12b9d4a520 --- /dev/null +++ b/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s @@ -0,0 +1,129 @@ +# REQUIRES: loongarch +# RUN: rm -rf %t && split-file %s %t + +# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/a.s -o %t/a.32.o +# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/bc.s -o %t/bc.32.o +# RUN: ld.lld -shared -soname=bc.so %t/bc.32.o -o %t/bc.32.so +# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/tga.s -o %t/tga.32.o +# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/a.s -o %t/a.64.o +# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/bc.s -o %t/bc.64.o +# RUN: ld.lld -shared -soname=bc.so %t/bc.64.o -o %t/bc.64.so +# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/tga.s -o %t/tga.64.o + +## LA32 GD +# RUN: ld.lld -shared %t/a.32.o %t/bc.32.o -o %t/gd.32.so +# RUN: llvm-readobj -r %t/gd.32.so | FileCheck --check-prefix=GD32-REL %s +# RUN: llvm-objdump -d --no-show-raw-insn %t/gd.32.so | FileCheck --check-prefix=GD32 %s + +## LA32 GD -> LE +# RUN: ld.lld %t/a.32.o %t/bc.32.o %t/tga.32.o -o %t/le.32 +# RUN: llvm-readelf -r %t/le.32 | FileCheck --check-prefix=NOREL %s +# RUN: llvm-readelf -x .got %t/le.32 | FileCheck --check-prefix=LE32-GOT %s +# RUN: ld.lld -pie %t/a.32.o %t/bc.32.o %t/tga.32.o -o %t/le-pie.32 +# RUN: llvm-readelf -r %t/le-pie.32 | FileCheck --check-prefix=NOREL %s +# RUN: llvm-readelf -x .got %t/le-pie.32 | FileCheck --check-prefix=LE32-GOT %s + +## LA32 GD -> IE +# RUN: ld.lld %t/a.32.o %t/bc.32.so %t/tga.32.o -o %t/ie.32 +# RUN: llvm-readobj -r %t/ie.32 | FileCheck --check-prefix=IE32-REL %s +# RUN: llvm-readelf -x .got %t/ie.32 | FileCheck --check-prefix=IE32-GOT %s + +## LA64 GD +# RUN: ld.lld -shared 
%t/a.64.o %t/bc.64.o -o %t/gd.64.so +# RUN: llvm-readobj -r %t/gd.64.so | FileCheck --check-prefix=GD64-REL %s +# RUN: llvm-objdump -d --no-show-raw-insn %t/gd.64.so | FileCheck --check-prefix=GD64 %s + +## LA64 GD -> LE +# RUN: ld.lld %t/a.64.o %t/bc.64.o %t/tga.64.o -o %t/le.64 +# RUN: llvm-readelf -r %t/le.64 | FileCheck --check-prefix=NOREL %s +# RUN: llvm-readelf -x .got %t/le.64 | FileCheck --check-prefix=LE64-GOT %s +# RUN: ld.lld -pie %t/a.64.o %t/bc.64.o %t/tga.64.o -o %t/le-pie.64 +# RUN: llvm-readelf -r %t/le-pie.64 | FileCheck --check-prefix=NOREL %s +# RUN: llvm-readelf -x .got %t/le-pie.64 | FileCheck --check-prefix=LE64-GOT %s + +## LA64 GD -> IE +# RUN: ld.lld %t/a.64.o %t/bc.64.so %t/tga.64.o -o %t/ie.64 +# RUN: llvm-readobj -r %t/ie.64 | FileCheck --check-prefix=IE64-REL %s +# RUN: llvm-readelf -x .got %t/ie.64 | FileCheck --check-prefix=IE64-GOT %s + +# GD32-REL: .rela.dyn { +# GD32-REL-NEXT: 0x20300 R_LARCH_TLS_DTPMOD32 a 0x0 +# GD32-REL-NEXT: 0x20304 R_LARCH_TLS_DTPREL32 a 0x0 +# GD32-REL-NEXT: 0x20308 R_LARCH_TLS_
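The `relocate()` hunk in PR #100105 routes the three new TLS relocations through the same path as `R_LARCH_PCREL20_S2`: a signed 22-bit range check, a 4-byte alignment check, and a write of `val >> 2` into the instruction's 20-bit immediate. A rough standalone sketch of that sequence follows. `applyPcrel20S2` is a hypothetical stand-in, it returns false where lld would report an error, and the placement of the immediate in instruction bits [24:5] is an assumption about what `setJ20` does rather than something shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the checkInt / checkAlignment / setJ20 sequence.
// On success, stores the patched instruction word in Out and returns true.
bool applyPcrel20S2(uint32_t Insn, int64_t Val, uint32_t &Out) {
  // Mirrors checkInt(loc, val, 22, rel): Val must fit in a signed 22-bit int.
  const int64_t Limit = int64_t(1) << 21;
  if (Val < -Limit || Val >= Limit)
    return false;
  // Mirrors checkAlignment(loc, val, 4, rel): Val must be a multiple of 4.
  if (Val & 3)
    return false;
  // The low two bits are zero, so Val >> 2 fits in 20 bits.
  uint32_t Imm20 = static_cast<uint32_t>(Val >> 2) & 0xFFFFF;
  // Assumption: the 20-bit field sits in bits [24:5]; all other bits are kept.
  Out = (Insn & 0xFE00001Fu) | (Imm20 << 5);
  return true;
}
```

With an arbitrary instruction word `0x18000004` and `Val = 8`, the stored immediate is 2 and the patched word is `0x18000044`; a misaligned value such as 6 is rejected.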
[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)
https://github.com/nikic approved this pull request. https://github.com/llvm/llvm-project/pull/100094
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
@@ -266,6 +287,47 @@ class StaleMatcher {
 }
 return BestBlock;
 }
+ // Uses pseudo probe information to attach the profile to the appropriate
+ // block.
+ const FlowBlock *matchWithPseudoProbes(
+ const std::vector &PseudoProbes) const {
+// Searches for the pseudo probe attached to the matched function's block,
+// ignoring pseudo probes attached to function calls and inlined functions'
+// blocks.
+std::vector BlockPseudoProbes;
+for (const auto &PseudoProbe : PseudoProbes) {
+ // Ensures that pseudo probe information belongs to the appropriate
+ // function and not an inlined function.
+ if (PseudoProbe.GUID != YamlBFGUID)
+continue;
+ // Skips pseudo probes attached to function calls.
+ if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
+continue;
+
+ BlockPseudoProbes.push_back(&PseudoProbe);
+}
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+ return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index <= Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+ "All blocks should have a pseudo probe");

aaupov wrote:

This assert should become a check, as it's possible to have blocks without probes.

https://github.com/llvm/llvm-project/pull/99891
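One way to read the review comment is that the lookup should return "no match" instead of asserting when an index has no probe. A minimal sketch under that reading follows; `Block`, `ProbeIndexMap`, and `findBlockForProbe` are hypothetical simplifications for illustration, not the types or functions used in BOLT.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the block type; only identity matters here.
struct Block {
  uint64_t Id;
};

// Hypothetical map from pseudo probe index to the block carrying that probe.
using ProbeIndexMap = std::unordered_map<uint64_t, const Block *>;

// Returns the block for ProbeIndex, or nullptr if no block has that probe.
// A missing entry is treated as a normal "no match" result, not as an
// invariant violation, since blocks without probes are possible.
const Block *findBlockForProbe(const ProbeIndexMap &IndexToBlock,
                               uint64_t ProbeIndex) {
  auto It = IndexToBlock.find(ProbeIndex);
  if (It == IndexToBlock.end())
    return nullptr;
  return It->second;
}
```

Callers then handle a null result the same way they handle the other early `return nullptr` paths in the quoted hunk.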
[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)
https://github.com/ldionne edited https://github.com/llvm/llvm-project/pull/99375
[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)
https://github.com/ldionne approved this pull request. LGTM. Let's cherry-pick. https://github.com/llvm/llvm-project/pull/99375
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/tru updated https://github.com/llvm/llvm-project/pull/100097 >From a72a0bf44a8b259be3c62e79082d2fdc04fc2771 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Tue, 23 Jul 2024 11:15:26 +0100 Subject: [PATCH 1/2] [LV] Disable VPlan-based cost model for 19.x release. As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. --- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 2 +- .../test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll | 2 -- 2 files changed, 1 insertion(+), 3 deletions(-) diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6d28b8fabe42e..68363abdb817a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold( cl::desc("The maximum allowed number of runtime memory checks")); static cl::opt UseLegacyCostModel( -"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden, +"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden, cl::desc("Use the legacy cost model instead of the VPlan-based cost model. " "This option will be removed in the future.")); diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll index fc310f4163082..1a78eaf644723 100644 --- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll @@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. 
; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF @@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF >From 835a2491de62ee09588bfb61ee31600449881675 Mon Sep 17 00:00:00 2001 From: Florian Hahn Date: Tue, 23 Jul 2024 15:39:35 +0100 Subject: [PATCH 2/2] !fixup update test for new default. 
--- .../Inputs/x86-loopvectorize-costmodel.ll.expected | 1 - 1 file changed, 1 deletion(-) diff --git a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected index 5aa270e76f4c8..e862bf87d265c 100644 --- a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected +++ b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected @@ -17,7 +17,6 @@ define void @test() { ; CHECK: LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = load float, ptr %in0, align 4 ; CHECK: LV: Found an estimated cost of 22 for VF 32 For instruction: %v0 = load float, ptr %in0, align 4 ; CHECK: LV: Found an estimated cost of 92 for VF 64 For instruction: %v0 = load float, ptr %in0, align 4 -; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load float, ptr %in0, align 4 ; entry: br label %for.body ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] 183e8ec - [LV] Disable VPlan-based cost model for 19.x release.
Author: Florian Hahn Date: 2024-07-23T17:02:03+02:00 New Revision: 183e8ecc97a996c24e920e7e9668bc65a0d19439 URL: https://github.com/llvm/llvm-project/commit/183e8ecc97a996c24e920e7e9668bc65a0d19439 DIFF: https://github.com/llvm/llvm-project/commit/183e8ecc97a996c24e920e7e9668bc65a0d19439.diff LOG: [LV] Disable VPlan-based cost model for 19.x release. As discussed in https://github.com/llvm/llvm-project/pull/92555 flip the default for the option added in https://github.com/llvm/llvm-project/pull/99536 to true. This restores the original behavior for the release branch to give the VPlan-based cost model more time to mature on main. Added: Modified: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected Removed: diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp index 6d28b8fabe42e..68363abdb817a 100644 --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp @@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold( cl::desc("The maximum allowed number of runtime memory checks")); static cl::opt UseLegacyCostModel( -"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden, +"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden, cl::desc("Use the legacy cost model instead of the VPlan-based cost model. 
" "This option will be removed in the future.")); diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll index fc310f4163082..1a78eaf644723 100644 --- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll +++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll @@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. ; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF @@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur ; CHECK-NEXT: LV: Interleaving is not beneficial. 
; CHECK-NEXT: LV: Found a vectorizable loop (vscale x 4) in ; CHECK-NEXT: LEV: Epilogue vectorization is not profitable for this loop -; CHECK-NEXT: VF picked by VPlan cost model: vscale x 4 ; CHECK-NEXT: Executing best plan with VF=vscale x 4, UF=1 ; CHECK-NEXT: VPlan 'Final VPlan for VF={vscale x 4},UF>=1' { ; CHECK-NEXT: Live-in vp<%0> = VF * UF diff --git a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected index 5aa270e76f4c8..e862bf87d265c 100644 --- a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected +++ b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected @@ -17,7 +17,6 @@ define void @test() { ; CHECK: LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = load float, ptr %in0, align 4 ; CHECK: LV: Found an estimated cost of 22 for VF 32 For instruction: %v0 = load float, ptr %in0, align 4 ; CHECK: LV: Found an estimated cost of 92 for VF 64 For instruction: %v0 = load float, ptr %in0, align 4 -; CHECK: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load float, ptr %in0, align 4 ; entry: br label %for.body ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
https://github.com/tru closed https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)
tru wrote: Merged manually as 183e8ecc97a996c24e920e7e9668bc65a0d19439 since I messed it up with a merge commit instead of a rebase. Sorry, learning the new flow. https://github.com/llvm/llvm-project/pull/100097
[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100141
[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)
llvmbot wrote: @ldionne What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100141
[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100141 Backport 9628777 Requested by: @ldionne >From f281cb2886edb46067606b62163e0c3d6cdfd965 Mon Sep 17 00:00:00 2001 From: PaulXiCao Date: Tue, 23 Jul 2024 15:11:44 + Subject: [PATCH] [libc++][math] Fix undue overflowing of `std::hypot(x,y,z)` (#93350) The 3-dimensional `std::hypot(x,y,z)` was sub-optimally implemented. This led to possible over-/underflows in (intermediate) results which can be circumvented by this proposed change. The idea is to scale the arguments (see linked issue for full discussion). Tests have been added for problematic over- and underflows. Closes #92782 (cherry picked from commit 9628777479a970db5d0c2d0b456dac6633864760) --- libcxx/include/__math/hypot.h | 89 ++ libcxx/include/cmath | 25 + .../test/libcxx/transitive_includes/cxx17.csv | 3 + .../test/libcxx/transitive_includes/cxx20.csv | 3 + .../test/libcxx/transitive_includes/cxx23.csv | 3 + .../test/libcxx/transitive_includes/cxx26.csv | 3 + .../test/std/numerics/c.math/cmath.pass.cpp | 91 +++ libcxx/test/support/fp_compare.h | 45 - 8 files changed, 197 insertions(+), 65 deletions(-) diff --git a/libcxx/include/__math/hypot.h b/libcxx/include/__math/hypot.h index 1bf193a9ab7ee..61fd260c59409 100644 --- a/libcxx/include/__math/hypot.h +++ b/libcxx/include/__math/hypot.h @@ -15,10 +15,21 @@ #include <__type_traits/is_same.h> #include <__type_traits/promote.h> +#if _LIBCPP_STD_VER >= 17 +# include <__algorithm/max.h> +# include <__math/abs.h> +# include <__math/roots.h> +# include <__utility/pair.h> +# include +#endif + #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) # pragma GCC system_header #endif +_LIBCPP_PUSH_MACROS +#include <__undef_macros> + _LIBCPP_BEGIN_NAMESPACE_STD namespace __math { @@ -41,8 +52,86 @@ inline _LIBCPP_HIDE_FROM_ABI typename __promote<_A1, _A2>::type hypot(_A1 __x, _ return __math::hypot((__result_type)__x, (__result_type)__y); } +#if _LIBCPP_STD_VER >= 17 +// Factors needed
to determine if over-/underflow might happen for `std::hypot(x,y,z)`. +// returns [overflow_threshold, overflow_scale] +template +_LIBCPP_HIDE_FROM_ABI std::pair<_Real, _Real> __hypot_factors() { + static_assert(std::numeric_limits<_Real>::is_iec559); + + if constexpr (std::is_same_v<_Real, float>) { +static_assert(-125 == std::numeric_limits<_Real>::min_exponent); +static_assert(+128 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+62f, 0x1.0p-70f}; + } else if constexpr (std::is_same_v<_Real, double>) { +static_assert(-1021 == std::numeric_limits<_Real>::min_exponent); +static_assert(+1024 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+510, 0x1.0p-600}; + } else { // long double +static_assert(std::is_same_v<_Real, long double>); + +// preprocessor guard necessary, otherwise literals (e.g. `0x1.0p+8'190l`) throw warnings even when shielded by `if +// constexpr` +# if __DBL_MAX_EXP__ == __LDBL_MAX_EXP__ +static_assert(sizeof(_Real) == sizeof(double)); +return static_cast>(__math::__hypot_factors()); +# else +static_assert(sizeof(_Real) > sizeof(double)); +static_assert(-16381 == std::numeric_limits<_Real>::min_exponent); +static_assert(+16384 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+8190l, 0x1.0p-9000l}; +# endif + } +} + +// Computes the three-dimensional hypotenuse: `std::hypot(x,y,z)`. +// The naive implementation might over-/underflow which is why this implementation is more involved: +//If the square of an argument might run into issues, we scale the arguments appropriately. +// See https://github.com/llvm/llvm-project/issues/92782 for a detailed discussion and summary. 
+template +_LIBCPP_HIDE_FROM_ABI _Real __hypot(_Real __x, _Real __y, _Real __z) { + const _Real __max_abs = std::max(__math::fabs(__x), std::max(__math::fabs(__y), __math::fabs(__z))); + const auto [__overflow_threshold, __overflow_scale] = __math::__hypot_factors<_Real>(); + _Real __scale; + if (__max_abs > __overflow_threshold) { // x*x + y*y + z*z might overflow +__scale = __overflow_scale; +__x *= __scale; +__y *= __scale; +__z *= __scale; + } else if (__max_abs < 1 / __overflow_threshold) { // x*x + y*y + z*z might underflow +__scale = 1 / __overflow_scale; +__x *= __scale; +__y *= __scale; +__z *= __scale; + } else +__scale = 1; + return __math::sqrt(__x * __x + __y * __y + __z * __z) / __scale; +} + +inline _LIBCPP_HIDE_FROM_ABI float hypot(float __x, float __y, float __z) { return __math::__hypot(__x, __y, __z); } + +inline _LIBCPP_HIDE_FROM_ABI double hypot(double __x, double __y, double __z) { return __math::__hypot(__x, __y, __z); } + +inline _LIBCPP_HIDE_FROM_ABI long double hypot(long double __x, long double __y, long double __z) { +
[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)
llvmbot wrote: @llvm/pr-subscribers-libcxx Author: None (llvmbot) Changes Backport 9628777 Requested by: @ldionne --- Full diff: https://github.com/llvm/llvm-project/pull/100141.diff 8 Files Affected: - (modified) libcxx/include/__math/hypot.h (+89) - (modified) libcxx/include/cmath (+1-24) - (modified) libcxx/test/libcxx/transitive_includes/cxx17.csv (+3) - (modified) libcxx/test/libcxx/transitive_includes/cxx20.csv (+3) - (modified) libcxx/test/libcxx/transitive_includes/cxx23.csv (+3) - (modified) libcxx/test/libcxx/transitive_includes/cxx26.csv (+3) - (modified) libcxx/test/std/numerics/c.math/cmath.pass.cpp (+75-16) - (modified) libcxx/test/support/fp_compare.h (+20-25) ``diff diff --git a/libcxx/include/__math/hypot.h b/libcxx/include/__math/hypot.h index 1bf193a9ab7ee..61fd260c59409 100644 --- a/libcxx/include/__math/hypot.h +++ b/libcxx/include/__math/hypot.h @@ -15,10 +15,21 @@ #include <__type_traits/is_same.h> #include <__type_traits/promote.h> +#if _LIBCPP_STD_VER >= 17 +# include <__algorithm/max.h> +# include <__math/abs.h> +# include <__math/roots.h> +# include <__utility/pair.h> +# include +#endif + #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) # pragma GCC system_header #endif +_LIBCPP_PUSH_MACROS +#include <__undef_macros> + _LIBCPP_BEGIN_NAMESPACE_STD namespace __math { @@ -41,8 +52,86 @@ inline _LIBCPP_HIDE_FROM_ABI typename __promote<_A1, _A2>::type hypot(_A1 __x, _ return __math::hypot((__result_type)__x, (__result_type)__y); } +#if _LIBCPP_STD_VER >= 17 +// Factors needed to determine if over-/underflow might happen for `std::hypot(x,y,z)`. 
+// returns [overflow_threshold, overflow_scale] +template +_LIBCPP_HIDE_FROM_ABI std::pair<_Real, _Real> __hypot_factors() { + static_assert(std::numeric_limits<_Real>::is_iec559); + + if constexpr (std::is_same_v<_Real, float>) { +static_assert(-125 == std::numeric_limits<_Real>::min_exponent); +static_assert(+128 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+62f, 0x1.0p-70f}; + } else if constexpr (std::is_same_v<_Real, double>) { +static_assert(-1021 == std::numeric_limits<_Real>::min_exponent); +static_assert(+1024 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+510, 0x1.0p-600}; + } else { // long double +static_assert(std::is_same_v<_Real, long double>); + +// preprocessor guard necessary, otherwise literals (e.g. `0x1.0p+8'190l`) throw warnings even when shielded by `if +// constexpr` +# if __DBL_MAX_EXP__ == __LDBL_MAX_EXP__ +static_assert(sizeof(_Real) == sizeof(double)); +return static_cast>(__math::__hypot_factors()); +# else +static_assert(sizeof(_Real) > sizeof(double)); +static_assert(-16381 == std::numeric_limits<_Real>::min_exponent); +static_assert(+16384 == std::numeric_limits<_Real>::max_exponent); +return {0x1.0p+8190l, 0x1.0p-9000l}; +# endif + } +} + +// Computes the three-dimensional hypotenuse: `std::hypot(x,y,z)`. +// The naive implementation might over-/underflow which is why this implementation is more involved: +//If the square of an argument might run into issues, we scale the arguments appropriately. +// See https://github.com/llvm/llvm-project/issues/92782 for a detailed discussion and summary. 
+template +_LIBCPP_HIDE_FROM_ABI _Real __hypot(_Real __x, _Real __y, _Real __z) { + const _Real __max_abs = std::max(__math::fabs(__x), std::max(__math::fabs(__y), __math::fabs(__z))); + const auto [__overflow_threshold, __overflow_scale] = __math::__hypot_factors<_Real>(); + _Real __scale; + if (__max_abs > __overflow_threshold) { // x*x + y*y + z*z might overflow +__scale = __overflow_scale; +__x *= __scale; +__y *= __scale; +__z *= __scale; + } else if (__max_abs < 1 / __overflow_threshold) { // x*x + y*y + z*z might underflow +__scale = 1 / __overflow_scale; +__x *= __scale; +__y *= __scale; +__z *= __scale; + } else +__scale = 1; + return __math::sqrt(__x * __x + __y * __y + __z * __z) / __scale; +} + +inline _LIBCPP_HIDE_FROM_ABI float hypot(float __x, float __y, float __z) { return __math::__hypot(__x, __y, __z); } + +inline _LIBCPP_HIDE_FROM_ABI double hypot(double __x, double __y, double __z) { return __math::__hypot(__x, __y, __z); } + +inline _LIBCPP_HIDE_FROM_ABI long double hypot(long double __x, long double __y, long double __z) { + return __math::__hypot(__x, __y, __z); +} + +template && is_arithmetic_v<_A2> && is_arithmetic_v<_A3>, int> = 0 > +_LIBCPP_HIDE_FROM_ABI typename __promote<_A1, _A2, _A3>::type hypot(_A1 __x, _A2 __y, _A3 __z) _NOEXCEPT { + using __result_type = typename __promote<_A1, _A2, _A3>::type; + static_assert(!( + std::is_same_v<_A1, __result_type> && std::is_same_v<_A2, __result_type> && std::is_same_v<_A3, __result_type>)); + return __math::__hypot( + static_cast<__result_type>(__x), static_cast<__result_type>(__y), static_cast<__result_type>(__z)); +} +
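Editorial note: the scaling idea in the patch above can be tried outside libc++ with a minimal sketch like the one below. It assumes IEEE-754 `double` and reuses the double-precision threshold and scale quoted in the diff (`0x1.0p+510`, `0x1.0p-600`); the names `hypot3_scaled` and `hypot3_naive` are hypothetical and are not part of libc++.

```cpp
#include <algorithm>
#include <cmath>

// Naive form: squaring a large argument can overflow (and a tiny one can
// underflow) even when the true result is representable.
inline double hypot3_naive(double x, double y, double z) {
  return std::sqrt(x * x + y * y + z * z);
}

// Sketch of the scaled form (hypothetical name, double only).
// The constants are the double-precision values shown in the patch.
inline double hypot3_scaled(double x, double y, double z) {
  constexpr double overflow_threshold = 0x1.0p+510;
  constexpr double overflow_scale     = 0x1.0p-600;

  const double max_abs = std::max({std::fabs(x), std::fabs(y), std::fabs(z)});

  double scale = 1.0;
  if (max_abs > overflow_threshold)              // x*x + y*y + z*z might overflow
    scale = overflow_scale;
  else if (max_abs < 1.0 / overflow_threshold)   // x*x + y*y + z*z might underflow
    scale = 1.0 / overflow_scale;

  // Scaling by a power of two only changes the exponent, so the scaled
  // arguments stay exact as long as they remain in the normal range.
  x *= scale;
  y *= scale;
  z *= scale;
  return std::sqrt(x * x + y * y + z * z) / scale;
}
```

With arguments such as `1e200, 1e200, 1e200` the naive form is expected to overflow on typical IEEE-754 platforms, while the scaled form stays finite. This sketch does not handle NaN or infinity specially and is not a substitute for the library implementation.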
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96872 >From ef284fddade0ad779fbbd4bad48a4d63667d3d65 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 11 Jun 2024 10:58:44 +0200 Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} Need to emit syncscope and new metadata to get the native instruction, most of the time. --- clang/lib/CodeGen/CGBuiltin.cpp | 39 +-- .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl| 2 +- .../builtins-fp-atomics-gfx12.cl | 4 +- .../builtins-fp-atomics-gfx90a.cl | 4 +- .../builtins-fp-atomics-gfx940.cl | 4 +- 5 files changed, 34 insertions(+), 19 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 5639239359ab8..0fb45f0288d46 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -58,6 +58,7 @@ #include "llvm/IR/MDBuilder.h" #include "llvm/IR/MatrixBuilder.h" #include "llvm/IR/MemoryModelRelaxationAnnotations.h" +#include "llvm/Support/AMDGPUAddrSpace.h" #include "llvm/Support/ConvertUTF.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/ScopedPrinter.h" @@ -18743,8 +18744,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() }); return Builder.CreateCall(F, { Src0, Builder.getFalse() }); } - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64: @@ -18756,18 +18755,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Intrinsic::ID IID; llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); switch (BuiltinID) { -case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: - ArgTy = llvm::Type::getFloatTy(getLLVMContext()); - IID = 
Intrinsic::amdgcn_global_atomic_fadd; - break; case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: ArgTy = llvm::FixedVectorType::get( llvm::Type::getHalfTy(getLLVMContext()), 2); IID = Intrinsic::amdgcn_global_atomic_fadd; break; -case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: - IID = Intrinsic::amdgcn_global_atomic_fadd; - break; case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: IID = Intrinsic::amdgcn_global_atomic_fmin; break; @@ -19190,7 +19182,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16: case AMDGPU::BI__builtin_amdgcn_ds_faddf: case AMDGPU::BI__builtin_amdgcn_ds_fminf: - case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: { + case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: { llvm::AtomicRMWInst::BinOp BinOp; switch (BuiltinID) { case AMDGPU::BI__builtin_amdgcn_atomic_inc32: @@ -19206,6 +19200,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32: case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16: case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16: +case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: +case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: BinOp = llvm::AtomicRMWInst::FAdd; break; case AMDGPU::BI__builtin_amdgcn_ds_fminf: @@ -19240,8 +19236,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)), EmitScalarExpr(E->getArg(3)), AO, SSID); } else { - // The ds_atomic_fadd_* builtins do not have syncscope/order arguments. - SSID = llvm::SyncScope::System; + // Most of the builtins do not have syncscope/order arguments. For DS + // atomics the scope doesn't really matter, as they implicitly operate at + // workgroup scope. 
+ // + // The global/flat cases need to use agent scope to consistently produce + // the native instruction instead of a cmpxchg expansion. + SSID = getLLVMContext().getOrInsertSyncScopeID("agent"); AO = AtomicOrdering::SequentiallyConsistent; // The v2bf16 builtin uses i16 instead of a natural bfloat type. @@ -19256,6 +19257,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID); if (Volatile) RMW->setVolatile(true); + +unsigned AddrSpace = Ptr.getType()->getAddressSpace(); +if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) { + // Most targets require "amdgpu.no.fine.grained.memory" to emit the nativ
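Editorial note: the comment in the patch above contrasts the native atomic instruction with a "cmpxchg expansion". As a rough host-side analogy only (this is not what the AMDGPU backend emits), a floating-point atomic add expanded into a compare-exchange loop looks like the sketch below. The helper name `atomic_fadd_cas` is hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: atomic floating-point add built from a
// compare-exchange loop. Returns the value held before the add,
// mirroring the result of an `atomicrmw fadd`.
inline float atomic_fadd_cas(std::atomic<float>& target, float value) {
  float expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the current value into
  // `expected`, so the sum is recomputed on every retry.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}
```

The loop may retry under contention, which is one reason a single native read-modify-write instruction is usually preferable when the target can provide it.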
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96873 >From c4cc064cad9a5921b52e00b5a19ca834f5262772 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 26 Jun 2024 19:12:59 +0200 Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins --- clang/lib/CodeGen/CGBuiltin.cpp | 20 ++- .../builtins-fp-atomics-gfx12.cl | 9 ++--- .../builtins-fp-atomics-gfx90a.cl | 2 +- .../builtins-fp-atomics-gfx940.cl | 3 ++- 4 files changed, 15 insertions(+), 19 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index 77dadeb1f22fa..baf68c7e81569 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -18744,22 +18744,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() }); return Builder.CreateCall(F, { Src0, Builder.getFalse() }); } - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64: case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64: case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: - case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: - case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: { + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: { Intrinsic::ID IID; llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); switch (BuiltinID) { -case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: - ArgTy = llvm::FixedVectorType::get( - llvm::Type::getHalfTy(getLLVMContext()), 2); - IID = Intrinsic::amdgcn_global_atomic_fadd; - break; case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: IID = Intrinsic::amdgcn_global_atomic_fmin; break; @@ -18779,11 +18772,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, ArgTy = llvm::Type::getFloatTy(getLLVMContext()); 
IID = Intrinsic::amdgcn_flat_atomic_fadd; break; -case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: - ArgTy = llvm::FixedVectorType::get( - llvm::Type::getHalfTy(getLLVMContext()), 2); - IID = Intrinsic::amdgcn_flat_atomic_fadd; - break; } llvm::Value *Addr = EmitScalarExpr(E->getArg(0)); llvm::Value *Val = EmitScalarExpr(E->getArg(1)); @@ -19184,7 +19172,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_fminf: case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: { + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: + case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: { llvm::AtomicRMWInst::BinOp BinOp; switch (BuiltinID) { case AMDGPU::BI__builtin_amdgcn_atomic_inc32: @@ -19202,6 +19192,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16: case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: +case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: +case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: BinOp = llvm::AtomicRMWInst::FAdd; break; case AMDGPU::BI__builtin_amdgcn_ds_fminf: diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl index 6b8a6d14575db..07e63a8711c7f 100644 --- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl +++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl @@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) { } // CHECK-LABEL: test_flat_add_2f16 -// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr %{{.*}}, <2 x half> %{{.*}}) +// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, 
!amdgpu.no.fine.grained.memory !{{[0-9]+$}} + // GFX12-LABEL: test_flat_add_2f16 // GFX12: flat_atomic_pk_add_f16 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) { @@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) { } // CHECK-LABEL: test_global_add_half2 -// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr addrspace(1) %{{.*}}, <2 x half> %{{.*}}) +// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}} + // GFX12-LABEL: test_global_add_half2 // GFX12: global_atomic_pk_add_f16 v2, v[0:1], v2, off
[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100145 Backport c2e4386 Requested by: @ldionne >From 8325799d41659bd1ff72ed3d628732f2a72a5dc8 Mon Sep 17 00:00:00 2001 From: Mark de Wever Date: Tue, 23 Jul 2024 18:03:28 +0200 Subject: [PATCH] [libc++][vector] Tests shrink_to_fit requirement. (#98009) `vector<bool>`'s shrink_to_fit implementation is using the "swap-to-free-container-resources-trick" which only shrinks when the input vector is empty. Since the request to shrink_to_fit is non-binding, this is a valid implementation. It is not a high-quality implementation. Since `vector<bool>` is not a very popular container the implementation has not been changed and only a test to validate the non-growing property has been added. This was discovered while investigating #95161. (cherry picked from commit c2e438675754b83c31d7d5ba40cb13fe77e795de) --- .../vector.bool/shrink_to_fit.pass.cpp| 45 ++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp index b39245cab7bf4..f8bcee31964bb 100644 --- a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp +++ b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp @@ -39,11 +39,54 @@ TEST_CONSTEXPR_CXX20 bool tests() return true; } +#if TEST_STD_VER >= 23 +template +struct increasing_allocator { + using value_type = T; + std::size_t min_elements = 1000; + increasing_allocator() = default; + + template + constexpr increasing_allocator(const increasing_allocator& other) noexcept : min_elements(other.min_elements) {} + + constexpr std::allocation_result allocate_at_least(std::size_t n) { +if (n < min_elements) + n = min_elements; +min_elements += 1000; +return std::allocator{}.allocate_at_least(n); + } + constexpr T* allocate(std::size_t n) { return allocate_at_least(n).ptr; } + constexpr void deallocate(T* p, std::size_t n)
noexcept { std::allocator{}.deallocate(p, n); } +}; + +template +bool operator==(increasing_allocator, increasing_allocator) { + return true; +} + +// https://github.com/llvm/llvm-project/issues/95161 +constexpr bool test_increasing_allocator() { + std::vector> v; + v.push_back(1); + std::size_t capacity = v.capacity(); + v.shrink_to_fit(); + assert(v.capacity() <= capacity); + assert(v.size() == 1); + + return true; +} +#endif // TEST_STD_VER >= 23 + int main(int, char**) { -tests(); + tests(); #if TEST_STD_VER > 17 static_assert(tests()); #endif +#if TEST_STD_VER >= 23 +test_increasing_allocator(); +static_assert(test_increasing_allocator()); +#endif // TEST_STD_VER >= 23 + return 0; } ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100145
[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)
llvmbot wrote: @ldionne What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100145
[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)
llvmbot wrote: @llvm/pr-subscribers-libcxx Author: None (llvmbot) Changes Backport c2e4386 Requested by: @ldionne --- Full diff: https://github.com/llvm/llvm-project/pull/100145.diff 1 Files Affected: - (modified) libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp (+44-1) ``diff diff --git a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp index b39245cab7bf4..f8bcee31964bb 100644 --- a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp +++ b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp @@ -39,11 +39,54 @@ TEST_CONSTEXPR_CXX20 bool tests() return true; } +#if TEST_STD_VER >= 23 +template +struct increasing_allocator { + using value_type = T; + std::size_t min_elements = 1000; + increasing_allocator() = default; + + template + constexpr increasing_allocator(const increasing_allocator& other) noexcept : min_elements(other.min_elements) {} + + constexpr std::allocation_result allocate_at_least(std::size_t n) { +if (n < min_elements) + n = min_elements; +min_elements += 1000; +return std::allocator{}.allocate_at_least(n); + } + constexpr T* allocate(std::size_t n) { return allocate_at_least(n).ptr; } + constexpr void deallocate(T* p, std::size_t n) noexcept { std::allocator{}.deallocate(p, n); } +}; + +template +bool operator==(increasing_allocator, increasing_allocator) { + return true; +} + +// https://github.com/llvm/llvm-project/issues/95161 +constexpr bool test_increasing_allocator() { + std::vector> v; + v.push_back(1); + std::size_t capacity = v.capacity(); + v.shrink_to_fit(); + assert(v.capacity() <= capacity); + assert(v.size() == 1); + + return true; +} +#endif // TEST_STD_VER >= 23 + int main(int, char**) { -tests(); + tests(); #if TEST_STD_VER > 17 static_assert(tests()); #endif +#if TEST_STD_VER >= 23 +test_increasing_allocator(); +static_assert(test_increasing_allocator()); +#endif // 
TEST_STD_VER >= 23 + return 0; } `` https://github.com/llvm/llvm-project/pull/100145
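Editorial note: the property the new test checks (a `shrink_to_fit()` request is non-binding, but it must not increase `capacity()` or change the contents) can be exercised with the default allocator using a small sketch like the one below. The helper name `shrink_does_not_grow` is hypothetical; the test in the patch additionally uses a custom `increasing_allocator` to provoke the problematic case, which this sketch does not reproduce.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when shrink_to_fit() left the capacity
// no larger than before and did not alter the stored elements.
inline bool shrink_does_not_grow(std::vector<bool>& v) {
  const std::size_t capacity_before = v.capacity();
  const std::vector<bool> contents_before = v;  // copy for comparison

  v.shrink_to_fit();  // non-binding request; the vector may keep its buffer

  return v.capacity() <= capacity_before && v == contents_before;
}
```

A typical use is to reserve a large buffer, push a single element, and then call the helper; whether the capacity actually shrinks is implementation-defined, so only the "does not grow" direction is checked.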
[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100149
[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100149

Backport d0ca9f2

Requested by: @ldionne

>From c264db19ef5ac5de02596ebb8ff3774394c871b4 Mon Sep 17 00:00:00 2001
From: Mark de Wever
Date: Tue, 23 Jul 2024 18:13:22 +0200
Subject: [PATCH] [libc++][string] Fixes shrink_to_fit. (#97961)

This ensures that shrink_to_fit does not increase the allocated size.

Partly addresses #95161

(cherry picked from commit d0ca9f23e8f25b0509c3ff34ed215508b39ea6e7)
---
 libcxx/include/string | 17 ++--
 .../string.capacity/shrink_to_fit.pass.cpp| 41 +++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/libcxx/include/string b/libcxx/include/string
index ba86a32090825..9fa979e3a5178 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -3358,23 +3358,34 @@ basic_string<_CharT, _Traits, _Allocator>::__shrink_or_extend(size_type __target
     __p        = __get_long_pointer();
   } else {
     if (__target_capacity > __cap) {
+      // Extend
+      // - called from reserve should propagate the exception thrown.
       auto __allocation = std::__allocate_at_least(__alloc(), __target_capacity + 1);
       __new_data        = __allocation.ptr;
       __target_capacity = __allocation.count - 1;
     } else {
+      // Shrink
+      // - called from shrink_to_fit should not throw.
+      // - called from reserve may throw but is not required to.
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
       try {
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
         auto __allocation = std::__allocate_at_least(__alloc(), __target_capacity + 1);
+
+        // The Standard mandates shrink_to_fit() does not increase the capacity.
+        // With equal capacity keep the existing buffer. This avoids extra work
+        // due to swapping the elements.
+        if (__allocation.count - 1 > __target_capacity) {
+          __alloc_traits::deallocate(__alloc(), __allocation.ptr, __allocation.count);
+          __annotate_new(__sz); // Undoes the __annotate_delete()
+          return;
+        }
         __new_data        = __allocation.ptr;
         __target_capacity = __allocation.count - 1;
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
       } catch (...) {
         return;
       }
-#else // _LIBCPP_HAS_NO_EXCEPTIONS
-      if (__new_data == nullptr)
-        return;
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
     }
     __begin_lifetime(__new_data, __target_capacity + 1);
diff --git a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
index 057050cdcf7fa..6f5e43d1341f5 100644
--- a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
+++ b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
@@ -63,8 +63,49 @@ TEST_CONSTEXPR_CXX20 bool test() {
   return true;
 }
 
+#if TEST_STD_VER >= 23
+std::size_t min_bytes = 1000;
+
+template <typename T>
+struct increasing_allocator {
+  using value_type = T;
+  increasing_allocator() = default;
+  template <typename U>
+  increasing_allocator(const increasing_allocator<U>&) noexcept {}
+  std::allocation_result<T*> allocate_at_least(std::size_t n) {
+    std::size_t allocation_amount = n * sizeof(T);
+    if (allocation_amount < min_bytes)
+      allocation_amount = min_bytes;
+    min_bytes += 1000;
+    return {static_cast<T*>(::operator new(allocation_amount)), allocation_amount / sizeof(T)};
+  }
+  T* allocate(std::size_t n) { return allocate_at_least(n).ptr; }
+  void deallocate(T* p, std::size_t) noexcept { ::operator delete(static_cast<void*>(p)); }
+};
+
+template <typename T, typename U>
+bool operator==(increasing_allocator<T>, increasing_allocator<U>) {
+  return true;
+}
+
+// https://github.com/llvm/llvm-project/issues/95161
+void test_increasing_allocator() {
+  std::basic_string<char, std::char_traits<char>, increasing_allocator<char>> s{
+      "String does not fit in the internal buffer"};
+  std::size_t capacity = s.capacity();
+  std::size_t size = s.size();
+  s.shrink_to_fit();
+  assert(s.capacity() <= capacity);
+  assert(s.size() == size);
+  LIBCPP_ASSERT(is_string_asan_correct(s));
+}
+#endif // TEST_STD_VER >= 23
+
 int main(int, char**) {
   test();
+#if TEST_STD_VER >= 23
+  test_increasing_allocator();
+#endif
 #if TEST_STD_VER > 17
   static_assert(test());
 #endif
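The guard this patch adds to `__shrink_or_extend` can be pictured with a small standalone model. This is only a sketch and not libc++ code: `AllocationResult`, `IncreasingAllocatorModel`, and `capacity_after_shrink` are hypothetical names made up for illustration, and the model tracks element counts only, without allocating any memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for what an allocate_at_least-style call reports.
// Only the element count matters here, so no memory is allocated.
struct AllocationResult {
  std::size_t count;
};

// Hypothetical model of the test's increasing_allocator: it never reports
// fewer than min_elements and raises that floor after every request.
struct IncreasingAllocatorModel {
  std::size_t min_elements = 1000;

  AllocationResult allocate_at_least(std::size_t n) {
    if (n < min_elements)
      n = min_elements;
    min_elements += 1000;
    return AllocationResult{n};
  }
};

// Returns the capacity a string would report after a shrink attempt.
// target_capacity plays the role of __target_capacity in the patch; the
// extra element requested accounts for the terminating null character.
inline std::size_t capacity_after_shrink(std::size_t current_capacity,
                                         std::size_t target_capacity,
                                         IncreasingAllocatorModel& alloc) {
  assert(target_capacity <= current_capacity);
  AllocationResult allocation = alloc.allocate_at_least(target_capacity + 1);
  // Mirrors the new check: if the allocator handed back more than was asked
  // for, discard the new block and keep the existing buffer.
  if (allocation.count - 1 > target_capacity)
    return current_capacity;
  return allocation.count - 1;
}
```

With the default floor of 1000 elements, a shrink request for 42 characters on a buffer of capacity 1500 is answered with an oversized block, so the model keeps the capacity at 1500. With the floor set to 0 the allocator returns exactly what was requested and the capacity drops to 42.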
[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)
llvmbot wrote: @ldionne What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100149
[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)
llvmbot wrote:

@llvm/pr-subscribers-libcxx

Author: None (llvmbot)

Changes

Backport d0ca9f2

Requested by: @ldionne

---
Full diff: https://github.com/llvm/llvm-project/pull/100149.diff

2 Files Affected:

- (modified) libcxx/include/string (+14-3)
- (modified) libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp (+41)

https://github.com/llvm/llvm-project/pull/100149
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100151 Backport 1df4d86 Requested by: @daltenty >From 79ddb123bdbf8300c49e4b2abc74b664af833ea9 Mon Sep 17 00:00:00 2001 From: azhan92 Date: Tue, 23 Jul 2024 09:49:41 -0400 Subject: [PATCH] [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) This PR adds support for -mcpu=pwr11/power11 and -mtune=pwr11/power11 in clang and llvm. (cherry picked from commit 1df4d866cca51eeab8f012a97cc50957b45971fe) --- clang/lib/Basic/Targets/PPC.cpp | 39 --- clang/lib/Basic/Targets/PPC.h | 19 ++--- clang/lib/Driver/ToolChains/Arch/PPC.cpp | 3 ++ clang/test/Misc/target-invalid-cpu-note.c | 2 +- clang/test/Preprocessor/init-ppc64.c | 22 +++ llvm/lib/Target/PowerPC/PPC.td| 20 -- llvm/lib/Target/PowerPC/PPCISelLowering.cpp | 3 ++ llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | 1 + llvm/lib/Target/PowerPC/PPCSubtarget.h| 1 + .../Target/PowerPC/PPCTargetTransformInfo.cpp | 4 +- llvm/lib/TargetParser/Host.cpp| 7 llvm/test/CodeGen/PowerPC/check-cpu.ll| 6 ++- llvm/test/CodeGen/PowerPC/mma-acc-spill.ll| 7 ...{p10-constants.ll => p10-p11-constants.ll} | 12 +- llvm/unittests/TargetParser/Host.cpp | 1 + 15 files changed, 120 insertions(+), 27 deletions(-) rename llvm/test/CodeGen/PowerPC/{p10-constants.ll => p10-p11-constants.ll} (94%) diff --git a/clang/lib/Basic/Targets/PPC.cpp b/clang/lib/Basic/Targets/PPC.cpp index 4ba4a49311d36..9ff54083c923b 100644 --- a/clang/lib/Basic/Targets/PPC.cpp +++ b/clang/lib/Basic/Targets/PPC.cpp @@ -385,6 +385,8 @@ void PPCTargetInfo::getTargetDefines(const LangOptions &Opts, Builder.defineMacro("_ARCH_PWR9"); if (ArchDefs & ArchDefinePwr10) Builder.defineMacro("_ARCH_PWR10"); + if (ArchDefs & ArchDefinePwr11) +Builder.defineMacro("_ARCH_PWR11"); if (ArchDefs & ArchDefineA2) Builder.defineMacro("_ARCH_A2"); if (ArchDefs & ArchDefineE500) @@ -622,10 +624,17 @@ bool PPCTargetInfo::initFeatureMap( addP10SpecificFeatures(Features); } - // Future CPU should include all of the features of 
Power 10 as well as any + // Power11 includes all the same features as Power10 plus any features + // specific to the Power11 core. + if (CPU == "pwr11" || CPU == "power11") { +initFeatureMap(Features, Diags, "pwr10", FeaturesVec); +addP11SpecificFeatures(Features); + } + + // Future CPU should include all of the features of Power 11 as well as any // additional features (yet to be determined) specific to it. if (CPU == "future") { -initFeatureMap(Features, Diags, "pwr10", FeaturesVec); +initFeatureMap(Features, Diags, "pwr11", FeaturesVec); addFutureSpecificFeatures(Features); } @@ -696,6 +705,10 @@ void PPCTargetInfo::addP10SpecificFeatures( Features["isa-v31-instructions"] = true; } +// Add any Power11 specific features. +void PPCTargetInfo::addP11SpecificFeatures( +llvm::StringMap &Features) const {} + // Add features specific to the "Future" CPU. void PPCTargetInfo::addFutureSpecificFeatures( llvm::StringMap &Features) const {} @@ -870,17 +883,17 @@ ArrayRef PPCTargetInfo::getGCCAddlRegNames() const { } static constexpr llvm::StringLiteral ValidCPUNames[] = { -{"generic"}, {"440"}, {"450"},{"601"}, {"602"}, -{"603"}, {"603e"},{"603ev"}, {"604"}, {"604e"}, -{"620"}, {"630"}, {"g3"}, {"7400"}, {"g4"}, -{"7450"},{"g4+"}, {"750"},{"8548"}, {"970"}, -{"g5"}, {"a2"}, {"e500"}, {"e500mc"},{"e5500"}, -{"power3"}, {"pwr3"},{"power4"}, {"pwr4"}, {"power5"}, -{"pwr5"},{"power5x"}, {"pwr5x"}, {"power6"},{"pwr6"}, -{"power6x"}, {"pwr6x"}, {"power7"}, {"pwr7"}, {"power8"}, -{"pwr8"},{"power9"}, {"pwr9"}, {"power10"}, {"pwr10"}, -{"powerpc"}, {"ppc"}, {"ppc32"}, {"powerpc64"}, {"ppc64"}, -{"powerpc64le"}, {"ppc64le"}, {"future"}}; +{"generic"}, {"440"}, {"450"}, {"601"}, {"602"}, +{"603"}, {"603e"},{"603ev"}, {"604"}, {"604e"}, +{"620"}, {"630"}, {"g3"}, {"7400"},{"g4"}, +{"7450"}, {"g4+"}, {"750"}, {"8548"},{"970"}, +{"g5"},{"a2"}, {"e500"},{"e500mc"}, {"e5500"}, +{"power3"},{"pwr3"},{"power4"}, {"pwr4"},{"power5"}, +{"pwr5"}, {"power5x"}, {"pwr5x"}, {"power6"}, {"pwr6"}, 
+{"power6x"}, {"pwr6x"}, {"power7"}, {"pwr7"},{"power8"}, +{"pwr8"}, {"power9"}, {"pwr9"},{"power10"}, {"pwr10"}, +{"power11"}, {"pwr11"}, {"powerpc"}, {"ppc"}, {"ppc32"}, +{"powerpc64"}, {"ppc64"}, {"powerpc64le"}, {"ppc64le"}, {"future"}}; bool PPCTargetInfo::isValidCPUNa
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100151
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)
llvmbot wrote: @azhan92 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100151
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)
llvmbot wrote: @llvm/pr-subscribers-backend-powerpc @llvm/pr-subscribers-clang-driver Author: None (llvmbot) Changes Backport 1df4d86 Requested by: @daltenty --- Patch is 20.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100151.diff 15 Files Affected: - (modified) clang/lib/Basic/Targets/PPC.cpp (+26-13) - (modified) clang/lib/Basic/Targets/PPC.h (+13-6) - (modified) clang/lib/Driver/ToolChains/Arch/PPC.cpp (+3) - (modified) clang/test/Misc/target-invalid-cpu-note.c (+1-1) - (modified) clang/test/Preprocessor/init-ppc64.c (+22) - (modified) llvm/lib/Target/PowerPC/PPC.td (+17-3) - (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+3) - (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+1) - (modified) llvm/lib/Target/PowerPC/PPCSubtarget.h (+1) - (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2) - (modified) llvm/lib/TargetParser/Host.cpp (+7) - (modified) llvm/test/CodeGen/PowerPC/check-cpu.ll (+5-1) - (modified) llvm/test/CodeGen/PowerPC/mma-acc-spill.ll (+7) - (renamed) llvm/test/CodeGen/PowerPC/p10-p11-constants.ll (+11-1) - (modified) llvm/unittests/TargetParser/Host.cpp (+1) ``diff diff --git a/clang/lib/Basic/Targets/PPC.cpp b/clang/lib/Basic/Targets/PPC.cpp index 4ba4a49311d36..9ff54083c923b 100644 --- a/clang/lib/Basic/Targets/PPC.cpp +++ b/clang/lib/Basic/Targets/PPC.cpp @@ -385,6 +385,8 @@ void PPCTargetInfo::getTargetDefines(const LangOptions &Opts, Builder.defineMacro("_ARCH_PWR9"); if (ArchDefs & ArchDefinePwr10) Builder.defineMacro("_ARCH_PWR10"); + if (ArchDefs & ArchDefinePwr11) +Builder.defineMacro("_ARCH_PWR11"); if (ArchDefs & ArchDefineA2) Builder.defineMacro("_ARCH_A2"); if (ArchDefs & ArchDefineE500) @@ -622,10 +624,17 @@ bool PPCTargetInfo::initFeatureMap( addP10SpecificFeatures(Features); } - // Future CPU should include all of the features of Power 10 as well as any + // Power11 includes all the same features as Power10 plus any features + // specific to the Power11 
core. + if (CPU == "pwr11" || CPU == "power11") { +initFeatureMap(Features, Diags, "pwr10", FeaturesVec); +addP11SpecificFeatures(Features); + } + + // Future CPU should include all of the features of Power 11 as well as any // additional features (yet to be determined) specific to it. if (CPU == "future") { -initFeatureMap(Features, Diags, "pwr10", FeaturesVec); +initFeatureMap(Features, Diags, "pwr11", FeaturesVec); addFutureSpecificFeatures(Features); } @@ -696,6 +705,10 @@ void PPCTargetInfo::addP10SpecificFeatures( Features["isa-v31-instructions"] = true; } +// Add any Power11 specific features. +void PPCTargetInfo::addP11SpecificFeatures( +llvm::StringMap &Features) const {} + // Add features specific to the "Future" CPU. void PPCTargetInfo::addFutureSpecificFeatures( llvm::StringMap &Features) const {} @@ -870,17 +883,17 @@ ArrayRef PPCTargetInfo::getGCCAddlRegNames() const { } static constexpr llvm::StringLiteral ValidCPUNames[] = { -{"generic"}, {"440"}, {"450"},{"601"}, {"602"}, -{"603"}, {"603e"},{"603ev"}, {"604"}, {"604e"}, -{"620"}, {"630"}, {"g3"}, {"7400"}, {"g4"}, -{"7450"},{"g4+"}, {"750"},{"8548"}, {"970"}, -{"g5"}, {"a2"}, {"e500"}, {"e500mc"},{"e5500"}, -{"power3"}, {"pwr3"},{"power4"}, {"pwr4"}, {"power5"}, -{"pwr5"},{"power5x"}, {"pwr5x"}, {"power6"},{"pwr6"}, -{"power6x"}, {"pwr6x"}, {"power7"}, {"pwr7"}, {"power8"}, -{"pwr8"},{"power9"}, {"pwr9"}, {"power10"}, {"pwr10"}, -{"powerpc"}, {"ppc"}, {"ppc32"}, {"powerpc64"}, {"ppc64"}, -{"powerpc64le"}, {"ppc64le"}, {"future"}}; +{"generic"}, {"440"}, {"450"}, {"601"}, {"602"}, +{"603"}, {"603e"},{"603ev"}, {"604"}, {"604e"}, +{"620"}, {"630"}, {"g3"}, {"7400"},{"g4"}, +{"7450"}, {"g4+"}, {"750"}, {"8548"},{"970"}, +{"g5"},{"a2"}, {"e500"},{"e500mc"}, {"e5500"}, +{"power3"},{"pwr3"},{"power4"}, {"pwr4"},{"power5"}, +{"pwr5"}, {"power5x"}, {"pwr5x"}, {"power6"}, {"pwr6"}, +{"power6x"}, {"pwr6x"}, {"power7"}, {"pwr7"},{"power8"}, +{"pwr8"}, {"power9"}, {"pwr9"},{"power10"}, {"pwr10"}, +{"power11"}, 
{"pwr11"}, {"powerpc"}, {"ppc"}, {"ppc32"}, +{"powerpc64"}, {"ppc64"}, {"powerpc64le"}, {"ppc64le"}, {"future"}}; bool PPCTargetInfo::isValidCPUName(StringRef Name) const { return llvm::is_contained(ValidCPUNames, Name); diff --git a/clang/lib/Basic/Targets/PPC.h b/clang/lib/Basic/Targets/PPC.h index b15ab6fbcf492..6d5d8dd54d013 100644 --- a/clang/lib/Basic/Targets/PPC.h +++ b/clang/lib/Basic/Targets/PPC.h @@ -44,8 +44,9 @@ cla
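The CPU chaining visible in the `initFeatureMap` hunk ("future" builds on "pwr11", which builds on "pwr10") can be sketched with a much smaller standalone model. Everything here is an assumption for illustration: `FeatureMap` and `initFeatureMapModel` are hypothetical names, and the only feature modelled is `isa-v31-instructions`, which is the one Power10 feature shown in the diff. The real function takes more parameters and sets many more features.

```cpp
#include <map>
#include <string>

// Hypothetical simplified feature map: feature name -> enabled.
using FeatureMap = std::map<std::string, bool>;

// Simplified model of the chaining in the patch. Each newer CPU first
// inherits the features of its predecessor, then adds its own.
inline void initFeatureMapModel(FeatureMap& features, const std::string& cpu) {
  if (cpu == "pwr10" || cpu == "power10") {
    // The one Power10-specific feature visible in the diff.
    features["isa-v31-instructions"] = true;
  }
  if (cpu == "pwr11" || cpu == "power11") {
    initFeatureMapModel(features, "pwr10");
    // Power11-specific features would go here; the patch adds none yet.
  }
  if (cpu == "future") {
    // After the patch, "future" inherits from Power11 instead of Power10.
    initFeatureMapModel(features, "pwr11");
  }
}
```

In this model, asking for "future" or "pwr11" enables `isa-v31-instructions` through the chain, while an older name such as "pwr9" leaves the map empty.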
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add omp.target_triples attribute to the OffloadModuleInterface (PR #100154)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/100154

The `OffloadModuleInterface` holds getter/setter methods to access OpenMP dialect module-level discardable attributes used to hold general OpenMP compilation information.

This patch adds the `omp.target_triples` attribute, which is intended to hold the list of offloading target triples linked to the host module in which it appears. This attribute should be empty when `omp.is_target_device=true`.

>From 3dbb22595bcf691a619483ea51b3620a9de87263 Mon Sep 17 00:00:00 2001
From: Sergio Afonso
Date: Tue, 23 Jul 2024 16:32:16 +0100
Subject: [PATCH] [MLIR][OpenMP] Add omp.target_triples attribute to the OffloadModuleInterface

The `OffloadModuleInterface` holds getter/setter methods to access OpenMP dialect module-level discardable attributes used to hold general OpenMP compilation information.

This patch adds the `omp.target_triples` attribute, which is intended to hold the list of offloading target triples linked to the host module in which it appears. This attribute should be empty when `omp.is_target_device=true`.
---
 .../Dialect/OpenMP/OpenMPOpsInterfaces.td | 28 +++
 1 file changed, 28 insertions(+)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
index 385aa8b1b016a..9e62dcd9253d6 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
@@ -351,6 +351,34 @@ def OffloadModuleInterface : OpInterface<"OffloadModuleInterface"> {
       (ins "::mlir::omp::ClauseRequires":$clauses), [{}], [{
         $_op->setAttr(mlir::StringAttr::get($_op->getContext(), "omp.requires"),
           mlir::omp::ClauseRequiresAttr::get($_op->getContext(), clauses));
+      }]>,
+    InterfaceMethod<
+      /*description=*/[{
+        Get the omp.target_triples attribute on the operator if it's present and
+        return its value. If it doesn't exist, return an empty array by default.
+      }],
+      /*retTy=*/"::llvm::ArrayRef<::mlir::Attribute>",
+      /*methodName=*/"getTargetTriples",
+      (ins), [{}], [{
+        if (Attribute triplesAttr = $_op->getAttr("omp.target_triples"))
+          if (auto triples = ::llvm::dyn_cast<::mlir::ArrayAttr>(triplesAttr))
+            return triples.getValue();
+        return {};
+      }]>,
+    InterfaceMethod<
+      /*description=*/[{
+        Set the omp.target_triples attribute on the operation.
+      }],
+      /*retTy=*/"void",
+      /*methodName=*/"setTargetTriples",
+      (ins "::llvm::ArrayRef<::std::string>":$targetTriples), [{}], [{
+        auto names = ::llvm::to_vector(::llvm::map_range(
+          targetTriples, [&](::std::string str) -> ::mlir::Attribute {
+            return mlir::StringAttr::get($_op->getContext(), str);
+          }));
+        $_op->setAttr(
+          ::mlir::StringAttr::get($_op->getContext(), "omp.target_triples"),
+          ::mlir::ArrayAttr::get($_op->getContext(), names));
       }]>
   ];
 }
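The getter/setter contract described above (an absent attribute reads back as an empty list, and setting stores the triples under the `omp.target_triples` key) can be sketched without MLIR. This is a plain C++ model under stated assumptions: `ModuleModel` is a hypothetical stand-in for an operation with discardable attributes, and the free functions only borrow the method names from the patch. It does not use or reproduce the MLIR API.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation carrying discardable attributes,
// keyed by attribute name. Values are modelled as plain string lists.
struct ModuleModel {
  std::map<std::string, std::vector<std::string>> attrs;
};

// Returns the stored triples, or an empty list when the attribute is absent,
// matching the "empty array by default" behaviour described in the patch.
inline std::vector<std::string> getTargetTriples(const ModuleModel& module) {
  auto it = module.attrs.find("omp.target_triples");
  if (it == module.attrs.end())
    return {};
  return it->second;
}

// Stores the triples under the omp.target_triples key, replacing any
// previous value.
inline void setTargetTriples(ModuleModel& module,
                             const std::vector<std::string>& triples) {
  module.attrs["omp.target_triples"] = triples;
}
```

A caller can therefore read the attribute unconditionally: a freshly created `ModuleModel` yields an empty list, and after `setTargetTriples(m, {"amdgcn-amd-amdhsa"})` the getter returns that single triple.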
[llvm-branch-commits] [flang] [Flang][OpenMP] Add frontend support for -fopenmp-targets (PR #100155)
https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/100155 This patch adds support for the `-fopenmp-targets` option to the `bbc` and `flang -fc1` tools. It adds an `OMPTargetTriples` property to the `LangOptions` structure, which is filled with the triples represented by the compiler option. This is used to initialize the `omp.target_triples` module attribute for later use by lowering stages. >From 54e52e8a37fd725976e157cd0f9e0221a355dead Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Tue, 23 Jul 2024 16:40:18 +0100 Subject: [PATCH] [Flang][OpenMP] Add frontend support for -fopenmp-targets This patch adds support for the `-fopenmp-targets` option to the `bbc` and `flang -fc1` tools. It adds an `OMPTargetTriples` property to the `LangOptions` structure, which is filled with the triples represented by the compiler option. This is used to initialize the `omp.target_triples` module attribute for later use by lowering stages. --- flang/include/flang/Frontend/LangOptions.h | 6 flang/include/flang/Tools/CrossToolHelpers.h | 14 ++-- flang/lib/Frontend/CompilerInvocation.cpp| 35 flang/test/Lower/OpenMP/offload-targets.f90 | 10 ++ flang/tools/bbc/bbc.cpp | 13 +++- 5 files changed, 74 insertions(+), 4 deletions(-) create mode 100644 flang/test/Lower/OpenMP/offload-targets.f90 diff --git a/flang/include/flang/Frontend/LangOptions.h b/flang/include/flang/Frontend/LangOptions.h index 7ab2195818863..57d86d46df5ab 100644 --- a/flang/include/flang/Frontend/LangOptions.h +++ b/flang/include/flang/Frontend/LangOptions.h @@ -16,6 +16,9 @@ #define FORTRAN_FRONTEND_LANGOPTIONS_H #include +#include + +#include "llvm/TargetParser/Triple.h" namespace Fortran::frontend { @@ -58,6 +61,9 @@ class LangOptions : public LangOptionsBase { /// host code generation. std::string OMPHostIRFile; + /// List of triples passed in using -fopenmp-targets. 
+ std::vector OMPTargetTriples; + LangOptions(); }; diff --git a/flang/include/flang/Tools/CrossToolHelpers.h b/flang/include/flang/Tools/CrossToolHelpers.h index 1d890fd8e1f6f..75fd783af237d 100644 --- a/flang/include/flang/Tools/CrossToolHelpers.h +++ b/flang/include/flang/Tools/CrossToolHelpers.h @@ -131,7 +131,9 @@ struct OffloadModuleOpts { bool OpenMPThreadSubscription, bool OpenMPNoThreadState, bool OpenMPNoNestedParallelism, bool OpenMPIsTargetDevice, bool OpenMPIsGPU, bool OpenMPForceUSM, uint32_t OpenMPVersion, - std::string OMPHostIRFile = {}, bool NoGPULib = false) + std::string OMPHostIRFile = {}, + const std::vector &OMPTargetTriples = {}, + bool NoGPULib = false) : OpenMPTargetDebug(OpenMPTargetDebug), OpenMPTeamSubscription(OpenMPTeamSubscription), OpenMPThreadSubscription(OpenMPThreadSubscription), @@ -139,7 +141,9 @@ struct OffloadModuleOpts { OpenMPNoNestedParallelism(OpenMPNoNestedParallelism), OpenMPIsTargetDevice(OpenMPIsTargetDevice), OpenMPIsGPU(OpenMPIsGPU), OpenMPForceUSM(OpenMPForceUSM), OpenMPVersion(OpenMPVersion), -OMPHostIRFile(OMPHostIRFile), NoGPULib(NoGPULib) {} +OMPHostIRFile(OMPHostIRFile), +OMPTargetTriples(OMPTargetTriples.begin(), OMPTargetTriples.end()), +NoGPULib(NoGPULib) {} OffloadModuleOpts(Fortran::frontend::LangOptions &Opts) : OpenMPTargetDebug(Opts.OpenMPTargetDebug), @@ -150,7 +154,7 @@ struct OffloadModuleOpts { OpenMPIsTargetDevice(Opts.OpenMPIsTargetDevice), OpenMPIsGPU(Opts.OpenMPIsGPU), OpenMPForceUSM(Opts.OpenMPForceUSM), OpenMPVersion(Opts.OpenMPVersion), OMPHostIRFile(Opts.OMPHostIRFile), -NoGPULib(Opts.NoGPULib) {} +OMPTargetTriples(Opts.OMPTargetTriples), NoGPULib(Opts.NoGPULib) {} uint32_t OpenMPTargetDebug = 0; bool OpenMPTeamSubscription = false; @@ -162,6 +166,7 @@ struct OffloadModuleOpts { bool OpenMPForceUSM = false; uint32_t OpenMPVersion = 11; std::string OMPHostIRFile = {}; + std::vector OMPTargetTriples = {}; bool NoGPULib = false; }; @@ -185,6 +190,9 @@ struct OffloadModuleOpts { if 
(!Opts.OMPHostIRFile.empty()) offloadMod.setHostIRFilePath(Opts.OMPHostIRFile); } +auto strTriples = llvm::to_vector(llvm::map_range(Opts.OMPTargetTriples, +[](llvm::Triple triple) { return triple.normalize(); })); +offloadMod.setTargetTriples(strTriples); } } diff --git a/flang/lib/Frontend/CompilerInvocation.cpp b/flang/lib/Frontend/CompilerInvocation.cpp index 8c892d9d032e1..19f067a135dd6 100644 --- a/flang/lib/Frontend/CompilerInvocation.cpp +++ b/flang/lib/Frontend/CompilerInvocation.cpp @@ -894,6 +894,7 @@ static bool parseDiagArgs(CompilerInvocation &res, llvm::opt::ArgList &args, /// options accordingly. Returns false if new errors are generated. static bool parseDialectArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96872 >From 8e3dfc335301d978d3d22110a6db8f98fc636b4d Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Tue, 11 Jun 2024 10:58:44 +0200 Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} Need to emit syncscope and new metadata to get the native instruction, most of the time. --- clang/lib/CodeGen/CGBuiltin.cpp | 39 +-- .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl| 2 +- .../builtins-fp-atomics-gfx12.cl | 4 +- .../builtins-fp-atomics-gfx90a.cl | 4 +- .../builtins-fp-atomics-gfx940.cl | 4 +- 5 files changed, 34 insertions(+), 19 deletions(-) diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp index c199976956085..00f581dced900 100644 --- a/clang/lib/CodeGen/CGBuiltin.cpp +++ b/clang/lib/CodeGen/CGBuiltin.cpp @@ -58,6 +58,7 @@ #include "llvm/IR/MDBuilder.h" #include "llvm/IR/MatrixBuilder.h" #include "llvm/IR/MemoryModelRelaxationAnnotations.h" +#include "llvm/Support/AMDGPUAddrSpace.h" #include "llvm/Support/ConvertUTF.h" #include "llvm/Support/MathExtras.h" #include "llvm/Support/ScopedPrinter.h" @@ -18790,8 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() }); return Builder.CreateCall(F, { Src0, Builder.getFalse() }); } - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: - case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64: @@ -18803,18 +18802,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Intrinsic::ID IID; llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext()); switch (BuiltinID) { -case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: - ArgTy = llvm::Type::getFloatTy(getLLVMContext()); - IID = 
Intrinsic::amdgcn_global_atomic_fadd; - break; case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16: ArgTy = llvm::FixedVectorType::get( llvm::Type::getHalfTy(getLLVMContext()), 2); IID = Intrinsic::amdgcn_global_atomic_fadd; break; -case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: - IID = Intrinsic::amdgcn_global_atomic_fadd; - break; case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64: IID = Intrinsic::amdgcn_global_atomic_fmin; break; @@ -19237,7 +19229,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16: case AMDGPU::BI__builtin_amdgcn_ds_faddf: case AMDGPU::BI__builtin_amdgcn_ds_fminf: - case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: { + case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: + case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: { llvm::AtomicRMWInst::BinOp BinOp; switch (BuiltinID) { case AMDGPU::BI__builtin_amdgcn_atomic_inc32: @@ -19253,6 +19247,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32: case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16: case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16: +case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32: +case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: BinOp = llvm::AtomicRMWInst::FAdd; break; case AMDGPU::BI__builtin_amdgcn_ds_fminf: @@ -19287,8 +19283,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)), EmitScalarExpr(E->getArg(3)), AO, SSID); } else { - // The ds_atomic_fadd_* builtins do not have syncscope/order arguments. - SSID = llvm::SyncScope::System; + // Most of the builtins do not have syncscope/order arguments. For DS + // atomics the scope doesn't really matter, as they implicitly operate at + // workgroup scope. 
+ // + // The global/flat cases need to use agent scope to consistently produce + // the native instruction instead of a cmpxchg expansion. + SSID = getLLVMContext().getOrInsertSyncScopeID("agent"); AO = AtomicOrdering::SequentiallyConsistent; // The v2bf16 builtin uses i16 instead of a natural bfloat type. @@ -19303,6 +19304,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID, Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID); if (Volatile) RMW->setVolatile(true); + +unsigned AddrSpace = Ptr.getType()->getAddressSpace(); +if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) { + // Most targets require "amdgpu.no.fine.grained.memory" to emit the nativ
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96873

>From ab196e6375bfa6cda5977102d733c501271cb684 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 20 ++-
 .../builtins-fp-atomics-gfx12.cl | 9 ++---
 .../builtins-fp-atomics-gfx90a.cl | 2 +-
 .../builtins-fp-atomics-gfx940.cl | 3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 010fafde0714e..fec4fc4be562d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,22 +18791,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
     return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
     Intrinsic::ID IID;
     llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
     switch (BuiltinID) {
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-      ArgTy = llvm::FixedVectorType::get(
-          llvm::Type::getHalfTy(getLLVMContext()), 2);
-      IID = Intrinsic::amdgcn_global_atomic_fadd;
-      break;
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
       IID = Intrinsic::amdgcn_global_atomic_fmin;
       break;
@@ -18826,11 +18819,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
       ArgTy = llvm::Type::getFloatTy(getLLVMContext());
       IID = Intrinsic::amdgcn_flat_atomic_fadd;
       break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-      ArgTy = llvm::FixedVectorType::get(
-          llvm::Type::getHalfTy(getLLVMContext()), 2);
-      IID = Intrinsic::amdgcn_flat_atomic_fadd;
-      break;
     }
     llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
     llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19231,7 +19219,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19249,6 +19239,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL: test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL: test_global_add_half2
 // GFX12: global_atomic_pk_add_f16 v2, v[0:1], v2, off
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96873

>From 37f162186d0d30a0c286efb582af86264c576b5c Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 20 ++-
 .../builtins-fp-atomics-gfx12.cl | 9 ++---
 .../builtins-fp-atomics-gfx90a.cl | 2 +-
 .../builtins-fp-atomics-gfx940.cl | 3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 010fafde0714e..fec4fc4be562d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,22 +18791,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
     return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
     Intrinsic::ID IID;
     llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
     switch (BuiltinID) {
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-      ArgTy = llvm::FixedVectorType::get(
-          llvm::Type::getHalfTy(getLLVMContext()), 2);
-      IID = Intrinsic::amdgcn_global_atomic_fadd;
-      break;
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
       IID = Intrinsic::amdgcn_global_atomic_fmin;
       break;
@@ -18826,11 +18819,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
       ArgTy = llvm::Type::getFloatTy(getLLVMContext());
       IID = Intrinsic::amdgcn_flat_atomic_fadd;
       break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-      ArgTy = llvm::FixedVectorType::get(
-          llvm::Type::getHalfTy(getLLVMContext()), 2);
-      IID = Intrinsic::amdgcn_flat_atomic_fadd;
-      break;
     }
     llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
     llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19231,7 +19219,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19249,6 +19239,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL: test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL: test_global_add_half2
 // GFX12: global_atomic_pk_add_f16 v2, v[0:1], v2, off
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96872

>From f5747ae0c6eb1cb40d13cd99244734996777c65b Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction, most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl | 2 +-
 .../builtins-fp-atomics-gfx12.cl | 4 +-
 .../builtins-fp-atomics-gfx90a.cl | 4 +-
 .../builtins-fp-atomics-gfx940.cl | 4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index c199976956085..00f581dced900 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18790,8 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
     return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18803,18 +18802,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Intrinsic::ID IID;
     llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
     switch (BuiltinID) {
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-      ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-      IID = Intrinsic::amdgcn_global_atomic_fadd;
-      break;
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
       ArgTy = llvm::FixedVectorType::get(
           llvm::Type::getHalfTy(getLLVMContext()), 2);
       IID = Intrinsic::amdgcn_global_atomic_fadd;
       break;
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-      IID = Intrinsic::amdgcn_global_atomic_fadd;
-      break;
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
       IID = Intrinsic::amdgcn_global_atomic_fmin;
       break;
@@ -19237,7 +19229,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19253,6 +19247,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
     case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
     case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19287,8 +19283,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
       ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
                               EmitScalarExpr(E->getArg(3)), AO, SSID);
     } else {
-      // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-      SSID = llvm::SyncScope::System;
+      // Most of the builtins do not have syncscope/order arguments. For DS
+      // atomics the scope doesn't really matter, as they implicitly operate at
+      // workgroup scope.
+      //
+      // The global/flat cases need to use agent scope to consistently produce
+      // the native instruction instead of a cmpxchg expansion.
+      SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
       AO = AtomicOrdering::SequentiallyConsistent;
 
       // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19303,6 +19304,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
     if (Volatile)
       RMW->setVolatile(true);
+
+    unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+    if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+      // Most targets require "amdgpu.no.fine.grained.memory" to emit the nativ
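
The scope and metadata choice described in this patch can be summarised with a small standalone model. This is only a sketch, not the clang code: `RmwAttrs` and `chooseRmwAttrs` are hypothetical names, the address-space numbers (flat 0, global 1, local 3) are assumed stand-ins for the constants in `llvm/Support/AMDGPUAddrSpace.h`, and the metadata condition is inferred from the visible part of the truncated hunk above. It covers only the builtins without explicit scope/order arguments.

```cpp
#include <string>

// Hypothetical stand-ins for the AMDGPU address-space numbers; the real
// constants are assumed to live in llvm/Support/AMDGPUAddrSpace.h.
enum AddrSpace : unsigned { Flat = 0, Global = 1, Local = 3 };

// Illustrative record of what the patch attaches to the emitted atomicrmw.
struct RmwAttrs {
  std::string syncScope;     // textual syncscope on the instruction
  std::string ordering;      // atomic ordering keyword
  bool noFineGrainedMemory;  // whether !amdgpu.no.fine.grained.memory is set
};

// Models the else-branch of the patch: builtins with no scope/order
// arguments get agent scope and seq_cst ordering, and every address space
// except local (LDS) also gets the no-fine-grained-memory metadata.
RmwAttrs chooseRmwAttrs(unsigned addrSpace) {
  RmwAttrs attrs;
  attrs.syncScope = "agent";
  attrs.ordering = "seq_cst";
  attrs.noFineGrainedMemory = (addrSpace != Local);
  return attrs;
}
```

Under these assumptions, `chooseRmwAttrs(Global)` describes an `atomicrmw fadd ... syncscope("agent") seq_cst` carrying the metadata, while `chooseRmwAttrs(Local)` describes the same instruction without it.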
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (PR #96874)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96874

>From 5944dea0c3f7207ce62a56c2b8806ecf5d53b527 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:15:26 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 17 ++---
 .../CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl | 6 --
 .../CodeGenOpenCL/builtins-fp-atomics-gfx940.cl | 3 ++-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index fec4fc4be562d..309c069d44738 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18793,10 +18793,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   }
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
     Intrinsic::ID IID;
     llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
     switch (BuiltinID) {
@@ -18806,19 +18804,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
       IID = Intrinsic::amdgcn_global_atomic_fmax;
       break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
-      IID = Intrinsic::amdgcn_flat_atomic_fadd;
-      break;
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
       IID = Intrinsic::amdgcn_flat_atomic_fmin;
       break;
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
       IID = Intrinsic::amdgcn_flat_atomic_fmax;
       break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-      ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-      IID = Intrinsic::amdgcn_flat_atomic_fadd;
-      break;
     }
     llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
     llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19221,7 +19212,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19241,6 +19234,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
     case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index cd10777dbe079..02e289427238f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -45,7 +45,8 @@ void test_global_max_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_add_local_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p3.f64(ptr addrspace(3) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(3) %{{.+}}, double %{{.+}} syncscope("agent") seq_cst, align 8{{$}}
+
 // GFX90A-LABEL: test_flat_add_local_f64$local
 // GFX90A: ds_add_rtn_f64
 void test_flat_add_local_f64(__local double *addr, double x){
@@ -54,7 +55,8 @@ void test_flat_add_local_f64(__local double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_global_add_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") seq_cst, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL: test_flat_global_add_f64$local
 // GFX90A: global_atomic_add_f64
 void test_flat_global_add_f64(__global double *addr, double x){
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
index 589dcd406630d..bd9b8c7268e06 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
@@ -10,7 +10,8 @@ typedef half _
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96875

>From a15cfba94245201cbb963ab76c15018c2bc42a61 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 19:34:43 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 26 ++-
 .../builtins-fp-atomics-gfx12.cl | 24 -
 .../builtins-fp-atomics-gfx90a.cl | 6 ++---
 .../builtins-fp-atomics-gfx940.cl | 14 +++---
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 309c069d44738..d98fd0012e15a 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18817,22 +18817,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
         CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
     return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
-    Intrinsic::ID IID;
-    switch (BuiltinID) {
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-      IID = Intrinsic::amdgcn_global_atomic_fadd_v2bf16;
-      break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
-      IID = Intrinsic::amdgcn_flat_atomic_fadd_v2bf16;
-      break;
-    }
-    llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-    llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-    llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
-    return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19214,7 +19198,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19236,6 +19222,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
     case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19280,7 +19268,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
       AO = AtomicOrdering::Monotonic;
 
       // The v2bf16 builtin uses i16 instead of a natural bfloat type.
-      if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16) {
+      if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16 ||
+          BuiltinID == AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16 ||
+          BuiltinID == AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16) {
         llvm::Type *V2BF16Ty = FixedVectorType::get(
             llvm::Type::getBFloatTy(Builder.getContext()), 2);
         Val = Builder.CreateBitCast(Val, V2BF16Ty);
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 07e63a8711c7f..e8b6eb57c38d7 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -11,7 +11,7 @@ typedef short __attribute__((ext_vector_type(2))) short2;
 
 // CHECK-LABEL: test_local_add_2bf16
 // CHECK: [[BC0:%.+]] = bitcast <2 x i16> {{.+}} to <2 x bfloat>
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> [[BC0]] syncscope("agent") monotonic, align 4
+// CHECK-NEXT: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> [[BC0]] syncscope("agent") monotonic, align 4
 // CHECK-NEXT: bitcast <2 x bfloat> [[RMW]] to <2 x i16>
 
 // GFX12-LABEL: test_local_add_2bf16
@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
 
 // GFX12-LABEL: test_flat_add_2f
[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (PR #96876)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/96876

>From 06fb3add7a2292f40b54849c768e20ac76fd1605 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Wed, 26 Jun 2024 23:18:32 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 36 +--
 .../builtins-fp-atomics-gfx90a.cl | 18 ++
 2 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d98fd0012e15a..675561bd14ad4 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,32 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
     Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
     return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
-    Intrinsic::ID IID;
-    llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-    switch (BuiltinID) {
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-      IID = Intrinsic::amdgcn_global_atomic_fmin;
-      break;
-    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-      IID = Intrinsic::amdgcn_global_atomic_fmax;
-      break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-      IID = Intrinsic::amdgcn_flat_atomic_fmin;
-      break;
-    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-      IID = Intrinsic::amdgcn_flat_atomic_fmax;
-      break;
-    }
-    llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-    llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-    llvm::Function *F =
-        CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
-    return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19200,7 +19174,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
     llvm::AtomicRMWInst::BinOp BinOp;
     switch (BuiltinID) {
     case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19227,8 +19205,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
       BinOp = llvm::AtomicRMWInst::FAdd;
       break;
     case AMDGPU::BI__builtin_amdgcn_ds_fminf:
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
       BinOp = llvm::AtomicRMWInst::FMin;
       break;
+    case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+    case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
     case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
       BinOp = llvm::AtomicRMWInst::FMax;
       break;
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index 9381ce951df3e..556e553903d1a 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -27,7 +27,8 @@ void test_global_add_half2(__global half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_global_global_min_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmin.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmin ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL: test_global_global_min_f64$local
 // GFX90A: global_atomic_min_f64
 void test_global_global_min_f64(__global double *addr, double x){
@@ -36,7 +37,8 @@ void test_global_global_min_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_global_max_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmax.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmax ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL: test_global_max_f64$local
 // GFX90A: global_atomic_max_f64
 void test_global_max_f64(__global double *addr, double x){
@@ -65,7 +67,8 @@ void test_flat_global_add_f64(__global double *addr, doub
[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/97050

>From fea266d72c82212f8b020614da367908640d3d34 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Thu, 27 Jun 2024 16:32:48 +0200
Subject: [PATCH] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics

These are now fully covered by atomicrmw.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td | 4 -
 llvm/lib/IR/AutoUpgrade.cpp | 14 +-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td | 2 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp | 2 -
 .../Target/AMDGPU/AMDGPUSearchableTables.td | 2 -
 llvm/lib/Target/AMDGPU/FLATInstructions.td | 2 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 6 +-
 llvm/test/Bitcode/amdgcn-atomic.ll | 22 ++
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll | 106 -
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 218 --
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll | 193
 11 files changed, 33 insertions(+), 538 deletions(-)

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index ab2620fdcf6b3..119281ca6103a 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2955,10 +2955,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
   def NAME#"_"#kind : AMDGPUMFp8SmfmacIntrinsic;
 }
 
-// bf16 atomics use v2i16 argument since there is no bf16 data type in the llvm.
-def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
-def int_amdgcn_flat_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
-
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
 def int_amdgcn_mfma_i32_32x32x16_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 53de9eef516b3..f566a0e3c3043 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1034,7 +1034,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
       }
 
       if (Name.starts_with("ds.fadd") || Name.starts_with("ds.fmin") ||
-          Name.starts_with("ds.fmax")) {
+          Name.starts_with("ds.fmax") ||
+          Name.starts_with("global.atomic.fadd.v2bf16") ||
+          Name.starts_with("flat.atomic.fadd.v2bf16")) {
         // Replaced with atomicrmw fadd/fmin/fmax, so there's no new
         // declaration.
         NewFn = nullptr;
@@ -4042,7 +4044,9 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
           .StartsWith("ds.fmin", AtomicRMWInst::FMin)
           .StartsWith("ds.fmax", AtomicRMWInst::FMax)
           .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
-          .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+          .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap)
+          .StartsWith("global.atomic.fadd", AtomicRMWInst::FAdd)
+          .StartsWith("flat.atomic.fadd", AtomicRMWInst::FAdd);
 
   unsigned NumOperands = CI->getNumOperands();
   if (NumOperands < 3) // Malformed bitcode.
@@ -4097,8 +4101,10 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
       Builder.CreateAtomicRMW(RMWOp, Ptr, Val, std::nullopt, Order, SSID);
 
   if (PtrTy->getAddressSpace() != 3) {
-    RMW->setMetadata("amdgpu.no.fine.grained.memory",
-                     MDNode::get(F->getContext(), {}));
+    MDNode *EmptyMD = MDNode::get(F->getContext(), {});
+    RMW->setMetadata("amdgpu.no.fine.grained.memory", EmptyMD);
+    if (RMWOp == AtomicRMWInst::FAdd && RetTy->isFloatTy())
+      RMW->setMetadata("amdgpu.ignore.denormal.mode", EmptyMD);
   }
 
   if (IsVolatile)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index c6dbc58395e48..db8b44149cf47 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -620,12 +620,10 @@ multiclass local_addr_space_atomic_op {
 
 defm int_amdgcn_flat_atomic_fadd : noret_op;
 defm int_amdgcn_flat_atomic_fadd : flat_addr_space_atomic_op;
-defm int_amdgcn_flat_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_flat_atomic_fmin : noret_op;
 defm int_amdgcn_flat_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_fadd : global_addr_space_atomic_op;
 defm int_amdgcn_flat_atomic_fadd : global_addr_space_atomic_op;
-defm int_amdgcn_global_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_global_atomic_fmin : noret_op;
 defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index aa329a58547f3..546c0a238e430 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4898,8 +4898,6 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
   case Intrinsic::amdgcn_flat_atomic_fmax:
   case Intrinsic
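
The auto-upgrade logic in this patch amounts to a first-match prefix lookup followed by a metadata decision. The following is a rough standalone model of that behaviour, not the LLVM implementation: `RmwOp`, `rmwOpForIntrinsic` and `UpgradeMetadata` are hypothetical names, the name is assumed to arrive with its `llvm.amdgcn.` prefix already stripped, and the `ds.fadd` entry is an assumption (that line falls outside the visible hunk).

```cpp
#include <optional>
#include <string_view>

// Illustrative mirror of the AtomicRMWInst operations named in the patch.
enum class RmwOp { FAdd, FMin, FMax, UIncWrap, UDecWrap };

// First matching prefix wins, in the spirit of StringSwitch::StartsWith.
std::optional<RmwOp> rmwOpForIntrinsic(std::string_view name) {
  struct Entry {
    std::string_view prefix;
    RmwOp op;
  };
  static constexpr Entry table[] = {
      {"ds.fadd", RmwOp::FAdd},  // assumed: not shown in the visible hunk
      {"ds.fmin", RmwOp::FMin},
      {"ds.fmax", RmwOp::FMax},
      {"atomic.inc.", RmwOp::UIncWrap},
      {"atomic.dec.", RmwOp::UDecWrap},
      {"global.atomic.fadd", RmwOp::FAdd},
      {"flat.atomic.fadd", RmwOp::FAdd},
  };
  for (const Entry &entry : table)
    if (name.substr(0, entry.prefix.size()) == entry.prefix)
      return entry.op;
  return std::nullopt;
}

// Which metadata the upgraded atomicrmw receives, following the last hunk:
// nothing for address space 3, otherwise no-fine-grained-memory, plus
// ignore-denormal-mode only for an fadd whose result type is float.
struct UpgradeMetadata {
  bool noFineGrainedMemory;
  bool ignoreDenormalMode;
};

UpgradeMetadata metadataFor(RmwOp op, unsigned addrSpace, bool returnsFloat) {
  const bool nonLocal = (addrSpace != 3);
  return {nonLocal, nonLocal && op == RmwOp::FAdd && returnsFloat};
}
```

In this model a v2bf16 global fadd maps to `RmwOp::FAdd` and, because its result is not `float`, receives only the no-fine-grained-memory flag.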
[llvm-branch-commits] [libcxx] [libc++][doc] Update the release notes for LLVM 19. (PR #100167)
https://github.com/ldionne approved this pull request. https://github.com/llvm/llvm-project/pull/100167
[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)
llvmbot wrote: @jhuber6 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100174
[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100174
[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100174 Backport e0649a5dfc6b859d652318f578bc3d49674787a4 Requested by: @jhuber6 >From 62f7338ac4509a71ce149ab879ed35cc13f5f00f Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 23 Jul 2024 12:54:00 -0500 Subject: [PATCH] [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) Summary: The NVPTX backend optimizes the ABI for functions that are internal, however, this is not legal for indirect call prototypes. Previously, we would modify the ABI on an aggregate byval type passed to an indirect call prototype, which would make PTXAS error. This patch just passes the function as a nullptr to force strict ABI compliance without modification in the helper function. Fixes https://github.com/llvm/llvm-project/issues/100055 (cherry picked from commit e0649a5dfc6b859d652318f578bc3d49674787a4) --- libc/config/gpu/entrypoints.txt | 15 +--- llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp | 5 +- llvm/test/CodeGen/NVPTX/indirect_byval.ll | 94 + 3 files changed, 101 insertions(+), 13 deletions(-) create mode 100644 llvm/test/CodeGen/NVPTX/indirect_byval.ll diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt index 42909cec55890..fa878d8999227 100644 --- a/libc/config/gpu/entrypoints.txt +++ b/libc/config/gpu/entrypoints.txt @@ -1,13 +1,3 @@ -if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU) - set(extra_entrypoints - # stdio.h entrypoints - libc.src.stdio.snprintf - libc.src.stdio.sprintf - libc.src.stdio.vsnprintf - libc.src.stdio.vsprintf - ) -endif() - set(TARGET_LIBC_ENTRYPOINTS # assert.h entrypoints libc.src.assert.__assert_fail @@ -186,13 +176,16 @@ set(TARGET_LIBC_ENTRYPOINTS libc.src.errno.errno # stdio.h entrypoints -${extra_entrypoints} libc.src.stdio.clearerr libc.src.stdio.fclose libc.src.stdio.printf libc.src.stdio.vprintf libc.src.stdio.fprintf libc.src.stdio.vfprintf +libc.src.stdio.snprintf +libc.src.stdio.sprintf +libc.src.stdio.vsnprintf 
+libc.src.stdio.vsprintf libc.src.stdio.feof libc.src.stdio.ferror libc.src.stdio.fflush diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 44c1a2e50486c..6975412ce5d35 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -1429,7 +1429,6 @@ std::string NVPTXTargetLowering::getPrototype( bool first = true; - const Function *F = CB.getFunction(); unsigned NumArgs = VAInfo ? VAInfo->first : Args.size(); for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) { Type *Ty = Args[i].Ty; @@ -1471,10 +1470,12 @@ std::string NVPTXTargetLowering::getPrototype( continue; } +// Indirect calls need strict ABI alignment so we disable optimizations by +// not providing a function to optimize. Type *ETy = Args[i].IndirectType; Align InitialAlign = Outs[OIdx].Flags.getNonZeroByValAlign(); Align ParamByValAlign = -getFunctionByValParamAlign(F, ETy, InitialAlign, DL); +getFunctionByValParamAlign(/*F=*/nullptr, ETy, InitialAlign, DL); O << ".param .align " << ParamByValAlign.value() << " .b8 "; O << "_"; diff --git a/llvm/test/CodeGen/NVPTX/indirect_byval.ll b/llvm/test/CodeGen/NVPTX/indirect_byval.ll new file mode 100644 index 0..ac6c4e262fd60 --- /dev/null +++ b/llvm/test/CodeGen/NVPTX/indirect_byval.ll @@ -0,0 +1,94 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | FileCheck %s +; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | %ptxas-verify %} + +target triple = "nvptx64-nvidia-cuda" + +%struct.S = type { i8 } +%struct.U = type { i64 } + +@ptr = external global ptr, align 8 + +define internal i32 @foo() { +; CHECK-LABEL: foo( +; CHECK: { +; CHECK-NEXT:.local .align 1 .b8 __local_depot0[2]; +; CHECK-NEXT:.reg .b64 %SP; +; CHECK-NEXT:.reg .b64 %SPL; +; CHECK-NEXT:.reg .b16 %rs<2>; +; CHECK-NEXT:.reg .b32 %r<3>; +; CHECK-NEXT:.reg .b64 
%rd<3>; +; CHECK-EMPTY: +; CHECK-NEXT: // %bb.0: // %entry +; CHECK-NEXT:mov.u64 %SPL, __local_depot0; +; CHECK-NEXT:cvta.local.u64 %SP, %SPL; +; CHECK-NEXT:ld.global.u64 %rd1, [ptr]; +; CHECK-NEXT:ld.u8 %rs1, [%SP+1]; +; CHECK-NEXT:add.u64 %rd2, %SP, 0; +; CHECK-NEXT:{ // callseq 0, 0 +; CHECK-NEXT:.param .align 1 .b8 param0[1]; +; CHECK-NEXT:st.param.b8 [param0+0], %rs1; +; CHECK-NEXT:.param .b64 param1; +; CHECK-NEXT:st.param.b64 [param1+0], %rd2; +; CHECK-NEXT:.param .b32 retval0; +; CHECK-NEXT:prototype_0 : .callprototype (.param .b32 _) _ (.param .align 1 .b8 _[1], .param .b64 _); +; CHECK-NEXT:call (retval0), +; CHECK-NEXT:%rd1, +; CHECK-NEXT:
[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)
llvmbot wrote: @llvm/pr-subscribers-libc Author: None (llvmbot) Changes Backport e0649a5dfc6b859d652318f578bc3d49674787a4 Requested by: @jhuber6 --- Full diff: https://github.com/llvm/llvm-project/pull/100174.diff 3 Files Affected: - (modified) libc/config/gpu/entrypoints.txt (+4-11) - (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+3-2) - (added) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+94) ``diff diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt index 42909cec55890..fa878d8999227 100644 --- a/libc/config/gpu/entrypoints.txt +++ b/libc/config/gpu/entrypoints.txt @@ -1,13 +1,3 @@ -if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU) - set(extra_entrypoints - # stdio.h entrypoints - libc.src.stdio.snprintf - libc.src.stdio.sprintf - libc.src.stdio.vsnprintf - libc.src.stdio.vsprintf - ) -endif() - set(TARGET_LIBC_ENTRYPOINTS # assert.h entrypoints libc.src.assert.__assert_fail @@ -186,13 +176,16 @@ set(TARGET_LIBC_ENTRYPOINTS libc.src.errno.errno # stdio.h entrypoints -${extra_entrypoints} libc.src.stdio.clearerr libc.src.stdio.fclose libc.src.stdio.printf libc.src.stdio.vprintf libc.src.stdio.fprintf libc.src.stdio.vfprintf +libc.src.stdio.snprintf +libc.src.stdio.sprintf +libc.src.stdio.vsnprintf +libc.src.stdio.vsprintf libc.src.stdio.feof libc.src.stdio.ferror libc.src.stdio.fflush diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp index 44c1a2e50486c..6975412ce5d35 100644 --- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp +++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp @@ -1429,7 +1429,6 @@ std::string NVPTXTargetLowering::getPrototype( bool first = true; - const Function *F = CB.getFunction(); unsigned NumArgs = VAInfo ? 
VAInfo->first : Args.size(); for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) { Type *Ty = Args[i].Ty; @@ -1471,10 +1470,12 @@ std::string NVPTXTargetLowering::getPrototype( continue; } +// Indirect calls need strict ABI alignment so we disable optimizations by +// not providing a function to optimize. Type *ETy = Args[i].IndirectType; Align InitialAlign = Outs[OIdx].Flags.getNonZeroByValAlign(); Align ParamByValAlign = -getFunctionByValParamAlign(F, ETy, InitialAlign, DL); +getFunctionByValParamAlign(/*F=*/nullptr, ETy, InitialAlign, DL); O << ".param .align " << ParamByValAlign.value() << " .b8 "; O << "_"; diff --git a/llvm/test/CodeGen/NVPTX/indirect_byval.ll b/llvm/test/CodeGen/NVPTX/indirect_byval.ll new file mode 100644 index 0..ac6c4e262fd60 --- /dev/null +++ b/llvm/test/CodeGen/NVPTX/indirect_byval.ll @@ -0,0 +1,94 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | FileCheck %s +; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | %ptxas-verify %} + +target triple = "nvptx64-nvidia-cuda" + +%struct.S = type { i8 } +%struct.U = type { i64 } + +@ptr = external global ptr, align 8 + +define internal i32 @foo() { +; CHECK-LABEL: foo( +; CHECK: { +; CHECK-NEXT:.local .align 1 .b8 __local_depot0[2]; +; CHECK-NEXT:.reg .b64 %SP; +; CHECK-NEXT:.reg .b64 %SPL; +; CHECK-NEXT:.reg .b16 %rs<2>; +; CHECK-NEXT:.reg .b32 %r<3>; +; CHECK-NEXT:.reg .b64 %rd<3>; +; CHECK-EMPTY: +; CHECK-NEXT: // %bb.0: // %entry +; CHECK-NEXT:mov.u64 %SPL, __local_depot0; +; CHECK-NEXT:cvta.local.u64 %SP, %SPL; +; CHECK-NEXT:ld.global.u64 %rd1, [ptr]; +; CHECK-NEXT:ld.u8 %rs1, [%SP+1]; +; CHECK-NEXT:add.u64 %rd2, %SP, 0; +; CHECK-NEXT:{ // callseq 0, 0 +; CHECK-NEXT:.param .align 1 .b8 param0[1]; +; CHECK-NEXT:st.param.b8 [param0+0], %rs1; +; CHECK-NEXT:.param .b64 param1; +; CHECK-NEXT:st.param.b64 [param1+0], %rd2; +; CHECK-NEXT:.param .b32 retval0; 
+; CHECK-NEXT:prototype_0 : .callprototype (.param .b32 _) _ (.param .align 1 .b8 _[1], .param .b64 _); +; CHECK-NEXT:call (retval0), +; CHECK-NEXT:%rd1, +; CHECK-NEXT:( +; CHECK-NEXT:param0, +; CHECK-NEXT:param1 +; CHECK-NEXT:) +; CHECK-NEXT:, prototype_0; +; CHECK-NEXT:ld.param.b32 %r1, [retval0+0]; +; CHECK-NEXT:} // callseq 0 +; CHECK-NEXT:st.param.b32 [func_retval0+0], %r1; +; CHECK-NEXT:ret; +entry: + %s = alloca %struct.S, align 1 + %agg.tmp = alloca %struct.S, align 1 + %0 = load ptr, ptr @ptr, align 8 + %call = call i32 %0(ptr byval(%struct.S) align 1 %agg.tmp, ptr noundef %s) + ret i32 %call +} + +define internal i32 @bar() { +; CHECK-LABEL: bar( +; CHECK: // @bar +; CHECK-NEXT: { +; CHECK-NEXT:.local .align 8 .b8 __local_depot1[16]; +; CHECK-NEXT:.reg .b64 %SP; +; CHECK-NEXT:.reg .b64 %SPL; +; CHECK-NEXT:.reg .b32 %r<3>; +; CHECK-NEXT:.reg .b6
[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)
jhuber6 wrote: This should be merged https://github.com/llvm/llvm-project/pull/100174
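For readers skimming this thread, the fix described in the commit message (pass no function so the byval alignment is left at its ABI value) can be pictured with a toy model. This is only a sketch under stated assumptions: `Function`, `optimizedAlign`, and `byvalParamAlign` are hypothetical stand-ins, not the NVPTX backend's types or its real `getFunctionByValParamAlign`, and the "raise to 16 bytes" policy is invented purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IR function; this is not llvm::Function.
struct Function {
  bool HasLocalLinkage; // true for "internal" functions
};

// Hypothetical policy: an internal function may get a larger byval alignment.
// The value 16 is an assumption made for this illustration only.
inline uint64_t optimizedAlign(const Function &F, uint64_t AbiAlign) {
  return F.HasLocalLinkage ? std::max<uint64_t>(AbiAlign, 16) : AbiAlign;
}

// Hypothetical helper with the same shape as the call changed in the patch:
// a null function means "nothing to optimize", so the ABI alignment is
// returned unchanged.
inline uint64_t byvalParamAlign(const Function *F, uint64_t AbiAlign) {
  assert(AbiAlign != 0 && (AbiAlign & (AbiAlign - 1)) == 0 &&
         "alignment must be a power of two");
  if (!F)
    return AbiAlign;
  return optimizedAlign(*F, AbiAlign);
}
```

In this model, an indirect call prototype would call `byvalParamAlign(nullptr, 1)` and keep alignment 1, which matches the `.param .align 1 .b8 _[1]` prototype expected by the new test, whereas passing an internal function could silently raise it.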
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/99891 >From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 22 Jul 2024 11:56:23 -0700 Subject: [PATCH 1/3] Changed assignment of profiles with pseudo probe index Created using spr 1.3.4 --- bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++ .../X86/match-blocks-with-pseudo-probes.test | 25 ++ 2 files changed, 78 insertions(+), 32 deletions(-) diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp index 4105f626fb5b6..c135ee5ff4837 100644 --- a/bolt/lib/Profile/StaleProfileMatching.cpp +++ b/bolt/lib/Profile/StaleProfileMatching.cpp @@ -195,11 +195,15 @@ class StaleMatcher { void init(const std::vector &Blocks, const std::vector &Hashes, const std::vector &CallHashes, -std::optional YamlBFGUID) { +const std::unordered_map> +IndexToBinaryPseudoProbes, +const std::unordered_map +BinaryPseudoProbeToBlock, +const uint64_t YamlBFGUID) { assert(Blocks.size() == Hashes.size() && Hashes.size() == CallHashes.size() && "incorrect matcher initialization"); - for (size_t I = 0; I < Blocks.size(); I++) { FlowBlock *Block = Blocks[I]; uint16_t OpHash = Hashes[I].OpcodeHash; @@ -209,6 +213,8 @@ class StaleMatcher { std::make_pair(Hashes[I], Block)); this->Blocks.push_back(Block); } +this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes; +this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock; this->YamlBFGUID = YamlBFGUID; } @@ -234,10 +240,14 @@ class StaleMatcher { using HashBlockPairType = std::pair; std::unordered_map> OpHashToBlocks; std::unordered_map> CallHashToBlocks; - std::vector Blocks; + std::unordered_map> + IndexToBinaryPseudoProbes; + std::unordered_map + BinaryPseudoProbeToBlock; + std::vector Blocks; // If the pseudo probe checksums of the profiled and binary functions are // equal, then the YamlBF's GUID is defined and used to match blocks. 
- std::optional YamlBFGUID; + uint64_t YamlBFGUID; // Uses OpcodeHash to find the most similar block for a given hash. const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const { @@ -284,7 +294,7 @@ class StaleMatcher { // Searches for the pseudo probe attached to the matched function's block, // ignoring pseudo probes attached to function calls and inlined functions' // blocks. -outs() << "match with pseudo probes\n"; +std::vector BlockPseudoProbes; for (const auto &PseudoProbe : PseudoProbes) { // Ensures that pseudo probe information belongs to the appropriate // function and not an inlined function. @@ -293,11 +303,30 @@ class StaleMatcher { // Skips pseudo probes attached to function calls. if (PseudoProbe.Type != static_cast(PseudoProbeType::Block)) continue; - assert(PseudoProbe.Index < Blocks.size() && - "pseudo probe index out of range"); - return Blocks[PseudoProbe.Index]; + + BlockPseudoProbes.push_back(&PseudoProbe); } -return nullptr; + +// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo +// probe and binary pseudo probe. 
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1) + return nullptr; + +uint64_t Index = BlockPseudoProbes[0]->Index; +assert(Index < Blocks.size() && "Invalid pseudo probe index"); + +auto It = IndexToBinaryPseudoProbes.find(Index); +assert(It != IndexToBinaryPseudoProbes.end() && + "All blocks should have a pseudo probe"); +if (It->second.size() > 1) + return nullptr; + +const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0]; +auto BinaryPseudoProbeIt = BinaryPseudoProbeToBlock.find(BinaryPseudoProbe); +assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() && + "All binary pseudo probes should belong a binary basic block"); + +return BinaryPseudoProbeIt->second; } }; @@ -491,6 +520,11 @@ size_t matchWeightsByHashes( std::vector CallHashes; std::vector Blocks; std::vector BlendedHashes; + std::unordered_map> + IndexToBinaryPseudoProbes; + std::unordered_map + BinaryPseudoProbeToBlock; + const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder(); for (uint64_t I = 0; I < BlockOrder.size(); I++) { const BinaryBasicBlock *BB = BlockOrder[I]; assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock"); @@ -510,9 +544,27 @@ size_t matchWeightsByHashes( Blocks.push_back(&Func.Blocks[I + 1]); BlendedBlockHash BlendedHash(BB->getHash()); BlendedHashes.push_back(BlendedHash); +if (PseudoProbeDecoder) { + const AddressProbesMap &ProbeMap = + PseudoProbeDecoder->getAd
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
@@ -306,26 +310,41 @@ class StaleMatcher {
 BlockPseudoProbes.push_back(&PseudoProbe);
 }
-
 // Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
 // probe and binary pseudo probe.
-if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+if (BlockPseudoProbes.size() == 0) {
+ if (opts::Verbosity >= 2)
+errs() << "BOLT-WARNING: no pseudo probes in profile block\n";

aaupov wrote:

Bump verbosity for this logging to >=3. Add aggregated counters – those could be printed for BF at v>=2. BC-level aggregated counters can be printed at v>=1.

https://github.com/llvm/llvm-project/pull/99891
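One possible reading of the review suggestion above is sketched here: per-block warnings stay behind the highest verbosity level, while tallies are accumulated and summarized per function and per binary at lower levels. Everything in it is an assumption made for illustration: `ProbeMatchStats`, `summarize`, the field names, and the message format are hypothetical and are not taken from BOLT.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical tallies for one function; not BOLT's actual data structure.
struct ProbeMatchStats {
  uint64_t Matched = 0;        // profile blocks matched 1:1 through a pseudo probe
  uint64_t NoProbe = 0;        // profile blocks that carried no block pseudo probe
  uint64_t AmbiguousProbe = 0; // profile blocks with more than one candidate probe

  // Folds one function's tallies into a running total.
  ProbeMatchStats &operator+=(const ProbeMatchStats &Other) {
    Matched += Other.Matched;
    NoProbe += Other.NoProbe;
    AmbiguousProbe += Other.AmbiguousProbe;
    return *this;
  }
};

// Formats a one-line summary, or returns an empty string when the current
// verbosity is below the level required for this kind of summary.
inline std::string summarize(const std::string &Scope, const ProbeMatchStats &S,
                             unsigned Verbosity, unsigned MinVerbosity) {
  if (Verbosity < MinVerbosity)
    return std::string();
  std::ostringstream OS;
  OS << Scope << ": matched=" << S.Matched << " no-probe=" << S.NoProbe
     << " ambiguous=" << S.AmbiguousProbe << "\n";
  return OS.str();
}
```

Under this sketch, a per-function summary would be requested with `MinVerbosity` 2 and the binary-wide total with `MinVerbosity` 1, leaving individual per-block messages for level 3.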
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
@@ -555,6 +574,10 @@ size_t matchWeightsByHashes(
 ProbeMap.lower_bound(FuncAddr + BlockRange.second));
 for (const auto &[_, Probes] : BlockProbes) {
 for (const MCDecodedPseudoProbe &Probe : Probes) {
+ if (Probe.getInlineTreeNode()->hasInlineSite())

aaupov wrote:

What do we prune with this check? Don't we discard valid probes belonging to the current function?

https://github.com/llvm/llvm-project/pull/99891
[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100183 Backport e64e745e8fb8 Requested by: @ldionne >From 0a44617ee7a29a5a7758285c5c367b66d68051a3 Mon Sep 17 00:00:00 2001 From: Louis Dionne Date: Tue, 23 Jul 2024 13:04:54 -0500 Subject: [PATCH] [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) This patch applies the comments provided on #84573. This is done as a separate PR to avoid merge conflicts with downstreams that already had ptrauth support. (cherry picked from commit e64e745e8fb802ffb06259b1a5ba3db713a17087) --- libcxx/include/typeinfo | 9 --- libcxx/src/include/overridable_function.h | 6 ++--- libcxxabi/src/private_typeinfo.cpp| 33 +-- 3 files changed, 21 insertions(+), 27 deletions(-) diff --git a/libcxx/include/typeinfo b/libcxx/include/typeinfo index d1c0de3c1bfdd..2727cad02fa99 100644 --- a/libcxx/include/typeinfo +++ b/libcxx/include/typeinfo @@ -275,13 +275,14 @@ struct __type_info_implementations { __impl; }; -#if defined(__arm64__) && __has_cpp_attribute(clang::ptrauth_vtable_pointer) -# if __has_feature(ptrauth_type_info_discriminated_vtable_pointer) +#if __has_cpp_attribute(_Clang::__ptrauth_vtable_pointer__) +# if __has_feature(ptrauth_type_info_vtable_pointer_discrimination) #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \ - [[clang::ptrauth_vtable_pointer(process_independent, address_discrimination, type_discrimination)]] + [[_Clang::__ptrauth_vtable_pointer__(process_independent, address_discrimination, type_discrimination)]] # else #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \ - [[clang::ptrauth_vtable_pointer(process_independent, no_address_discrimination, no_extra_discrimination)]] + [[_Clang::__ptrauth_vtable_pointer__( \ + process_independent, no_address_discrimination, no_extra_discrimination)]] # endif #else # define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH diff --git a/libcxx/src/include/overridable_function.h b/libcxx/src/include/overridable_function.h index 
e71e4f104b290..c7639f56eee26 100644 --- a/libcxx/src/include/overridable_function.h +++ b/libcxx/src/include/overridable_function.h @@ -13,7 +13,7 @@ #include <__config> #include -#if defined(__arm64e__) && __has_feature(ptrauth_calls) +#if __has_feature(ptrauth_calls) # include #endif @@ -83,13 +83,13 @@ _LIBCPP_HIDE_FROM_ABI bool __is_function_overridden(_Ret (*__fptr)(_Args...)) no uintptr_t __end = reinterpret_cast(&__lcxx_override_end); uintptr_t __ptr = reinterpret_cast(__fptr); -#if defined(__arm64e__) && __has_feature(ptrauth_calls) +# if __has_feature(ptrauth_calls) // We must pass a void* to ptrauth_strip since it only accepts a pointer type. Also, in particular, // we must NOT pass a function pointer, otherwise we will strip the function pointer, and then attempt // to authenticate and re-sign it when casting it to a uintptr_t again, which will fail because we just // stripped the function pointer. See rdar://122927845. __ptr = reinterpret_cast(ptrauth_strip(reinterpret_cast(__ptr), ptrauth_key_function_pointer)); -#endif +# endif // Finally, the function was overridden if it falls outside of the section's bounds. 
return __ptr < __start || __ptr > __end; diff --git a/libcxxabi/src/private_typeinfo.cpp b/libcxxabi/src/private_typeinfo.cpp index 9e58501a55934..9dba91e1985e3 100644 --- a/libcxxabi/src/private_typeinfo.cpp +++ b/libcxxabi/src/private_typeinfo.cpp @@ -55,15 +55,12 @@ #include #endif - -template -static inline -T * -get_vtable(T *vtable) { +template +static inline T* strip_vtable(T* vtable) { #if __has_feature(ptrauth_calls) -vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer); + vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer); #endif -return vtable; + return vtable; } static inline @@ -117,11 +114,10 @@ void dyn_cast_get_derived_info(derived_object_info* info, const void* static_ptr reinterpret_cast(vtable) + offset_to_ti_proxy; info->dynamic_type = *(reinterpret_cast(ptr_to_ti_proxy)); #else -void **vtable = *static_cast(static_ptr); -vtable = get_vtable(vtable); -info->offset_to_derived = reinterpret_cast(vtable[-2]); -info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived; -info->dynamic_type = static_cast(vtable[-1]); + void** vtable = strip_vtable(*static_cast(static_ptr)); + info->offset_to_derived = reinterpret_cast(vtable[-2]); + info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived; + info->dynamic_type = static_cast(vtable[-1]); #endif } @@ -576,8 +572,7 @@ __base
[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100183
[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)
llvmbot wrote: @ahmedbougacha What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100183
[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)
llvmbot wrote: @llvm/pr-subscribers-libcxxabi Author: None (llvmbot) Changes Backport e64e745e8fb8 Requested by: @ldionne --- Full diff: https://github.com/llvm/llvm-project/pull/100183.diff 3 Files Affected: - (modified) libcxx/include/typeinfo (+5-4) - (modified) libcxx/src/include/overridable_function.h (+3-3) - (modified) libcxxabi/src/private_typeinfo.cpp (+13-20) ``diff diff --git a/libcxx/include/typeinfo b/libcxx/include/typeinfo index d1c0de3c1bfdd..2727cad02fa99 100644 --- a/libcxx/include/typeinfo +++ b/libcxx/include/typeinfo @@ -275,13 +275,14 @@ struct __type_info_implementations { __impl; }; -#if defined(__arm64__) && __has_cpp_attribute(clang::ptrauth_vtable_pointer) -# if __has_feature(ptrauth_type_info_discriminated_vtable_pointer) +#if __has_cpp_attribute(_Clang::__ptrauth_vtable_pointer__) +# if __has_feature(ptrauth_type_info_vtable_pointer_discrimination) #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \ - [[clang::ptrauth_vtable_pointer(process_independent, address_discrimination, type_discrimination)]] + [[_Clang::__ptrauth_vtable_pointer__(process_independent, address_discrimination, type_discrimination)]] # else #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \ - [[clang::ptrauth_vtable_pointer(process_independent, no_address_discrimination, no_extra_discrimination)]] + [[_Clang::__ptrauth_vtable_pointer__( \ + process_independent, no_address_discrimination, no_extra_discrimination)]] # endif #else # define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH diff --git a/libcxx/src/include/overridable_function.h b/libcxx/src/include/overridable_function.h index e71e4f104b290..c7639f56eee26 100644 --- a/libcxx/src/include/overridable_function.h +++ b/libcxx/src/include/overridable_function.h @@ -13,7 +13,7 @@ #include <__config> #include -#if defined(__arm64e__) && __has_feature(ptrauth_calls) +#if __has_feature(ptrauth_calls) # include #endif @@ -83,13 +83,13 @@ _LIBCPP_HIDE_FROM_ABI bool __is_function_overridden(_Ret (*__fptr)(_Args...)) no uintptr_t __end 
= reinterpret_cast(&__lcxx_override_end); uintptr_t __ptr = reinterpret_cast(__fptr); -#if defined(__arm64e__) && __has_feature(ptrauth_calls) +# if __has_feature(ptrauth_calls) // We must pass a void* to ptrauth_strip since it only accepts a pointer type. Also, in particular, // we must NOT pass a function pointer, otherwise we will strip the function pointer, and then attempt // to authenticate and re-sign it when casting it to a uintptr_t again, which will fail because we just // stripped the function pointer. See rdar://122927845. __ptr = reinterpret_cast(ptrauth_strip(reinterpret_cast(__ptr), ptrauth_key_function_pointer)); -#endif +# endif // Finally, the function was overridden if it falls outside of the section's bounds. return __ptr < __start || __ptr > __end; diff --git a/libcxxabi/src/private_typeinfo.cpp b/libcxxabi/src/private_typeinfo.cpp index 9e58501a55934..9dba91e1985e3 100644 --- a/libcxxabi/src/private_typeinfo.cpp +++ b/libcxxabi/src/private_typeinfo.cpp @@ -55,15 +55,12 @@ #include #endif - -template -static inline -T * -get_vtable(T *vtable) { +template +static inline T* strip_vtable(T* vtable) { #if __has_feature(ptrauth_calls) -vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer); + vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer); #endif -return vtable; + return vtable; } static inline @@ -117,11 +114,10 @@ void dyn_cast_get_derived_info(derived_object_info* info, const void* static_ptr reinterpret_cast(vtable) + offset_to_ti_proxy; info->dynamic_type = *(reinterpret_cast(ptr_to_ti_proxy)); #else -void **vtable = *static_cast(static_ptr); -vtable = get_vtable(vtable); -info->offset_to_derived = reinterpret_cast(vtable[-2]); -info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived; -info->dynamic_type = static_cast(vtable[-1]); + void** vtable = strip_vtable(*static_cast(static_ptr)); + info->offset_to_derived = reinterpret_cast(vtable[-2]); + info->dynamic_ptr = static_cast(static_ptr) + 
info->offset_to_derived; + info->dynamic_type = static_cast(vtable[-1]); #endif } @@ -576,8 +572,7 @@ __base_class_type_info::has_unambiguous_public_base(__dynamic_cast_info* info, find the layout. */ offset_to_base = __offset_flags >> __offset_shift; if (is_virtual) { - const char* vtable = *static_cast(adjustedPtr); - vtable = get_vtable(vtable); + const char* vtable = strip_vtable(*static_cast(adjustedPtr)); offset_to_base = update_offset_to_base(vtable, offset_to_base); } }
[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add omp.target_triples attribute to the OffloadModuleInterface (PR #100154)
https://github.com/bhandarkar-pranav approved this pull request. https://github.com/llvm/llvm-project/pull/100154
[llvm-branch-commits] [flang] [Flang][OpenMP] Add frontend support for -fopenmp-targets (PR #100155)
https://github.com/bhandarkar-pranav approved this pull request. https://github.com/llvm/llvm-project/pull/100155
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
@@ -555,6 +574,10 @@ size_t matchWeightsByHashes(
 ProbeMap.lower_bound(FuncAddr + BlockRange.second));
 for (const auto &[_, Probes] : BlockProbes) {
 for (const MCDecodedPseudoProbe &Probe : Probes) {
+ if (Probe.getInlineTreeNode()->hasInlineSite())

shawbyoung wrote:

This pruning resulted in a tangible increase in 1:1 mappings between profile and binary pseudo probes. The PseudoProbeDecoder::ProbeMap interface contains not only the probe attached to some block b, but also the block probes attached to block b inlined in other functions.

https://github.com/llvm/llvm-project/pull/99891
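The effect described in this reply can be illustrated with a small model: if probes are grouped by index, probes that carry an inline site can collide with a block's own probe, and skipping them leaves more indices with exactly one candidate. The sketch below is an assumption-laden simplification: `Probe`, its `HasInlineSite` flag, `groupByIndex`, and `countOneToOne` are hypothetical, whereas the real code queries `MCDecodedPseudoProbe` through its inline tree node.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified probe record used only for this illustration.
struct Probe {
  uint64_t Index;     // pseudo probe index
  bool HasInlineSite; // stands in for getInlineTreeNode()->hasInlineSite()
};

// Groups probes by index, optionally skipping probes that have an inline site.
inline std::map<uint64_t, std::vector<Probe>>
groupByIndex(const std::vector<Probe> &Probes, bool PruneInlined) {
  std::map<uint64_t, std::vector<Probe>> ByIndex;
  for (const Probe &P : Probes) {
    if (PruneInlined && P.HasInlineSite)
      continue;
    ByIndex[P.Index].push_back(P);
  }
  return ByIndex;
}

// Counts indices that map to exactly one probe, i.e. usable 1:1 mappings.
inline std::size_t
countOneToOne(const std::map<uint64_t, std::vector<Probe>> &ByIndex) {
  std::size_t N = 0;
  for (const auto &Entry : ByIndex)
    if (Entry.second.size() == 1)
      ++N;
  return N;
}
```

With the probes `{1, false}, {1, true}, {2, false}, {3, false}, {3, true}, {3, true}`, only index 2 is unambiguous before pruning, while all three indices are unambiguous afterwards. Whether this discards probes that legitimately belong to the current function, as the reviewer asks, is not something the model can answer.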
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)
https://github.com/azhan92 approved this pull request. lgtm https://github.com/llvm/llvm-project/pull/100151
[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100191 Backport 92a9d4831d5e40c286247c30fcd794563adbef6e Requested by: @ian-twilightcoder >From e3ec8d577ee97f496f7a27fc6099ca5ded220d3b Mon Sep 17 00:00:00 2001 From: Ian Anderson Date: Tue, 23 Jul 2024 13:02:59 -0700 Subject: [PATCH] [clang][headers] Including stddef.h always redefines NULL (#99727) stddef.h always includes __stddef_null.h. This is fine in modules because it's not possible to re-include the pcm, and it's necessary to export the _Builtin_stddef.null submodule. However, without modules it causes NULL to always get redefined which disrupts some C++ code. Rework the inclusion of __stddef_null.h so that with not building with modules it's only included if __need_NULL is set by the includer, or it's the first time stddef.h is being included. (cherry picked from commit 92a9d4831d5e40c286247c30fcd794563adbef6e) --- clang/lib/Headers/stdarg.h | 4 +- clang/lib/Headers/stddef.h | 21 - clang/test/Headers/stddefneeds.cpp | 15 -- clang/test/Modules/stddef.cpp | 73 ++ 4 files changed, 105 insertions(+), 8 deletions(-) create mode 100644 clang/test/Modules/stddef.cpp diff --git a/clang/lib/Headers/stdarg.h b/clang/lib/Headers/stdarg.h index 8292ab907becf..6203d7a600a23 100644 --- a/clang/lib/Headers/stdarg.h +++ b/clang/lib/Headers/stdarg.h @@ -20,19 +20,18 @@ * modules. 
*/ #if defined(__MVS__) && __has_include_next() -#include <__stdarg_header_macro.h> #undef __need___va_list #undef __need_va_list #undef __need_va_arg #undef __need___va_copy #undef __need_va_copy +#include <__stdarg_header_macro.h> #include_next #else #if !defined(__need___va_list) && !defined(__need_va_list) && \ !defined(__need_va_arg) && !defined(__need___va_copy) && \ !defined(__need_va_copy) -#include <__stdarg_header_macro.h> #define __need___va_list #define __need_va_list #define __need_va_arg @@ -45,6 +44,7 @@ !defined(__STRICT_ANSI__) #define __need_va_copy #endif +#include <__stdarg_header_macro.h> #endif #ifdef __need___va_list diff --git a/clang/lib/Headers/stddef.h b/clang/lib/Headers/stddef.h index 8985c526e8fc5..99b275aebf5aa 100644 --- a/clang/lib/Headers/stddef.h +++ b/clang/lib/Headers/stddef.h @@ -20,7 +20,6 @@ * modules. */ #if defined(__MVS__) && __has_include_next() -#include <__stddef_header_macro.h> #undef __need_ptrdiff_t #undef __need_size_t #undef __need_rsize_t @@ -31,6 +30,7 @@ #undef __need_max_align_t #undef __need_offsetof #undef __need_wint_t +#include <__stddef_header_macro.h> #include_next #else @@ -40,7 +40,6 @@ !defined(__need_NULL) && !defined(__need_nullptr_t) && \ !defined(__need_unreachable) && !defined(__need_max_align_t) && \ !defined(__need_offsetof) && !defined(__need_wint_t) -#include <__stddef_header_macro.h> #define __need_ptrdiff_t #define __need_size_t /* ISO9899:2011 7.20 (C11 Annex K): Define rsize_t if __STDC_WANT_LIB_EXT1__ is @@ -49,7 +48,24 @@ #define __need_rsize_t #endif #define __need_wchar_t +#if !defined(__STDDEF_H) || __has_feature(modules) +/* + * __stddef_null.h is special when building without modules: if __need_NULL is + * set, then it will unconditionally redefine NULL. To avoid stepping on client + * definitions of NULL, __need_NULL should only be set the first time this + * header is included, that is when __STDDEF_H is not defined. 
However, when + * building with modules, this header is a textual header and needs to + * unconditionally include __stdef_null.h to support multiple submodules + * exporting _Builtin_stddef.null. Take module SM with submodules A and B, whose + * headers both include stddef.h When SM.A builds, __STDDEF_H will be defined. + * When SM.B builds, the definition from SM.A will leak when building without + * local submodule visibility. stddef.h wouldn't include __stddef_null.h, and + * SM.B wouldn't import _Builtin_stddef.null, and SM.B's `export *` wouldn't + * export NULL as expected. When building with modules, always include + * __stddef_null.h so that everything works as expected. + */ #define __need_NULL +#endif #if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \ defined(__cplusplus) #define __need_nullptr_t @@ -65,6 +81,7 @@ /* wint_t is provided by and not . It's here * for compatibility, but must be explicitly requested. Therefore * __need_wint_t is intentionally not defined here. */ +#include <__stddef_header_macro.h> #endif #if defined(__need_ptrdiff_t) diff --git a/clang/test/Headers/stddefneeds.cpp b/clang/test/Headers/stddefneeds.cpp index 0763bbdee13ae..0282e8afa600d 100644 --- a/clang/test/Headers/stddefneeds.cpp +++ b/clang/test/Headers/stddefneeds.cpp @@ -56,14 +56,21 @@ max_align_t m5; #undef NULL #define NULL 0 -// glibc (and other) headers then define __need_NULL and rely on s
[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100191
[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)
llvmbot wrote: @AaronBallman What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100191
[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)
llvmbot wrote:

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-clang-modules

Author: None (llvmbot)

Changes

Backport 92a9d4831d5e40c286247c30fcd794563adbef6e

Requested by: @ian-twilightcoder

---
Full diff: https://github.com/llvm/llvm-project/pull/100191.diff

4 Files Affected:

- (modified) clang/lib/Headers/stdarg.h (+2-2)
- (modified) clang/lib/Headers/stddef.h (+19-2)
- (modified) clang/test/Headers/stddefneeds.cpp (+11-4)
- (added) clang/test/Modules/stddef.cpp (+73)

``diff
diff --git a/clang/lib/Headers/stdarg.h b/clang/lib/Headers/stdarg.h
index 8292ab907becf..6203d7a600a23 100644
--- a/clang/lib/Headers/stdarg.h
+++ b/clang/lib/Headers/stdarg.h
@@ -20,19 +20,18 @@
  * modules.
  */
 #if defined(__MVS__) && __has_include_next(<stdarg.h>)
-#include <__stdarg_header_macro.h>
 #undef __need___va_list
 #undef __need_va_list
 #undef __need_va_arg
 #undef __need___va_copy
 #undef __need_va_copy
+#include <__stdarg_header_macro.h>
 #include_next <stdarg.h>
 #else
 #if !defined(__need___va_list) && !defined(__need_va_list) && \
     !defined(__need_va_arg) && !defined(__need___va_copy) && \
     !defined(__need_va_copy)
-#include <__stdarg_header_macro.h>
 #define __need___va_list
 #define __need_va_list
 #define __need_va_arg
@@ -45,6 +44,7 @@
     !defined(__STRICT_ANSI__)
 #define __need_va_copy
 #endif
+#include <__stdarg_header_macro.h>
 #endif
 #ifdef __need___va_list
diff --git a/clang/lib/Headers/stddef.h b/clang/lib/Headers/stddef.h
index 8985c526e8fc5..99b275aebf5aa 100644
--- a/clang/lib/Headers/stddef.h
+++ b/clang/lib/Headers/stddef.h
@@ -20,7 +20,6 @@
  * modules.
  */
 #if defined(__MVS__) && __has_include_next(<stddef.h>)
-#include <__stddef_header_macro.h>
 #undef __need_ptrdiff_t
 #undef __need_size_t
 #undef __need_rsize_t
@@ -31,6 +30,7 @@
 #undef __need_max_align_t
 #undef __need_offsetof
 #undef __need_wint_t
+#include <__stddef_header_macro.h>
 #include_next <stddef.h>
 #else
@@ -40,7 +40,6 @@
     !defined(__need_NULL) && !defined(__need_nullptr_t) && \
     !defined(__need_unreachable) && !defined(__need_max_align_t) && \
     !defined(__need_offsetof) && !defined(__need_wint_t)
-#include <__stddef_header_macro.h>
 #define __need_ptrdiff_t
 #define __need_size_t
 /* ISO9899:2011 7.20 (C11 Annex K): Define rsize_t if __STDC_WANT_LIB_EXT1__ is
@@ -49,7 +48,24 @@
 #define __need_rsize_t
 #endif
 #define __need_wchar_t
+#if !defined(__STDDEF_H) || __has_feature(modules)
+/*
+ * __stddef_null.h is special when building without modules: if __need_NULL is
+ * set, then it will unconditionally redefine NULL. To avoid stepping on client
+ * definitions of NULL, __need_NULL should only be set the first time this
+ * header is included, that is when __STDDEF_H is not defined. However, when
+ * building with modules, this header is a textual header and needs to
+ * unconditionally include __stdef_null.h to support multiple submodules
+ * exporting _Builtin_stddef.null. Take module SM with submodules A and B, whose
+ * headers both include stddef.h When SM.A builds, __STDDEF_H will be defined.
+ * When SM.B builds, the definition from SM.A will leak when building without
+ * local submodule visibility. stddef.h wouldn't include __stddef_null.h, and
+ * SM.B wouldn't import _Builtin_stddef.null, and SM.B's `export *` wouldn't
+ * export NULL as expected. When building with modules, always include
+ * __stddef_null.h so that everything works as expected.
+ */
 #define __need_NULL
+#endif
 #if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \
     defined(__cplusplus)
 #define __need_nullptr_t
@@ -65,6 +81,7 @@
 /* wint_t is provided by <wchar.h> and not <stddef.h>. It's here
  * for compatibility, but must be explicitly requested. Therefore
  * __need_wint_t is intentionally not defined here. */
+#include <__stddef_header_macro.h>
 #endif
 #if defined(__need_ptrdiff_t)
diff --git a/clang/test/Headers/stddefneeds.cpp b/clang/test/Headers/stddefneeds.cpp
index 0763bbdee13ae..0282e8afa600d 100644
--- a/clang/test/Headers/stddefneeds.cpp
+++ b/clang/test/Headers/stddefneeds.cpp
@@ -56,14 +56,21 @@ max_align_t m5;
 #undef NULL
 #define NULL 0
-// glibc (and other) headers then define __need_NULL and rely on stddef.h
-// to redefine NULL to the correct value again.
-#define __need_NULL
+// Including stddef.h again shouldn't redefine NULL
 #include <stddef.h>
 // gtk headers then use __attribute__((sentinel)), which doesn't work if NULL
 // is 0.
-void f(const char* c, ...) __attribute__((sentinel));
+void f(const char* c, ...) __attribute__((sentinel)); // expected-note{{function has been explicitly marked sentinel here}}
 void g() {
+  f("", NULL); // expected-warning{{missing sentinel in function call}}
+}
+
+// glibc (and other) headers then define __need_NULL and rely on stddef.h
+// to redefine NULL to the correct value again.
+#define __need_NULL
+#include <stddef.h>
+
+void h() {
   f("", NULL); // Shouldn't warn.
 }
diff --git a/
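The comment added to stddef.h explains when `__need_NULL` should be set. As a rough illustration only, the condition `!defined(__STDDEF_H) || __has_feature(modules)` can be modelled as a small truth table. The function `shouldSetNeedNull` and its parameter names are hypothetical; they are not part of the patch or of Clang.

```cpp
#include <cassert>

// Hypothetical model of the preprocessor condition from the patch:
//   #if !defined(__STDDEF_H) || __has_feature(modules)
// This is not Clang code; the names are illustrative assumptions.
constexpr bool shouldSetNeedNull(bool StddefAlreadyIncluded,
                                 bool ModulesEnabled) {
  // The first inclusion always requests NULL. Later inclusions request it
  // only when building with modules, where the header is textual.
  return !StddefAlreadyIncluded || ModulesEnabled;
}

static_assert(shouldSetNeedNull(false, false),
              "first inclusion sets __need_NULL");
static_assert(!shouldSetNeedNull(true, false),
              "re-inclusion without modules leaves a client NULL alone");
static_assert(shouldSetNeedNull(true, true),
              "with modules, __need_NULL is always set");
```

The second case is the one the updated stddefneeds.cpp test exercises: after a client redefines `NULL`, including stddef.h again without `__need_NULL` is expected to leave that definition in place.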
[llvm-branch-commits] [libc] 4c07e7f - Revert "[libc][RISCV] Add naked attribute to setjmp/longjmp (#100036)"
Author: Paul Kirth
Date: 2024-07-23T13:15:47-07:00
New Revision: 4c07e7f659ab91c22c1b0440080902d0b931195d

URL: https://github.com/llvm/llvm-project/commit/4c07e7f659ab91c22c1b0440080902d0b931195d
DIFF: https://github.com/llvm/llvm-project/commit/4c07e7f659ab91c22c1b0440080902d0b931195d.diff

LOG: Revert "[libc][RISCV] Add naked attribute to setjmp/longjmp (#100036)"

This reverts commit 05b586be3d70cd51c809c52a67d36517fb4b8f6f.

Added:

Modified:
    libc/src/setjmp/riscv/longjmp.cpp
    libc/src/setjmp/riscv/setjmp.cpp

Removed:

diff --git a/libc/src/setjmp/riscv/longjmp.cpp b/libc/src/setjmp/riscv/longjmp.cpp
index b14f636659ac3..0f9537ccc4151 100644
--- a/libc/src/setjmp/riscv/longjmp.cpp
+++ b/libc/src/setjmp/riscv/longjmp.cpp
@@ -30,7 +30,6 @@ namespace LIBC_NAMESPACE_DECL {
-[[gnu::naked]]
 LLVM_LIBC_FUNCTION(void, longjmp, (__jmp_buf * buf, int val)) {
   LOAD(ra, buf->__pc);
   LOAD(s0, buf->__regs[0]);
diff --git a/libc/src/setjmp/riscv/setjmp.cpp b/libc/src/setjmp/riscv/setjmp.cpp
index 92982cc9d74d4..12def578b56f3 100644
--- a/libc/src/setjmp/riscv/setjmp.cpp
+++ b/libc/src/setjmp/riscv/setjmp.cpp
@@ -29,7 +29,6 @@ namespace LIBC_NAMESPACE_DECL {
-[[gnu::naked]]
 LLVM_LIBC_FUNCTION(int, setjmp, (__jmp_buf * buf)) {
   STORE(ra, buf->__pc);
   STORE(s0, buf->__regs[0]);
[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100195

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

>From 772a44ca77676be636cd7027c8703e8467bc38ad Mon Sep 17 00:00:00 2001
From: Wesley Wiser
Date: Tue, 23 Jul 2024 11:43:30 -0500
Subject: [PATCH] [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263)

Rebase of #84114. I've only included the core changes to frame layout calculation & CFI generation which sidesteps the regressions found after merging #84114. Since these changes are a necessary precursor to the overall fix and are themselves slightly beneficial as CFI is now generated correctly, I think it is reasonable to merge this first step.

---

For very large stack frames, the offset from the stack pointer to a local can be more than 2^31 which overflows various `int` offsets in the frame lowering code.

This patch updates the frame lowering code to calculate the offsets as 64-bit values and fixes CFI to use the corrected sizes.

After this patch, additional work is needed to fix offset truncations in each target's codegen.

(cherry picked from commit ca076f7a63f6a80e2e38315ec462be354b196b8d)
---
 llvm/include/llvm/CodeGen/MachineFrameInfo.h  | 14 +++---
 .../llvm/CodeGen/TargetFrameLowering.h        |  4 +-
 llvm/include/llvm/MC/MCAsmBackend.h           |  2 +-
 llvm/include/llvm/MC/MCDwarf.h                | 44 +--
 llvm/lib/CodeGen/CFIInstrInserter.cpp         | 10 ++---
 llvm/lib/CodeGen/MachineFrameInfo.cpp         |  2 +-
 llvm/lib/CodeGen/PrologEpilogInserter.cpp     |  4 +-
 llvm/lib/MC/MCDwarf.cpp                       |  6 +--
 .../MCTargetDesc/AArch64AsmBackend.cpp        |  8 ++--
 llvm/lib/Target/ARM/ARMFrameLowering.cpp      |  4 +-
 .../Target/ARM/MCTargetDesc/ARMAsmBackend.cpp |  2 +-
 .../ARM/MCTargetDesc/ARMAsmBackendDarwin.h    |  2 +-
 .../Target/Hexagon/HexagonFrameLowering.cpp   |  4 +-
 .../lib/Target/MSP430/MSP430FrameLowering.cpp |  2 +-
 .../Target/X86/MCTargetDesc/X86AsmBackend.cpp | 12 ++---
 llvm/lib/Target/X86/X86FrameLowering.cpp      |  4 +-
 llvm/test/CodeGen/PowerPC/huge-frame-size.ll  |  2 +-
 llvm/test/CodeGen/RISCV/pr88365.ll            |  2 +-
 llvm/test/CodeGen/X86/huge-stack.ll           |  2 +-
 19 files changed, 65 insertions(+), 65 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 466fed7fb3a29..213b7ec6b3fbf 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class). This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  uint64_t MaxCallFrameSize = ~UINT64_C(0);
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -593,10 +593,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -663,7 +663,7 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  uint64_t getMaxCallFrameSize() const {
     // TODO: Enable this assert when targets are fixed.
     //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
     if (!isMaxCallFrameSizeComputed())
@@ -671,9 +671,9 @@ class MachineFrameInfo {
     return MaxCallFrameSize;
   }
   bool isMaxCallFrameSizeComputed() const {
-    return MaxCallFrameSize != ~0u;
+    return MaxCallFrameSize != ~UINT64_C(0);
   }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
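The overflow described in the commit message can be reproduced in isolation. The sketch below is not LLVM code: `FrameInfoSketch` and `fitsInInt` are hypothetical names, and it assumes a platform where `int` and `unsigned` are 32 bits wide. It also illustrates why the "not computed yet" sentinel has to widen together with the field: `~0u` converts to 0x00000000FFFFFFFF as a 64-bit value, which is not the same as `~UINT64_C(0)`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the two widened fields. This is not the real
// llvm::MachineFrameInfo, only an illustration of the pattern in the diff.
struct FrameInfoSketch {
  int64_t OffsetAdjustment = 0;
  uint64_t MaxCallFrameSize = ~UINT64_C(0); // sentinel: "not computed yet"

  bool isMaxCallFrameSizeComputed() const {
    return MaxCallFrameSize != ~UINT64_C(0);
  }
  void setMaxCallFrameSize(uint64_t S) {
    assert(S != ~UINT64_C(0) && "value collides with the sentinel");
    MaxCallFrameSize = S;
  }
};

// True when a 64-bit frame offset survives being stored in a plain `int`.
inline bool fitsInInt(int64_t Offset) {
  return Offset >= std::numeric_limits<int>::min() &&
         Offset <= std::numeric_limits<int>::max();
}

// With a 32-bit `unsigned`, the old sentinel widens to 0x00000000FFFFFFFF,
// which differs from the all-ones 64-bit sentinel.
static_assert(sizeof(unsigned) != 4 ||
                  static_cast<uint64_t>(~0u) != ~UINT64_C(0),
              "old and new sentinels differ after widening");
```

An offset of exactly 2^31 is the smallest positive value that no longer fits a 32-bit `int`, which is the boundary the commit message refers to.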
[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100195
[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)
llvmbot wrote: @wesleywiser What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100195
[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)
llvmbot wrote:

@llvm/pr-subscribers-backend-msp430
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-debuginfo

Author: None (llvmbot)

Changes

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

---
Patch is 27.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100195.diff

19 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7-7)
- (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+2-2)
- (modified) llvm/include/llvm/MC/MCAsmBackend.h (+1-1)
- (modified) llvm/include/llvm/MC/MCDwarf.h (+22-22)
- (modified) llvm/lib/CodeGen/CFIInstrInserter.cpp (+5-5)
- (modified) llvm/lib/CodeGen/MachineFrameInfo.cpp (+1-1)
- (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+2-2)
- (modified) llvm/lib/MC/MCDwarf.cpp (+3-3)
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp (+4-4)
- (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+2-2)
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+1-1)
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h (+1-1)
- (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+2-2)
- (modified) llvm/lib/Target/MSP430/MSP430FrameLowering.cpp (+1-1)
- (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+6-6)
- (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+2-2)
- (modified) llvm/test/CodeGen/PowerPC/huge-frame-size.ll (+1-1)
- (modified) llvm/test/CodeGen/RISCV/pr88365.ll (+1-1)
- (modified) llvm/test/CodeGen/X86/huge-stack.ll (+1-1)

``diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 466fed7fb3a29..213b7ec6b3fbf 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class). This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  uint64_t MaxCallFrameSize = ~UINT64_C(0);
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -593,10 +593,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -663,7 +663,7 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  uint64_t getMaxCallFrameSize() const {
     // TODO: Enable this assert when targets are fixed.
     //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
     if (!isMaxCallFrameSizeComputed())
@@ -671,9 +671,9 @@ class MachineFrameInfo {
     return MaxCallFrameSize;
   }
   bool isMaxCallFrameSizeComputed() const {
-    return MaxCallFrameSize != ~0u;
+    return MaxCallFrameSize != ~UINT64_C(0);
   }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
   /// Returns how many bytes of callee-saved registers the target pushed in the
   /// prologue. Only used for debug info.
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..72978b2f746d7 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -51,7 +51,7 @@ class TargetFrameLowering {
   // Maps a callee saved register to a stack slot with a fixed offset.
   struct SpillSlot {
     unsigned Reg;
-    int Offset; // Offset relative to stack pointer on function entry.
+    int64_t Offset; // Offset relative to stack pointer on function entry.
   };
   struct DwarfFrameBase {
@@ -66,7 +
[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)
llvmbot wrote:

@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)

Changes

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

---
Patch is 27.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/100195.diff

19 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7-7)
- (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+2-2)
- (modified) llvm/include/llvm/MC/MCAsmBackend.h (+1-1)
- (modified) llvm/include/llvm/MC/MCDwarf.h (+22-22)
- (modified) llvm/lib/CodeGen/CFIInstrInserter.cpp (+5-5)
- (modified) llvm/lib/CodeGen/MachineFrameInfo.cpp (+1-1)
- (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+2-2)
- (modified) llvm/lib/MC/MCDwarf.cpp (+3-3)
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp (+4-4)
- (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+2-2)
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+1-1)
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h (+1-1)
- (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+2-2)
- (modified) llvm/lib/Target/MSP430/MSP430FrameLowering.cpp (+1-1)
- (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+6-6)
- (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+2-2)
- (modified) llvm/test/CodeGen/PowerPC/huge-frame-size.ll (+1-1)
- (modified) llvm/test/CodeGen/RISCV/pr88365.ll (+1-1)
- (modified) llvm/test/CodeGen/X86/huge-stack.ll (+1-1)

``diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 466fed7fb3a29..213b7ec6b3fbf 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class). This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  uint64_t MaxCallFrameSize = ~UINT64_C(0);
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -593,10 +593,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -663,7 +663,7 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  uint64_t getMaxCallFrameSize() const {
     // TODO: Enable this assert when targets are fixed.
     //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
     if (!isMaxCallFrameSizeComputed())
@@ -671,9 +671,9 @@ class MachineFrameInfo {
     return MaxCallFrameSize;
   }
   bool isMaxCallFrameSizeComputed() const {
-    return MaxCallFrameSize != ~0u;
+    return MaxCallFrameSize != ~UINT64_C(0);
   }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
   /// Returns how many bytes of callee-saved registers the target pushed in the
   /// prologue. Only used for debug info.
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..72978b2f746d7 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -51,7 +51,7 @@ class TargetFrameLowering {
   // Maps a callee saved register to a stack slot with a fixed offset.
   struct SpillSlot {
     unsigned Reg;
-    int Offset; // Offset relative to stack pointer on function entry.
+    int64_t Offset; // Offset relative to stack pointer on function entry.
   };
   struct DwarfFrameBase {
@@ -66,7 +66,7 @@ class TargetFrameLowering {
   // Used with FrameBa
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100207

Backport 63b382bbde5994e8f2cec75883320e3ad9fd618f

Requested by: @azhan92

>From 33bfe961c6c6bd8601a68d0d6b58cfce2310518c Mon Sep 17 00:00:00 2001
From: azhan92
Date: Tue, 23 Jul 2024 09:51:13 -0400
Subject: [PATCH] [PowerPC] Add builtin_cpu_is P11 support (#99550)

This PR adds support for __builtin_cpu_is ("power11")

(cherry picked from commit 63b382bbde5994e8f2cec75883320e3ad9fd618f)
---
 clang/test/CodeGen/aix-builtin-cpu-is.c       |  4 ++
 clang/test/CodeGen/builtin-cpu-supports.c     | 72 ---
 .../llvm/TargetParser/PPCTargetParser.def     |  3 +
 3 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/clang/test/CodeGen/aix-builtin-cpu-is.c b/clang/test/CodeGen/aix-builtin-cpu-is.c
index e17cf7353511a..04644dd7020e0 100644
--- a/clang/test/CodeGen/aix-builtin-cpu-is.c
+++ b/clang/test/CodeGen/aix-builtin-cpu-is.c
@@ -50,6 +50,10 @@
 // RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=262144 \
 // RUN: --check-prefix=CHECKOP
+// RUN: echo "int main() { return __builtin_cpu_is(\"power11\");}" > %t.c
+// RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=524288 \
+// RUN: --check-prefix=CHECKOP
+
 // CHECK: define i32 @main() #0 {
 // CHECK-NEXT: entry:
 // CHECK-NEXT: %retval = alloca i32, align 4
diff --git a/clang/test/CodeGen/builtin-cpu-supports.c b/clang/test/CodeGen/builtin-cpu-supports.c
index 88eb7b0fa786e..f960040ab094b 100644
--- a/clang/test/CodeGen/builtin-cpu-supports.c
+++ b/clang/test/CodeGen/builtin-cpu-supports.c
@@ -129,25 +129,69 @@ int v4() { return __builtin_cpu_supports("x86-64-v4"); }
 // CHECK-PPC: if.else3:
 // CHECK-PPC-NEXT: [[CPU_IS:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
 // CHECK-PPC-NEXT: [[TMP6:%.*]] = icmp eq i32 [[CPU_IS]], 39
-// CHECK-PPC-NEXT: br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC-NEXT: br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_ELSE5:%.*]]
 // CHECK-PPC: if.then4:
 // CHECK-PPC-NEXT: [[TMP7:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT: [[TMP8:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP7]], [[TMP8]]
 // CHECK-PPC-NEXT: store i32 [[ADD]], ptr [[RETVAL]], align 4
 // CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else5:
+// CHECK-PPC-NEXT: [[CPU_IS6:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP9:%.*]] = icmp eq i32 [[CPU_IS6]], 45
+// CHECK-PPC-NEXT: br i1 [[TMP9]], label [[IF_THEN7:%.*]], label [[IF_ELSE9:%.*]]
+// CHECK-PPC: if.then7:
+// CHECK-PPC-NEXT: [[TMP10:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[ADD8:%.*]] = add nsw i32 [[TMP10]], 3
+// CHECK-PPC-NEXT: store i32 [[ADD8]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else9:
+// CHECK-PPC-NEXT: [[CPU_IS10:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP11:%.*]] = icmp eq i32 [[CPU_IS10]], 46
+// CHECK-PPC-NEXT: br i1 [[TMP11]], label [[IF_THEN11:%.*]], label [[IF_ELSE13:%.*]]
+// CHECK-PPC: if.then11:
+// CHECK-PPC-NEXT: [[TMP12:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[SUB12:%.*]] = sub nsw i32 [[TMP12]], 3
+// CHECK-PPC-NEXT: store i32 [[SUB12]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else13:
+// CHECK-PPC-NEXT: [[CPU_IS14:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP13:%.*]] = icmp eq i32 [[CPU_IS14]], 47
+// CHECK-PPC-NEXT: br i1 [[TMP13]], label [[IF_THEN15:%.*]], label [[IF_ELSE17:%.*]]
+// CHECK-PPC: if.then15:
+// CHECK-PPC-NEXT: [[TMP14:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[ADD16:%.*]] = add nsw i32 [[TMP14]], 7
+// CHECK-PPC-NEXT: store i32 [[ADD16]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else17:
+// CHECK-PPC-NEXT: [[CPU_IS18:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP15:%.*]] = icmp eq i32 [[CPU_IS18]], 48
+// CHECK-PPC-NEXT: br i1 [[TMP15]], label [[IF_THEN19:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC: if.then19:
+// CHECK-PPC-NEXT: [[TMP16:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[SUB20:%.*]] = sub nsw i32 [[TMP16]], 7
+// CHECK-PPC-NEXT: store i32 [[SUB20]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
 // CHECK-PPC: if.end:
-// CHECK-PPC-NEXT: br label [[IF_END5:%.*]]
-// CHECK-PPC: if.end5:
-// CHECK-PPC-NEXT: br label [[IF_END6:%.*]]
-// CHECK-PPC: if.end6:
-// CHECK-PPC-NEXT: [[TMP9:%.*]] = load i32, ptr [[A_ADDR]], align 4
-// CHECK-PPC-NEXT: [[ADD7:%.*]] = add nsw i32 [[TMP9]], 5
-// CHECK-PPC-NEXT: store i32
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100207
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)
llvmbot wrote: @daltenty What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100207
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)
llvmbot wrote:

@llvm/pr-subscribers-clang

Author: None (llvmbot)

Changes

Backport 63b382bbde5994e8f2cec75883320e3ad9fd618f

Requested by: @azhan92

---
Full diff: https://github.com/llvm/llvm-project/pull/100207.diff

3 Files Affected:

- (modified) clang/test/CodeGen/aix-builtin-cpu-is.c (+4)
- (modified) clang/test/CodeGen/builtin-cpu-supports.c (+62-10)
- (modified) llvm/include/llvm/TargetParser/PPCTargetParser.def (+3)

``diff
diff --git a/clang/test/CodeGen/aix-builtin-cpu-is.c b/clang/test/CodeGen/aix-builtin-cpu-is.c
index e17cf7353511a..04644dd7020e0 100644
--- a/clang/test/CodeGen/aix-builtin-cpu-is.c
+++ b/clang/test/CodeGen/aix-builtin-cpu-is.c
@@ -50,6 +50,10 @@
 // RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=262144 \
 // RUN: --check-prefix=CHECKOP
+// RUN: echo "int main() { return __builtin_cpu_is(\"power11\");}" > %t.c
+// RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=524288 \
+// RUN: --check-prefix=CHECKOP
+
 // CHECK: define i32 @main() #0 {
 // CHECK-NEXT: entry:
 // CHECK-NEXT: %retval = alloca i32, align 4
diff --git a/clang/test/CodeGen/builtin-cpu-supports.c b/clang/test/CodeGen/builtin-cpu-supports.c
index 88eb7b0fa786e..f960040ab094b 100644
--- a/clang/test/CodeGen/builtin-cpu-supports.c
+++ b/clang/test/CodeGen/builtin-cpu-supports.c
@@ -129,25 +129,69 @@ int v4() { return __builtin_cpu_supports("x86-64-v4"); }
 // CHECK-PPC: if.else3:
 // CHECK-PPC-NEXT: [[CPU_IS:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
 // CHECK-PPC-NEXT: [[TMP6:%.*]] = icmp eq i32 [[CPU_IS]], 39
-// CHECK-PPC-NEXT: br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC-NEXT: br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_ELSE5:%.*]]
 // CHECK-PPC: if.then4:
 // CHECK-PPC-NEXT: [[TMP7:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT: [[TMP8:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP7]], [[TMP8]]
 // CHECK-PPC-NEXT: store i32 [[ADD]], ptr [[RETVAL]], align 4
 // CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else5:
+// CHECK-PPC-NEXT: [[CPU_IS6:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP9:%.*]] = icmp eq i32 [[CPU_IS6]], 45
+// CHECK-PPC-NEXT: br i1 [[TMP9]], label [[IF_THEN7:%.*]], label [[IF_ELSE9:%.*]]
+// CHECK-PPC: if.then7:
+// CHECK-PPC-NEXT: [[TMP10:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[ADD8:%.*]] = add nsw i32 [[TMP10]], 3
+// CHECK-PPC-NEXT: store i32 [[ADD8]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else9:
+// CHECK-PPC-NEXT: [[CPU_IS10:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP11:%.*]] = icmp eq i32 [[CPU_IS10]], 46
+// CHECK-PPC-NEXT: br i1 [[TMP11]], label [[IF_THEN11:%.*]], label [[IF_ELSE13:%.*]]
+// CHECK-PPC: if.then11:
+// CHECK-PPC-NEXT: [[TMP12:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[SUB12:%.*]] = sub nsw i32 [[TMP12]], 3
+// CHECK-PPC-NEXT: store i32 [[SUB12]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else13:
+// CHECK-PPC-NEXT: [[CPU_IS14:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP13:%.*]] = icmp eq i32 [[CPU_IS14]], 47
+// CHECK-PPC-NEXT: br i1 [[TMP13]], label [[IF_THEN15:%.*]], label [[IF_ELSE17:%.*]]
+// CHECK-PPC: if.then15:
+// CHECK-PPC-NEXT: [[TMP14:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[ADD16:%.*]] = add nsw i32 [[TMP14]], 7
+// CHECK-PPC-NEXT: store i32 [[ADD16]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
+// CHECK-PPC: if.else17:
+// CHECK-PPC-NEXT: [[CPU_IS18:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT: [[TMP15:%.*]] = icmp eq i32 [[CPU_IS18]], 48
+// CHECK-PPC-NEXT: br i1 [[TMP15]], label [[IF_THEN19:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC: if.then19:
+// CHECK-PPC-NEXT: [[TMP16:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT: [[SUB20:%.*]] = sub nsw i32 [[TMP16]], 7
+// CHECK-PPC-NEXT: store i32 [[SUB20]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[RETURN]]
 // CHECK-PPC: if.end:
-// CHECK-PPC-NEXT: br label [[IF_END5:%.*]]
-// CHECK-PPC: if.end5:
-// CHECK-PPC-NEXT: br label [[IF_END6:%.*]]
-// CHECK-PPC: if.end6:
-// CHECK-PPC-NEXT: [[TMP9:%.*]] = load i32, ptr [[A_ADDR]], align 4
-// CHECK-PPC-NEXT: [[ADD7:%.*]] = add nsw i32 [[TMP9]], 5
-// CHECK-PPC-NEXT: store i32 [[ADD7]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT: br label [[IF_END21:%.*]]
+// CHECK-PPC: if.end21:
+// CHECK-PPC-NEXT: br label [[IF_END22:%.*]]
+// CHECK-PPC: if.end22:
+// CHECK-PPC-NEXT: br label [[IF_END23:%.*]]
+// CHECK-PPC:
[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)
https://github.com/azhan92 approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/100207 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)
https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/99891 >From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001 From: shawbyoung Date: Mon, 22 Jul 2024 11:56:23 -0700 Subject: [PATCH 1/4] Changed assignment of profiles with pseudo probe index Created using spr 1.3.4 --- bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++ .../X86/match-blocks-with-pseudo-probes.test | 25 ++ 2 files changed, 78 insertions(+), 32 deletions(-) diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp index 4105f626fb5b6..c135ee5ff4837 100644 --- a/bolt/lib/Profile/StaleProfileMatching.cpp +++ b/bolt/lib/Profile/StaleProfileMatching.cpp @@ -195,11 +195,15 @@ class StaleMatcher { void init(const std::vector &Blocks, const std::vector &Hashes, const std::vector &CallHashes, -std::optional YamlBFGUID) { +const std::unordered_map> +IndexToBinaryPseudoProbes, +const std::unordered_map +BinaryPseudoProbeToBlock, +const uint64_t YamlBFGUID) { assert(Blocks.size() == Hashes.size() && Hashes.size() == CallHashes.size() && "incorrect matcher initialization"); - for (size_t I = 0; I < Blocks.size(); I++) { FlowBlock *Block = Blocks[I]; uint16_t OpHash = Hashes[I].OpcodeHash; @@ -209,6 +213,8 @@ class StaleMatcher { std::make_pair(Hashes[I], Block)); this->Blocks.push_back(Block); } +this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes; +this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock; this->YamlBFGUID = YamlBFGUID; } @@ -234,10 +240,14 @@ class StaleMatcher { using HashBlockPairType = std::pair; std::unordered_map> OpHashToBlocks; std::unordered_map> CallHashToBlocks; - std::vector Blocks; + std::unordered_map> + IndexToBinaryPseudoProbes; + std::unordered_map + BinaryPseudoProbeToBlock; + std::vector Blocks; // If the pseudo probe checksums of the profiled and binary functions are // equal, then the YamlBF's GUID is defined and used to match blocks. 
- std::optional YamlBFGUID; + uint64_t YamlBFGUID; // Uses OpcodeHash to find the most similar block for a given hash. const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const { @@ -284,7 +294,7 @@ class StaleMatcher { // Searches for the pseudo probe attached to the matched function's block, // ignoring pseudo probes attached to function calls and inlined functions' // blocks. -outs() << "match with pseudo probes\n"; +std::vector BlockPseudoProbes; for (const auto &PseudoProbe : PseudoProbes) { // Ensures that pseudo probe information belongs to the appropriate // function and not an inlined function. @@ -293,11 +303,30 @@ class StaleMatcher { // Skips pseudo probes attached to function calls. if (PseudoProbe.Type != static_cast(PseudoProbeType::Block)) continue; - assert(PseudoProbe.Index < Blocks.size() && - "pseudo probe index out of range"); - return Blocks[PseudoProbe.Index]; + + BlockPseudoProbes.push_back(&PseudoProbe); } -return nullptr; + +// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo +// probe and binary pseudo probe. 
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1) + return nullptr; + +uint64_t Index = BlockPseudoProbes[0]->Index; +assert(Index < Blocks.size() && "Invalid pseudo probe index"); + +auto It = IndexToBinaryPseudoProbes.find(Index); +assert(It != IndexToBinaryPseudoProbes.end() && + "All blocks should have a pseudo probe"); +if (It->second.size() > 1) + return nullptr; + +const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0]; +auto BinaryPseudoProbeIt = BinaryPseudoProbeToBlock.find(BinaryPseudoProbe); +assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() && + "All binary pseudo probes should belong a binary basic block"); + +return BinaryPseudoProbeIt->second; } }; @@ -491,6 +520,11 @@ size_t matchWeightsByHashes( std::vector CallHashes; std::vector Blocks; std::vector BlendedHashes; + std::unordered_map> + IndexToBinaryPseudoProbes; + std::unordered_map + BinaryPseudoProbeToBlock; + const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder(); for (uint64_t I = 0; I < BlockOrder.size(); I++) { const BinaryBasicBlock *BB = BlockOrder[I]; assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock"); @@ -510,9 +544,27 @@ size_t matchWeightsByHashes( Blocks.push_back(&Func.Blocks[I + 1]); BlendedBlockHash BlendedHash(BB->getHash()); BlendedHashes.push_back(BlendedHash); +if (PseudoProbeDecoder) { + const AddressProbesMap &ProbeMap = + PseudoProbeDecoder->getAd
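The matching rule in the hunk above can be sketched in isolation. This is a minimal illustration and not the BOLT implementation: `YamlProbe`, `BinaryProbe`, `Block`, `ProbeType`, and `ProbeMatcher` are hypothetical stand-ins for the profile probe, `MCDecodedPseudoProbe`, and `FlowBlock` types in the diff, and the map shapes are assumed from the member names. Unlike the patch, the sketch returns `nullptr` rather than asserting when an index has no binary probe.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the types that appear in the patch.
enum class ProbeType : std::uint8_t { Block, IndirectCall, DirectCall };

struct YamlProbe {   // probe recorded in the YAML profile
  std::uint64_t GUID;
  std::uint64_t Index;
  ProbeType Type;
};

struct BinaryProbe { // probe decoded from the binary
  std::uint64_t Index;
};

struct Block {
  int Id;
};

struct ProbeMatcher {
  std::uint64_t FunctionGUID = 0;
  std::unordered_map<std::uint64_t, std::vector<const BinaryProbe *>>
      IndexToBinaryProbes;
  std::unordered_map<const BinaryProbe *, const Block *> BinaryProbeToBlock;

  // Returns the binary block only when the mapping is unambiguous (1:1).
  const Block *match(const std::vector<YamlProbe> &Probes) const {
    const YamlProbe *Candidate = nullptr;
    for (const YamlProbe &P : Probes) {
      if (P.GUID != FunctionGUID)
        continue; // belongs to an inlined function
      if (P.Type != ProbeType::Block)
        continue; // attached to a call, not a block
      if (Candidate)
        return nullptr; // more than one block probe: ambiguous
      Candidate = &P;
    }
    if (!Candidate)
      return nullptr; // no block probe at all

    auto It = IndexToBinaryProbes.find(Candidate->Index);
    if (It == IndexToBinaryProbes.end() || It->second.size() != 1)
      return nullptr; // zero or several binary probes share this index

    auto BlockIt = BinaryProbeToBlock.find(It->second.front());
    assert(BlockIt != BinaryProbeToBlock.end() &&
           "every binary probe should belong to a block");
    return BlockIt->second;
  }
};
```

Rejecting every ambiguous case keeps the probe-based match conservative, so a caller can still fall back to another strategy when `match` returns `nullptr`.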
[llvm-branch-commits] [BOLT] Support more than two jump table parents (PR #99988)
https://github.com/dcci approved this pull request. https://github.com/llvm/llvm-project/pull/99988
[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100215 Backport 8be1325cb1903797ba3dce67087e395f9e080576 Requested by: @asl >From c8d9662b0542cc99a88acc35762dca7f0d09a22b Mon Sep 17 00:00:00 2001 From: Oliver Hunt Date: Tue, 23 Jul 2024 14:18:53 -0700 Subject: [PATCH] [clang][test] Add function type discrimination tests to static destructor tests (#99604) I accidentally did not include tests for the setting up runtime calls when compiling with -fptrauth-function-pointer-type-discrimination (cherry picked from commit 8be1325cb1903797ba3dce67087e395f9e080576) --- .../CodeGenCXX/ptrauth-static-destructors.cpp | 37 --- 1 file changed, 31 insertions(+), 6 deletions(-) diff --git a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp index 1240f26d329da..634450bf62ea9 100644 --- a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp +++ b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp @@ -2,13 +2,27 @@ // RUN: | FileCheck %s --check-prefix=CXAATEXIT // RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ -// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,DARWIN +// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_DARWIN // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ // RUN: | FileCheck %s --check-prefix=CXAATEXIT // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ -// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ELF +// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_ELF + +// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s \ +// RUN: -fptrauth-function-pointer-type-discrimination -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC + +// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ +// RUN: 
-fptrauth-function-pointer-type-discrimination -fno-use-cxa-atexit \ +// RUN: | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_DARWIN + +// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s \ +// RUN: -fptrauth-function-pointer-type-discrimination -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC + +// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ +// RUN: -fptrauth-function-pointer-type-discrimination -fno-use-cxa-atexit \ +// RUN: | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_ELF class Foo { public: @@ -21,11 +35,22 @@ Foo global; // CXAATEXIT: define internal void @__cxx_global_var_init() // CXAATEXIT: call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0), ptr @global, ptr @__dso_handle) +// CXAATEXIT_DISC: define internal void @__cxx_global_var_init() +// CXAATEXIT_DISC: call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0, i64 10942), ptr @global, ptr @__dso_handle) // ATEXIT: define internal void @__cxx_global_var_init() // ATEXIT: %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0)) -// DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" { -// ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { -// DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) -// ELF: call void @_ZN3FooD1Ev(ptr @global) +// ATEXIT_DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" { +// ATEXIT_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { +// ATEXIT_DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) +// ATEXIT_ELF: call void @_ZN3FooD1Ev(ptr @global) + +// ATEXIT_DISC: define internal void @__cxx_global_var_init() +// ATEXIT_DISC: %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0, i64 10942)) + + +// ATEXIT_DISC_DARWIN: define internal void @__dtor_global() {{.*}} section 
"__TEXT,__StaticInit,regular,pure_instructions" { +// ATEXIT_DISC_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { +// ATEXIT_DISC_DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) +// ATEXIT_DISC_ELF: call void @_ZN3FooD1Ev(ptr @global) ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100215
[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)
llvmbot wrote: @kovdan01 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100215
[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)
llvmbot wrote: @llvm/pr-subscribers-clang Author: None (llvmbot) Changes Backport 8be1325cb1903797ba3dce67087e395f9e080576 Requested by: @asl --- Full diff: https://github.com/llvm/llvm-project/pull/100215.diff 1 Files Affected: - (modified) clang/test/CodeGenCXX/ptrauth-static-destructors.cpp (+31-6) ``diff diff --git a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp index 1240f26d329da..634450bf62ea9 100644 --- a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp +++ b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp @@ -2,13 +2,27 @@ // RUN: | FileCheck %s --check-prefix=CXAATEXIT // RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ -// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,DARWIN +// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_DARWIN // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ // RUN: | FileCheck %s --check-prefix=CXAATEXIT // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ -// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ELF +// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_ELF + +// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s \ +// RUN: -fptrauth-function-pointer-type-discrimination -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC + +// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \ +// RUN: -fptrauth-function-pointer-type-discrimination -fno-use-cxa-atexit \ +// RUN: | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_DARWIN + +// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s \ +// RUN: -fptrauth-function-pointer-type-discrimination -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC + +// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm 
-std=c++11 %s -o - \ +// RUN: -fptrauth-function-pointer-type-discrimination -fno-use-cxa-atexit \ +// RUN: | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_ELF class Foo { public: @@ -21,11 +35,22 @@ Foo global; // CXAATEXIT: define internal void @__cxx_global_var_init() // CXAATEXIT: call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0), ptr @global, ptr @__dso_handle) +// CXAATEXIT_DISC: define internal void @__cxx_global_var_init() +// CXAATEXIT_DISC: call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0, i64 10942), ptr @global, ptr @__dso_handle) // ATEXIT: define internal void @__cxx_global_var_init() // ATEXIT: %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0)) -// DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" { -// ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { -// DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) -// ELF: call void @_ZN3FooD1Ev(ptr @global) +// ATEXIT_DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" { +// ATEXIT_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { +// ATEXIT_DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) +// ATEXIT_ELF: call void @_ZN3FooD1Ev(ptr @global) + +// ATEXIT_DISC: define internal void @__cxx_global_var_init() +// ATEXIT_DISC: %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0, i64 10942)) + + +// ATEXIT_DISC_DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" { +// ATEXIT_DISC_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" { +// ATEXIT_DISC_DARWIN: %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global) +// ATEXIT_DISC_ELF: call void @_ZN3FooD1Ev(ptr @global) `` https://github.com/llvm/llvm-project/pull/100215 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org 
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100216 Backport 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43 Requested by: @jhuber6 >From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 23 Jul 2024 14:41:57 -0500 Subject: [PATCH] [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) Summary: This was not forwarded properly as it would try to pass it to `nvlink`. Fixes https://github.com/llvm/llvm-project/issues/100168 (cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43) --- clang/lib/Driver/ToolChains/Cuda.cpp | 4 clang/test/Driver/linker-wrapper-passes.c | 10 +++--- clang/test/Driver/nvlink-wrapper.c | 7 +++ clang/tools/clang-nvlink-wrapper/NVLinkOpts.td | 4 ++-- 4 files changed, 16 insertions(+), 9 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 59453c484ae4f..61d12b10dfb62 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Args.MakeArgString( "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ))); + if (Args.hasArg(options::OPT_cuda_path_EQ)) +CmdArgs.push_back(Args.MakeArgString( +"--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ))); + // Add paths specified in LIBRARY_PATH environment variable as -L options. addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH"); diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c index aadcf472e9b63..8c337ff906d17 100644 --- a/clang/test/Driver/linker-wrapper-passes.c +++ b/clang/test/Driver/linker-wrapper-passes.c @@ -1,9 +1,5 @@ // Check various clang-linker-wrapper pass options after -offload-opt. -// REQUIRES: llvm-plugins, llvm-examples -// REQUIRES: x86-registered-target -// REQUIRES: amdgpu-registered-target - // Setup. 
// RUN: mkdir -p %t // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \ @@ -23,14 +19,14 @@ // RUN: %t/host-x86_64-unknown-linux-gnu.s // Check plugin, -passes, and no remarks. -// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=OUT %s // Check plugin, -p, and remarks. -// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-p="function(goodbye),module(inline)" \ @@ -43,7 +39,7 @@ // RUN: -check-prefixes=YML %s // Check handling of bad plugin. -// RUN: not clang-linker-wrapper \ +// RUN: not clang-linker-wrapper --dry-run \ // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c index fdda93f1f9cdc..318315ddaca34 100644 --- a/clang/test/Driver/nvlink-wrapper.c +++ b/clang/test/Driver/nvlink-wrapper.c @@ -63,3 +63,10 @@ int baz() { return y + x; } // RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin + +// +// Check that we don't forward some arguments. 
+// +// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \ +// RUN: -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH +// PATH-NOT: --cuda-path=/opt/cuda diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td index e84b530f2787d..8c80a51b12a44 100644 --- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td +++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td @@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">; def version : Flag<["--"], "version">, HelpText<"Display the version number and exit">; -def cuda_path_EQ : Joined<["--"], "cuda-path=">, +def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the system CUDA path">; -def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, +def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the 'ptxas' path">; def o : JoinedOrSeparate<["-"],
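The behaviour that the new `PATH-NOT` check targets, options consumed by the wrapper and never passed on to `nvlink`, can be sketched as a simple filter. This is an illustrative sketch only and not the wrapper's code: the real tool marks options with `WrapperOnlyOption` in `NVLinkOpts.td`, whereas `WrapperOnlyPrefixes`, `isWrapperOnly`, and `argsForLinker` below are hypothetical names that match on string prefixes.

```cpp
#include <string>
#include <vector>

// Hypothetical list of options the wrapper handles itself; it mirrors the
// two options tagged WrapperOnlyOption in the diff.
static const char *const WrapperOnlyPrefixes[] = {"--cuda-path=",
                                                  "--ptxas-path="};

// True if Arg starts with one of the wrapper-only prefixes.
bool isWrapperOnly(const std::string &Arg) {
  for (const char *Prefix : WrapperOnlyPrefixes)
    if (Arg.rfind(Prefix, 0) == 0) // match anchored at position 0
      return true;
  return false;
}

// Builds the argument list that would be handed to the underlying linker,
// dropping anything meant only for the wrapper.
std::vector<std::string> argsForLinker(const std::vector<std::string> &Args) {
  std::vector<std::string> Out;
  for (const std::string &Arg : Args)
    if (!isWrapperOnly(Arg))
      Out.push_back(Arg);
  return Out;
}
```

With this shape, the driver can forward `--cuda-path=` to the wrapper unconditionally, and the wrapper alone decides what reaches the linker.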
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100216
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
llvmbot wrote: @Artem-B What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/100216
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
llvmbot wrote: @llvm/pr-subscribers-clang-driver Author: None (llvmbot) Changes Backport 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43 Requested by: @jhuber6 --- Full diff: https://github.com/llvm/llvm-project/pull/100216.diff 4 Files Affected: - (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+4) - (modified) clang/test/Driver/linker-wrapper-passes.c (+3-7) - (modified) clang/test/Driver/nvlink-wrapper.c (+7) - (modified) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+2-2) ``diff diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 59453c484ae4f..61d12b10dfb62 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Args.MakeArgString( "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ))); + if (Args.hasArg(options::OPT_cuda_path_EQ)) +CmdArgs.push_back(Args.MakeArgString( +"--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ))); + // Add paths specified in LIBRARY_PATH environment variable as -L options. addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH"); diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c index aadcf472e9b63..8c337ff906d17 100644 --- a/clang/test/Driver/linker-wrapper-passes.c +++ b/clang/test/Driver/linker-wrapper-passes.c @@ -1,9 +1,5 @@ // Check various clang-linker-wrapper pass options after -offload-opt. -// REQUIRES: llvm-plugins, llvm-examples -// REQUIRES: x86-registered-target -// REQUIRES: amdgpu-registered-target - // Setup. // RUN: mkdir -p %t // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \ @@ -23,14 +19,14 @@ // RUN: %t/host-x86_64-unknown-linux-gnu.s // Check plugin, -passes, and no remarks. 
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=OUT %s // Check plugin, -p, and remarks. -// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-p="function(goodbye),module(inline)" \ @@ -43,7 +39,7 @@ // RUN: -check-prefixes=YML %s // Check handling of bad plugin. -// RUN: not clang-linker-wrapper \ +// RUN: not clang-linker-wrapper --dry-run \ // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c index fdda93f1f9cdc..318315ddaca34 100644 --- a/clang/test/Driver/nvlink-wrapper.c +++ b/clang/test/Driver/nvlink-wrapper.c @@ -63,3 +63,10 @@ int baz() { return y + x; } // RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin + +// +// Check that we don't forward some arguments. 
+// +// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \ +// RUN: -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH +// PATH-NOT: --cuda-path=/opt/cuda diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td index e84b530f2787d..8c80a51b12a44 100644 --- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td +++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td @@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">; def version : Flag<["--"], "version">, HelpText<"Display the version number and exit">; -def cuda_path_EQ : Joined<["--"], "cuda-path=">, +def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the system CUDA path">; -def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, +def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the 'ptxas' path">; def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">, `` https://github.com/llvm/llvm-project/pull/100216 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.

jhuber6 wrote:

```suggestion
// REQUIRES: llvm-plugins, llvm-examples
// REQUIRES: x86-registered-target
// REQUIRES: amdgpu-registered-target
```

https://github.com/llvm/llvm-project/pull/100216
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/100216
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/100216 >From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 23 Jul 2024 14:41:57 -0500 Subject: [PATCH 1/2] [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) Summary: This was not forwarded properly as it would try to pass it to `nvlink`. Fixes https://github.com/llvm/llvm-project/issues/100168 (cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43) --- clang/lib/Driver/ToolChains/Cuda.cpp | 4 clang/test/Driver/linker-wrapper-passes.c | 10 +++--- clang/test/Driver/nvlink-wrapper.c | 7 +++ clang/tools/clang-nvlink-wrapper/NVLinkOpts.td | 4 ++-- 4 files changed, 16 insertions(+), 9 deletions(-) diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp index 59453c484ae4f..61d12b10dfb62 100644 --- a/clang/lib/Driver/ToolChains/Cuda.cpp +++ b/clang/lib/Driver/ToolChains/Cuda.cpp @@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA, CmdArgs.push_back(Args.MakeArgString( "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ))); + if (Args.hasArg(options::OPT_cuda_path_EQ)) +CmdArgs.push_back(Args.MakeArgString( +"--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ))); + // Add paths specified in LIBRARY_PATH environment variable as -L options. addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH"); diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c index aadcf472e9b63..8c337ff906d17 100644 --- a/clang/test/Driver/linker-wrapper-passes.c +++ b/clang/test/Driver/linker-wrapper-passes.c @@ -1,9 +1,5 @@ // Check various clang-linker-wrapper pass options after -offload-opt. -// REQUIRES: llvm-plugins, llvm-examples -// REQUIRES: x86-registered-target -// REQUIRES: amdgpu-registered-target - // Setup. 
// RUN: mkdir -p %t // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \ @@ -23,14 +19,14 @@ // RUN: %t/host-x86_64-unknown-linux-gnu.s // Check plugin, -passes, and no remarks. -// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=OUT %s // Check plugin, -p, and remarks. -// RUN: clang-linker-wrapper -o a.out --embed-bitcode \ +// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \ // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \ // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \ // RUN: --offload-opt=-p="function(goodbye),module(inline)" \ @@ -43,7 +39,7 @@ // RUN: -check-prefixes=YML %s // Check handling of bad plugin. -// RUN: not clang-linker-wrapper \ +// RUN: not clang-linker-wrapper --dry-run \ // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \ // RUN: FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c index fdda93f1f9cdc..318315ddaca34 100644 --- a/clang/test/Driver/nvlink-wrapper.c +++ b/clang/test/Driver/nvlink-wrapper.c @@ -63,3 +63,10 @@ int baz() { return y + x; } // RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin + +// +// Check that we don't forward some arguments. 
+// +// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \ +// RUN: -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH +// PATH-NOT: --cuda-path=/opt/cuda diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td index e84b530f2787d..8c80a51b12a44 100644 --- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td +++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td @@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">; def version : Flag<["--"], "version">, HelpText<"Display the version number and exit">; -def cuda_path_EQ : Joined<["--"], "cuda-path=">, +def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the system CUDA path">; -def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, +def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>, MetaVarName<"">, HelpText<"Set the 'ptxas' path">; def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">, >From e9ac0f0e5916236cb091179cfa7befd081b01355
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
@@ -23,14 +22,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN: FileCheck -match-full-lines -check-prefixes=OUT %s
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \

jhuber6 wrote:

```suggestion
// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
```

https://github.com/llvm/llvm-project/pull/100216
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
@@ -23,14 +22,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \

jhuber6 wrote:

```suggestion
// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
```

https://github.com/llvm/llvm-project/pull/100216
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
@@ -43,7 +42,7 @@
 // RUN: -check-prefixes=YML %s
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \

jhuber6 wrote:

```suggestion
// RUN: not clang-linker-wrapper \
```

https://github.com/llvm/llvm-project/pull/100216
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)
https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/100216

>From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001
From: Joseph Huber
Date: Tue, 23 Jul 2024 14:41:57 -0500
Subject: [PATCH 1/3] [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170)

Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168

(cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43)
---
 clang/lib/Driver/ToolChains/Cuda.cpp | 4
 clang/test/Driver/linker-wrapper-passes.c | 10 +++---
 clang/test/Driver/nvlink-wrapper.c | 7 +++
 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td | 4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 59453c484ae4f..61d12b10dfb62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back(Args.MakeArgString(
         "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
+  if (Args.hasArg(options::OPT_cuda_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString(
+        "--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
index aadcf472e9b63..8c337ff906d17 100644
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ b/clang/test/Driver/linker-wrapper-passes.c
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-
 // Setup.
 // RUN: mkdir -p %t
 // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
@@ -23,14 +19,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN: FileCheck -match-full-lines -check-prefixes=OUT %s
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-p="function(goodbye),module(inline)" \
@@ -43,7 +39,7 @@
 // RUN: -check-prefixes=YML %s
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \
 // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
 // RUN: FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index fdda93f1f9cdc..318315ddaca34 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -63,3 +63,10 @@ int baz() { return y + x; }
 // RUN: -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
 // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
 // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
+
+//
+// Check that we don't forward some arguments.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN: -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
+// PATH-NOT: --cuda-path=/opt/cuda
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index e84b530f2787d..8c80a51b12a44 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">;
 def version : Flag<["--"], "version">,
   HelpText<"Display the version number and exit">;
-def cuda_path_EQ : Joined<["--"], "cuda-path=">,
+def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the system CUDA path">;
-def ptxas_path_EQ : Joined<["--"], "ptxas-path=">,
+def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the 'ptxas' path">;
 def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">,

>From e9ac0f0e5916236cb091179cfa7befd081b01355