[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-23 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

### Merge activity

* **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96162).


https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-23 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

### Merge activity

* **Jul 23, 4:02 AM EDT**: @cdevadas started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/96163).


https://github.com/llvm/llvm-project/pull/96163


[llvm-branch-commits] [llvm] 32ec3b4 - Revert "[llvm-cgdata] Remove `GENERATE_DRIVER` option (#100066)"

2024-07-23 Thread via llvm-branch-commits

Author: Petr Hosek
Date: 2024-07-23T01:18:55-07:00
New Revision: 32ec3b4f547f9af8cd2af736cd7c00843ef69a93

URL: 
https://github.com/llvm/llvm-project/commit/32ec3b4f547f9af8cd2af736cd7c00843ef69a93
DIFF: 
https://github.com/llvm/llvm-project/commit/32ec3b4f547f9af8cd2af736cd7c00843ef69a93.diff

LOG: Revert "[llvm-cgdata] Remove `GENERATE_DRIVER` option (#100066)"

This reverts commit 96d412135395a251f2931b8fca4dd8150aeed9ba.

Added: 


Modified: 
llvm/tools/llvm-cgdata/CMakeLists.txt

Removed: 




diff --git a/llvm/tools/llvm-cgdata/CMakeLists.txt b/llvm/tools/llvm-cgdata/CMakeLists.txt
index 966384278b9ab..4f1f7ff635bc3 100644
--- a/llvm/tools/llvm-cgdata/CMakeLists.txt
+++ b/llvm/tools/llvm-cgdata/CMakeLists.txt
@@ -11,4 +11,5 @@ add_llvm_tool(llvm-cgdata
 
   DEPENDS
   intrinsics_gen
+  GENERATE_DRIVER
   )





[llvm-branch-commits] [libcxx] c2dbaeb - Bump version to 19.1.0git

2024-07-23 Thread Tobias Hieta via llvm-branch-commits

Author: Tobias Hieta
Date: 2024-07-23T11:06:16+02:00
New Revision: c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629

URL: 
https://github.com/llvm/llvm-project/commit/c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629
DIFF: 
https://github.com/llvm/llvm-project/commit/c2dbaeb91a45aeb6d26f22efef318b5f5a0eb629.diff

LOG: Bump version to 19.1.0git

Added: 


Modified: 
cmake/Modules/LLVMVersion.cmake
libcxx/include/__config
llvm/utils/gn/secondary/llvm/version.gni
llvm/utils/lit/lit/__init__.py

Removed: 




diff  --git a/cmake/Modules/LLVMVersion.cmake b/cmake/Modules/LLVMVersion.cmake
index 5e28283fbc1c6..aea9b880180ab 100644
--- a/cmake/Modules/LLVMVersion.cmake
+++ b/cmake/Modules/LLVMVersion.cmake
@@ -4,7 +4,7 @@ if(NOT DEFINED LLVM_VERSION_MAJOR)
   set(LLVM_VERSION_MAJOR 19)
 endif()
 if(NOT DEFINED LLVM_VERSION_MINOR)
-  set(LLVM_VERSION_MINOR 0)
+  set(LLVM_VERSION_MINOR 1)
 endif()
 if(NOT DEFINED LLVM_VERSION_PATCH)
   set(LLVM_VERSION_PATCH 0)

diff --git a/libcxx/include/__config b/libcxx/include/__config
index 108f700823cbf..661af5be3c225 100644
--- a/libcxx/include/__config
+++ b/libcxx/include/__config
@@ -27,7 +27,7 @@
 // _LIBCPP_VERSION represents the version of libc++, which matches the version of LLVM.
 // Given a LLVM release LLVM XX.YY.ZZ (e.g. LLVM 17.0.1 == 17.00.01), _LIBCPP_VERSION is
 // defined to XXYYZZ.
-#  define _LIBCPP_VERSION 19
+#  define _LIBCPP_VERSION 190100
 
 #  define _LIBCPP_CONCAT_IMPL(_X, _Y) _X##_Y
 #  define _LIBCPP_CONCAT(_X, _Y) _LIBCPP_CONCAT_IMPL(_X, _Y)

diff --git a/llvm/utils/gn/secondary/llvm/version.gni b/llvm/utils/gn/secondary/llvm/version.gni
index 7c02ed396db5f..3f44a4645acf6 100644
--- a/llvm/utils/gn/secondary/llvm/version.gni
+++ b/llvm/utils/gn/secondary/llvm/version.gni
@@ -1,4 +1,4 @@
 llvm_version_major = 19
-llvm_version_minor = 0
+llvm_version_minor = 1
 llvm_version_patch = 0
 llvm_version = "$llvm_version_major.$llvm_version_minor.$llvm_version_patch"

diff  --git a/llvm/utils/lit/lit/__init__.py b/llvm/utils/lit/lit/__init__.py
index a5a1ff66bf417..03edfc3360972 100644
--- a/llvm/utils/lit/lit/__init__.py
+++ b/llvm/utils/lit/lit/__init__.py
@@ -2,7 +2,7 @@
 
 __author__ = "Daniel Dunbar"
 __email__ = "dan...@minormatter.com"
-__versioninfo__ = (19, 0, 0)
+__versioninfo__ = (19, 1, 0)
 __version__ = ".".join(str(v) for v in __versioninfo__) + "dev"
 
 __all__ = []
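
For reference, the XXYYZZ scheme described in the `__config` comment above, and the version string that `lit` builds from `__versioninfo__`, can be sketched as follows. This is an illustrative Python snippet only; `libcpp_version` is a hypothetical helper name and is not part of libc++ or lit.

```python
def libcpp_version(major: int, minor: int, patch: int) -> int:
    """Pack XX.YY.ZZ into the integer XXYYZZ (hypothetical helper).

    Minor and patch each occupy two decimal digits, as in the
    "17.0.1 == 17.00.01" example from the __config comment.
    """
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must fit in two decimal digits")
    return major * 10000 + minor * 100 + patch


# Mirrors the expression in lit/__init__.py after the bump.
versioninfo = (19, 1, 0)
version = ".".join(str(v) for v in versioninfo) + "dev"

print(libcpp_version(*versioninfo))  # 190100
print(version)                       # 19.1.0dev
```

Under this scheme the old value `19` did not follow the XXYYZZ layout, while the new value `190100` does.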





[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100094

Backport b48819dbcdb48fc737dc22304ac343e4fdbae9ff

Requested by: @nikic

>From 36301dee358a56dbf3b79ab748444e364d0cb382 Mon Sep 17 00:00:00 2001
From: Nikita Popov 
Date: Tue, 23 Jul 2024 12:00:53 +0200
Subject: [PATCH] Revert " [LICM] Fold associative binary ops to promote code
 hoisting  (#81608)"

This reverts commit f2ccf80136a01ca69f766becafb329db6c54c0c8.

The flag propagation code is incorrect.

(cherry picked from commit b48819dbcdb48fc737dc22304ac343e4fdbae9ff)
---
 llvm/lib/Transforms/Scalar/LICM.cpp   |  62 
 llvm/test/CodeGen/PowerPC/common-chain.ll | 315 +-
 llvm/test/CodeGen/PowerPC/p10-spill-crlt.ll   |  16 +-
 llvm/test/Transforms/LICM/hoist-binop.ll  |  99 --
 llvm/test/Transforms/LICM/sink-foldable.ll|   4 +-
 .../LICM/update-scev-after-hoist.ll   |   2 +-
 6 files changed, 163 insertions(+), 335 deletions(-)
 delete mode 100644 llvm/test/Transforms/LICM/hoist-binop.ll
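
The commit message above only says that the flag propagation code is incorrect. As a hedged illustration of how copying flags onto a reassociated operation can go wrong in general, the Python sketch below models 8-bit `add nsw`, with `None` standing in for poison. The values and helper names are assumptions chosen for illustration; whether this exact scenario motivated the revert is not stated in this thread.

```python
def wrap_i8(x: int) -> int:
    """Wrap an integer to the signed 8-bit range (two's complement)."""
    return (x + 128) % 256 - 128


def add_nsw(a: int, b: int):
    """Model `add nsw` on i8: return None (poison) on signed overflow."""
    s = a + b
    return s if -128 <= s <= 127 else None


LV, C1, C2 = -100, 100, 100  # illustrative values only

# Original form: (LV + C1) + C2, both adds carrying nsw. Neither overflows.
original = add_nsw(add_nsw(LV, C1), C2)

# Reassociated form: Inv = C1 + C2 is built without flags (it wraps),
# then LV + Inv receives the nsw flag copied from the original add.
inv = wrap_i8(C1 + C2)
reassoc_with_copied_flag = add_nsw(LV, inv)

# Dropping the flag on the new add keeps the wrapped result well defined.
reassoc_without_flag = wrap_i8(LV + inv)

print(original, inv, reassoc_with_copied_flag, reassoc_without_flag)
```

In this model the original expression yields 100, while the reassociated add with the copied `nsw` flag overflows and becomes poison; without the flag it wraps back to 100.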

diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index fe264503dee9e..91ef2b4b7c183 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -113,8 +113,6 @@ STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions "
 STATISTIC(NumIntAssociationsHoisted,
           "Number of invariant int expressions "
           "reassociated and hoisted out of the loop");
-STATISTIC(NumBOAssociationsHoisted, "Number of invariant BinaryOp expressions "
-                                    "reassociated and hoisted out of the loop");
 
 /// Memory promotion is enabled by default.
 static cl::opt<bool>
@@ -2781,60 +2779,6 @@ static bool hoistMulAddAssociation(Instruction &I, Loop &L,
   return true;
 }
 
-/// Reassociate general associative binary expressions of the form
-///
-/// 1. "(LV op C1) op C2" ==> "LV op (C1 op C2)"
-///
-/// where op is an associative binary op, LV is a loop variant, and C1 and C2
-/// are loop invariants that we want to hoist.
-///
-/// TODO: This can be extended to more cases such as
-/// 2. "C1 op (C2 op LV)" ==> "(C1 op C2) op LV"
-/// 3. "(C1 op LV) op C2" ==> "LV op (C1 op C2)" if op is commutative
-/// 4. "C1 op (LV op C2)" ==> "(C1 op C2) op LV" if op is commutative
-static bool hoistBOAssociation(Instruction &I, Loop &L,
-                               ICFLoopSafetyInfo &SafetyInfo,
-                               MemorySSAUpdater &MSSAU, AssumptionCache *AC,
-                               DominatorTree *DT) {
-  BinaryOperator *BO = dyn_cast<BinaryOperator>(&I);
-  if (!BO || !BO->isAssociative())
-    return false;
-
-  Instruction::BinaryOps Opcode = BO->getOpcode();
-  BinaryOperator *Op0 = dyn_cast<BinaryOperator>(BO->getOperand(0));
-
-  // Transform: "(LV op C1) op C2" ==> "LV op (C1 op C2)"
-  if (Op0 && Op0->getOpcode() == Opcode) {
-    Value *LV = Op0->getOperand(0);
-    Value *C1 = Op0->getOperand(1);
-    Value *C2 = BO->getOperand(1);
-
-    if (L.isLoopInvariant(LV) || !L.isLoopInvariant(C1) ||
-        !L.isLoopInvariant(C2))
-      return false;
-
-    auto *Preheader = L.getLoopPreheader();
-    assert(Preheader && "Loop is not in simplify form?");
-    IRBuilder<> Builder(Preheader->getTerminator());
-    Value *Inv = Builder.CreateBinOp(Opcode, C1, C2, "invariant.op");
-
-    auto *NewBO =
-        BinaryOperator::Create(Opcode, LV, Inv, BO->getName() + ".reass", BO);
-    NewBO->copyIRFlags(BO);
-    BO->replaceAllUsesWith(NewBO);
-    eraseInstruction(*BO, SafetyInfo, MSSAU);
-
-    // Note: (LV op C1) might not be erased if it has more uses than the one we
-    //       just replaced.
-    if (Op0->use_empty())
-      eraseInstruction(*Op0, SafetyInfo, MSSAU);
-
-    return true;
-  }
-
-  return false;
-}
-
 static bool hoistArithmetics(Instruction &I, Loop &L,
                              ICFLoopSafetyInfo &SafetyInfo,
                              MemorySSAUpdater &MSSAU, AssumptionCache *AC,
@@ -2872,12 +2816,6 @@ static bool hoistArithmetics(Instruction &I, Loop &L,
     return true;
   }
 
-  if (hoistBOAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) {
-    ++NumHoisted;
-    ++NumBOAssociationsHoisted;
-    return true;
-  }
-
-
   return false;
 }
 
diff --git a/llvm/test/CodeGen/PowerPC/common-chain.ll b/llvm/test/CodeGen/PowerPC/common-chain.ll
index ccf0e4520f468..5f8c21e30f8fd 100644
--- a/llvm/test/CodeGen/PowerPC/common-chain.ll
+++ b/llvm/test/CodeGen/PowerPC/common-chain.ll
@@ -642,8 +642,8 @@ define i64 @two_chain_two_bases_succ(ptr %p, i64 %offset, i64 %base1, i64 %base2
 ; CHECK-NEXT:    cmpdi r7, 0
 ; CHECK-NEXT:    ble cr0, .LBB6_4
 ; CHECK-NEXT:  # %bb.1: # %for.body.preheader
-; CHECK-NEXT:    add r5, r5, r4
 ; CHECK-NEXT:    add r6, r6, r4
+; CHECK-NEXT:    add r5, r5, r4
 ; CHECK-NEXT:    mtctr r7
 ; CHECK-NEXT:    sldi r4, r4, 1
 ; CHECK-NEXT:    add r5, r3, r5
@@ -743,219 +743,214 @@ define signext i32 @spill_reduce_succ(ptr %input1, ptr 
%input2, ptr %output, i6

[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100094


[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-powerpc

Author: None (llvmbot)


Changes

Backport b48819dbcdb48fc737dc22304ac343e4fdbae9ff

Requested by: @nikic

---

Patch is 26.62 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/100094.diff


6 Files Affected:

- (modified) llvm/lib/Transforms/Scalar/LICM.cpp (-62) 
- (modified) llvm/test/CodeGen/PowerPC/common-chain.ll (+155-160) 
- (modified) llvm/test/CodeGen/PowerPC/p10-spill-crlt.ll (+5-11) 
- (removed) llvm/test/Transforms/LICM/hoist-binop.ll (-99) 
- (modified) llvm/test/Transforms/LICM/sink-foldable.ll (+2-2) 
- (modified) llvm/test/Transforms/LICM/update-scev-after-hoist.ll (+1-1) 


``diff
diff --git a/llvm/lib/Transforms/Scalar/LICM.cpp b/llvm/lib/Transforms/Scalar/LICM.cpp
index fe264503dee9e..91ef2b4b7c183 100644
--- a/llvm/lib/Transforms/Scalar/LICM.cpp
+++ b/llvm/lib/Transforms/Scalar/LICM.cpp
@@ -113,8 +113,6 @@ STATISTIC(NumFPAssociationsHoisted, "Number of invariant FP expressions "
 STATISTIC(NumIntAssociationsHoisted,
           "Number of invariant int expressions "
           "reassociated and hoisted out of the loop");
-STATISTIC(NumBOAssociationsHoisted, "Number of invariant BinaryOp expressions "
-                                    "reassociated and hoisted out of the loop");
 
 /// Memory promotion is enabled by default.
 static cl::opt<bool>
@@ -2781,60 +2779,6 @@ static bool hoistMulAddAssociation(Instruction &I, Loop &L,
   return true;
 }
 
-/// Reassociate general associative binary expressions of the form
-///
-/// 1. "(LV op C1) op C2" ==> "LV op (C1 op C2)"
-///
-/// where op is an associative binary op, LV is a loop variant, and C1 and C2
-/// are loop invariants that we want to hoist.
-///
-/// TODO: This can be extended to more cases such as
-/// 2. "C1 op (C2 op LV)" ==> "(C1 op C2) op LV"
-/// 3. "(C1 op LV) op C2" ==> "LV op (C1 op C2)" if op is commutative
-/// 4. "C1 op (LV op C2)" ==> "(C1 op C2) op LV" if op is commutative
-static bool hoistBOAssociation(Instruction &I, Loop &L,
-                               ICFLoopSafetyInfo &SafetyInfo,
-                               MemorySSAUpdater &MSSAU, AssumptionCache *AC,
-                               DominatorTree *DT) {
-  BinaryOperator *BO = dyn_cast<BinaryOperator>(&I);
-  if (!BO || !BO->isAssociative())
-    return false;
-
-  Instruction::BinaryOps Opcode = BO->getOpcode();
-  BinaryOperator *Op0 = dyn_cast<BinaryOperator>(BO->getOperand(0));
-
-  // Transform: "(LV op C1) op C2" ==> "LV op (C1 op C2)"
-  if (Op0 && Op0->getOpcode() == Opcode) {
-    Value *LV = Op0->getOperand(0);
-    Value *C1 = Op0->getOperand(1);
-    Value *C2 = BO->getOperand(1);
-
-    if (L.isLoopInvariant(LV) || !L.isLoopInvariant(C1) ||
-        !L.isLoopInvariant(C2))
-      return false;
-
-    auto *Preheader = L.getLoopPreheader();
-    assert(Preheader && "Loop is not in simplify form?");
-    IRBuilder<> Builder(Preheader->getTerminator());
-    Value *Inv = Builder.CreateBinOp(Opcode, C1, C2, "invariant.op");
-
-    auto *NewBO =
-        BinaryOperator::Create(Opcode, LV, Inv, BO->getName() + ".reass", BO);
-    NewBO->copyIRFlags(BO);
-    BO->replaceAllUsesWith(NewBO);
-    eraseInstruction(*BO, SafetyInfo, MSSAU);
-
-    // Note: (LV op C1) might not be erased if it has more uses than the one we
-    //       just replaced.
-    if (Op0->use_empty())
-      eraseInstruction(*Op0, SafetyInfo, MSSAU);
-
-    return true;
-  }
-
-  return false;
-}
-
 static bool hoistArithmetics(Instruction &I, Loop &L,
                              ICFLoopSafetyInfo &SafetyInfo,
                              MemorySSAUpdater &MSSAU, AssumptionCache *AC,
@@ -2872,12 +2816,6 @@ static bool hoistArithmetics(Instruction &I, Loop &L,
     return true;
   }
 
-  if (hoistBOAssociation(I, L, SafetyInfo, MSSAU, AC, DT)) {
-    ++NumHoisted;
-    ++NumBOAssociationsHoisted;
-    return true;
-  }
-
-
   return false;
 }
 
diff --git a/llvm/test/CodeGen/PowerPC/common-chain.ll b/llvm/test/CodeGen/PowerPC/common-chain.ll
index ccf0e4520f468..5f8c21e30f8fd 100644
--- a/llvm/test/CodeGen/PowerPC/common-chain.ll
+++ b/llvm/test/CodeGen/PowerPC/common-chain.ll
@@ -642,8 +642,8 @@ define i64 @two_chain_two_bases_succ(ptr %p, i64 %offset, i64 %base1, i64 %base2
 ; CHECK-NEXT:    cmpdi r7, 0
 ; CHECK-NEXT:    ble cr0, .LBB6_4
 ; CHECK-NEXT:  # %bb.1: # %for.body.preheader
-; CHECK-NEXT:    add r5, r5, r4
 ; CHECK-NEXT:    add r6, r6, r4
+; CHECK-NEXT:    add r5, r5, r4
 ; CHECK-NEXT:    mtctr r7
 ; CHECK-NEXT:    sldi r4, r4, 1
 ; CHECK-NEXT:    add r5, r3, r5
@@ -743,219 +743,214 @@ define signext i32 @spill_reduce_succ(ptr %input1, ptr 
%input2, ptr %output, i64
 ; CHECK-NEXT:std r9, -184(r1) # 8-byte Folded Spill
 ; CHECK-NEXT:std r8, -176(r1) # 8-byte Folded Spill
 ; CHECK-NEXT:std r7, -168(r1) # 8-byte Folded Spill
-; CHECK-NEXT:std r4, -160(r1) # 8-byte Folded Spill
+; CHECK-NEXT:std r3, -160(r1) # 8-byte Folded Spill
 ; CHECK-NEXT:ble cr0, .LBB7_7
 ; C

[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn milestoned 
https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn created 
https://github.com/llvm/llvm-project/pull/100097

As discussed in https://github.com/llvm/llvm-project/pull/92555, flip the
default for the option added in
https://github.com/llvm/llvm-project/pull/99536 to true.

This restores the original behavior for the release branch to give the 
VPlan-based cost model more time to mature on main.

>From a72a0bf44a8b259be3c62e79082d2fdc04fc2771 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Tue, 23 Jul 2024 11:15:26 +0100
Subject: [PATCH] [LV] Disable VPlan-based cost model for 19.x release.

As discussed in  https://github.com/llvm/llvm-project/pull/92555 flip
the default for the option added in
https://github.com/llvm/llvm-project/pull/99536 to true.

This restores the original behavior for the release branch to give the
VPlan-based cost model more time to mature on main.
---
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 2 +-
 .../test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6d28b8fabe42e..68363abdb817a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -206,7 +206,7 @@ static cl::opt<unsigned> VectorizeMemoryCheckThreshold(
     cl::desc("The maximum allowed number of runtime memory checks"));
 
 static cl::opt<bool> UseLegacyCostModel(
-    "vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden,
+    "vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden,
     cl::desc("Use the legacy cost model instead of the VPlan-based cost model. "
              "This option will be removed in the future."));
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index fc310f4163082..1a78eaf644723 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF
@@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef 
writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF



[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)


Changes

As discussed in https://github.com/llvm/llvm-project/pull/92555, flip the
default for the option added in
https://github.com/llvm/llvm-project/pull/99536 to true.

This restores the original behavior for the release branch to give the 
VPlan-based cost model more time to mature on main.

---
Full diff: https://github.com/llvm/llvm-project/pull/100097.diff


2 Files Affected:

- (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+1-1) 
- (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll 
(-2) 


```diff
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6d28b8fabe42e..68363abdb817a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -206,7 +206,7 @@ static cl::opt<unsigned> VectorizeMemoryCheckThreshold(
     cl::desc("The maximum allowed number of runtime memory checks"));
 
 static cl::opt<bool> UseLegacyCostModel(
-    "vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden,
+    "vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden,
     cl::desc("Use the legacy cost model instead of the VPlan-based cost model. "
              "This option will be removed in the future."));
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index fc310f4163082..1a78eaf644723 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF
@@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF

```




https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Create `LoopRelatedClause` (PR #99506)

2024-07-23 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah approved this pull request.

LGTM, thanks!

https://github.com/llvm/llvm-project/pull/99506


[llvm-branch-commits] [lld][ELF][LoongArch] Support R_LARCH_TLS_{LD, GD, DESC}_PCREL_S2 (PR #100105)

2024-07-23 Thread via llvm-branch-commits

https://github.com/wangleiat created 
https://github.com/llvm/llvm-project/pull/100105

None




[llvm-branch-commits] [lld][ELF][LoongArch] Support R_LARCH_TLS_{LD, GD, DESC}_PCREL_S2 (PR #100105)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lld-elf

Author: wanglei (wangleiat)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/100105.diff


5 Files Affected:

- (modified) lld/ELF/Arch/LoongArch.cpp (+10) 
- (modified) lld/ELF/Relocations.cpp (+2-1) 
- (added) lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s (+129) 
- (added) lld/test/ELF/loongarch-tls-ld-pcrel20-s2.s (+82) 
- (added) lld/test/ELF/loongarch-tlsdesc-pcrel20-s2.s (+142) 
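
As a rough illustration of what the `pcaddi`-style relocation handling in the diff below does (range check to 22 bits, 4-byte alignment check, then storing `val >> 2` through `setJ20`), here is a minimal Python sketch. The function name `encode_pcrel20_s2`, the bit layout (a 20-bit immediate in bits 24:5 of the instruction word), and the sample instruction word are assumptions made for illustration; they are not taken from this patch.

```python
def encode_pcrel20_s2(insn: int, val: int) -> int:
    """Patch a PC-relative byte offset into a 32-bit instruction word.

    Hypothetical model of the checkInt / checkAlignment / setJ20 sequence:
    the offset must fit in a signed 22-bit value and be 4-byte aligned;
    the low two bits are dropped and the remaining 20 bits are stored
    in bits 24:5 (assumed layout).
    """
    if not -(1 << 21) <= val < (1 << 21):
        raise ValueError("offset does not fit in a signed 22-bit value")
    if val % 4 != 0:
        raise ValueError("offset is not 4-byte aligned")
    imm20 = (val >> 2) & 0xFFFFF          # keep 20 bits, two's complement
    return (insn & 0xFE00001F) | (imm20 << 5)


# Illustrative instruction word with an empty immediate field.
base = 0x18000004
print(hex(encode_pcrel20_s2(base, 8)))
print(hex(encode_pcrel20_s2(base, -4)))
```

The shift by two is why a 20-bit field can cover a 22-bit byte range: only multiples of four are representable.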


``diff
diff --git a/lld/ELF/Arch/LoongArch.cpp b/lld/ELF/Arch/LoongArch.cpp
index 9466e8b1ce54d..db0bc6c760096 100644
--- a/lld/ELF/Arch/LoongArch.cpp
+++ b/lld/ELF/Arch/LoongArch.cpp
@@ -511,6 +511,12 @@ RelExpr LoongArch::getRelExpr(const RelType type, const Symbol &s,
     return R_TLSDESC;
   case R_LARCH_TLS_DESC_CALL:
     return R_TLSDESC_CALL;
+  case R_LARCH_TLS_LD_PCREL20_S2:
+    return R_TLSLD_PC;
+  case R_LARCH_TLS_GD_PCREL20_S2:
+    return R_TLSGD_PC;
+  case R_LARCH_TLS_DESC_PCREL20_S2:
+    return R_TLSDESC_PC;
 
   // Other known relocs that are explicitly unimplemented:
   //
@@ -557,7 +563,11 @@ void LoongArch::relocate(uint8_t *loc, const Relocation &rel,
     write64le(loc, val);
     return;
 
+  // Relocs intended for `pcaddi`.
   case R_LARCH_PCREL20_S2:
+  case R_LARCH_TLS_LD_PCREL20_S2:
+  case R_LARCH_TLS_GD_PCREL20_S2:
+  case R_LARCH_TLS_DESC_PCREL20_S2:
     checkInt(loc, val, 22, rel);
     checkAlignment(loc, val, 4, rel);
     write32le(loc, setJ20(read32le(loc), val >> 2));
diff --git a/lld/ELF/Relocations.cpp b/lld/ELF/Relocations.cpp
index 36857d72c647e..6ad5c3bf8f6e9 100644
--- a/lld/ELF/Relocations.cpp
+++ b/lld/ELF/Relocations.cpp
@@ -1308,7 +1308,8 @@ static unsigned handleTlsRelocation(RelType type, Symbol 
&sym,
   // LoongArch does not yet implement transition from TLSDESC to LE/IE, so
   // generate TLSDESC dynamic relocation for the dynamic linker to handle.
   if (config->emachine == EM_LOONGARCH &&
-  oneof(expr)) {
+  oneof(expr)) {
 if (expr != R_TLSDESC_CALL) {
   sym.setFlags(NEEDS_TLSDESC);
   c.addReloc({expr, type, offset, addend, &sym});
diff --git a/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s 
b/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s
new file mode 100644
index 0..d4d12b9d4a520
--- /dev/null
+++ b/lld/test/ELF/loongarch-tls-gd-pcrel20-s2.s
@@ -0,0 +1,129 @@
+# REQUIRES: loongarch
+# RUN: rm -rf %t && split-file %s %t
+
+# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/a.s -o %t/a.32.o
+# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/bc.s -o %t/bc.32.o
+# RUN: ld.lld -shared -soname=bc.so %t/bc.32.o -o %t/bc.32.so
+# RUN: llvm-mc --filetype=obj --triple=loongarch32 %t/tga.s -o %t/tga.32.o
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/a.s -o %t/a.64.o
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/bc.s -o %t/bc.64.o
+# RUN: ld.lld -shared -soname=bc.so %t/bc.64.o -o %t/bc.64.so
+# RUN: llvm-mc --filetype=obj --triple=loongarch64 %t/tga.s -o %t/tga.64.o
+
+## LA32 GD
+# RUN: ld.lld -shared %t/a.32.o %t/bc.32.o -o %t/gd.32.so
+# RUN: llvm-readobj -r %t/gd.32.so | FileCheck --check-prefix=GD32-REL %s
+# RUN: llvm-objdump -d --no-show-raw-insn %t/gd.32.so | FileCheck --check-prefix=GD32 %s
+
+## LA32 GD -> LE
+# RUN: ld.lld %t/a.32.o %t/bc.32.o %t/tga.32.o -o %t/le.32
+# RUN: llvm-readelf -r %t/le.32 | FileCheck --check-prefix=NOREL %s
+# RUN: llvm-readelf -x .got %t/le.32 | FileCheck --check-prefix=LE32-GOT %s
+# RUN: ld.lld -pie %t/a.32.o %t/bc.32.o %t/tga.32.o -o %t/le-pie.32
+# RUN: llvm-readelf -r %t/le-pie.32 | FileCheck --check-prefix=NOREL %s
+# RUN: llvm-readelf -x .got %t/le-pie.32 | FileCheck --check-prefix=LE32-GOT %s
+
+## LA32 GD -> IE
+# RUN: ld.lld %t/a.32.o %t/bc.32.so %t/tga.32.o -o %t/ie.32
+# RUN: llvm-readobj -r %t/ie.32 | FileCheck --check-prefix=IE32-REL %s
+# RUN: llvm-readelf -x .got %t/ie.32 | FileCheck --check-prefix=IE32-GOT %s
+
+## LA64 GD
+# RUN: ld.lld -shared %t/a.64.o %t/bc.64.o -o %t/gd.64.so
+# RUN: llvm-readobj -r %t/gd.64.so | FileCheck --check-prefix=GD64-REL %s
+# RUN: llvm-objdump -d --no-show-raw-insn %t/gd.64.so | FileCheck --check-prefix=GD64 %s
+
+## LA64 GD -> LE
+# RUN: ld.lld %t/a.64.o %t/bc.64.o %t/tga.64.o -o %t/le.64
+# RUN: llvm-readelf -r %t/le.64 | FileCheck --check-prefix=NOREL %s
+# RUN: llvm-readelf -x .got %t/le.64 | FileCheck --check-prefix=LE64-GOT %s
+# RUN: ld.lld -pie %t/a.64.o %t/bc.64.o %t/tga.64.o -o %t/le-pie.64
+# RUN: llvm-readelf -r %t/le-pie.64 | FileCheck --check-prefix=NOREL %s
+# RUN: llvm-readelf -x .got %t/le-pie.64 | FileCheck --check-prefix=LE64-GOT %s
+
+## LA64 GD -> IE
+# RUN: ld.lld %t/a.64.o %t/bc.64.so %t/tga.64.o -o %t/ie.64
+# RUN: llvm-readobj -r %t/ie.64 | FileCheck --check-prefix=IE64-REL %s
+# RUN: llvm-readelf -x .got %t/ie.64 | FileCheck --check-prefix=IE64-GOT %s
+
+# GD32-REL:  .rela.dyn {
+# GD32-REL-NEXT:   0x20300 R_LARCH_TLS_DTPMOD32 a 0x0
+# GD32-REL-NEXT:   0x20304 R_LARCH_TLS_DTPREL32 a 0x0
+# GD32-REL-NEXT:   0x20308 R_LARCH_TLS_

[llvm-branch-commits] [llvm] release/19.x: Revert " [LICM] Fold associative binary ops to promote code hoisting (#81608)" (PR #100094)

2024-07-23 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.


https://github.com/llvm/llvm-project/pull/100094


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Amir Ayupov via llvm-branch-commits


@@ -266,6 +287,47 @@ class StaleMatcher {
 }
 return BestBlock;
   }
+  // Uses pseudo probe information to attach the profile to the appropriate
+  // block.
+  const FlowBlock *matchWithPseudoProbes(
+  const std::vector &PseudoProbes) const {
+// Searches for the pseudo probe attached to the matched function's block,
+// ignoring pseudo probes attached to function calls and inlined functions'
+// blocks.
+std::vector BlockPseudoProbes;
+for (const auto &PseudoProbe : PseudoProbes) {
+  // Ensures that pseudo probe information belongs to the appropriate
+  // function and not an inlined function.
+  if (PseudoProbe.GUID != YamlBFGUID)
+continue;
+  // Skips pseudo probes attached to function calls.
+  if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
+continue;
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
+}
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index <= Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+   "All blocks should have a pseudo probe");

aaupov wrote:

This assert should become a check as it's possible to have blocks without probes
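
A minimal sketch of the suggested change, written in Python for brevity rather than in BOLT's C++: instead of asserting that every index has an entry, the lookup reports "no match" when a block has no probe. The function and parameter names here are hypothetical and do not come from the patch.

```python
from typing import Dict, List, Optional


def find_binary_probes(
    index: int, index_to_binary_probes: Dict[int, List[str]]
) -> Optional[List[str]]:
    """Return the probes recorded for `index`, or None if there are none.

    A missing or empty entry is treated as an ordinary "no match" result
    rather than an invariant violation, because some blocks may carry no
    pseudo probe at all.
    """
    probes = index_to_binary_probes.get(index)
    if not probes:
        return None  # Caller falls back to another matching strategy.
    return probes


table = {1: ["probe-a"], 3: ["probe-b", "probe-c"]}
print(find_binary_probes(1, table))
print(find_binary_probes(2, table))
```

With a check in place, a block without probes simply yields no probe-based match, leaving the caller free to use whatever other matching it has.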

https://github.com/llvm/llvm-project/pull/99891


[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)

2024-07-23 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne edited 
https://github.com/llvm/llvm-project/pull/99375


[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)

2024-07-23 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne approved this pull request.

LGTM. Let's cherry-pick.

https://github.com/llvm/llvm-project/pull/99375


[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/100097

>From a72a0bf44a8b259be3c62e79082d2fdc04fc2771 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Tue, 23 Jul 2024 11:15:26 +0100
Subject: [PATCH 1/2] [LV] Disable VPlan-based cost model for 19.x release.

As discussed in  https://github.com/llvm/llvm-project/pull/92555 flip
the default for the option added in
https://github.com/llvm/llvm-project/pull/99536 to true.

This restores the original behavior for the release branch to give the
VPlan-based cost model more time to mature on main.
---
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 2 +-
 .../test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6d28b8fabe42e..68363abdb817a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold(
 cl::desc("The maximum allowed number of runtime memory checks"));
 
 static cl::opt<bool> UseLegacyCostModel(
-"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden,
+"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden,
 cl::desc("Use the legacy cost model instead of the VPlan-based cost model. "
  "This option will be removed in the future."));
 
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll 
b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index fc310f4163082..1a78eaf644723 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef 
writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF
@@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef 
writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF

From 835a2491de62ee09588bfb61ee31600449881675 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Tue, 23 Jul 2024 15:39:35 +0100
Subject: [PATCH 2/2] !fixup update test for new default.

---
 .../Inputs/x86-loopvectorize-costmodel.ll.expected   | 1 -
 1 file changed, 1 deletion(-)

diff --git 
a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
 
b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
index 5aa270e76f4c8..e862bf87d265c 100644
--- 
a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
+++ 
b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
@@ -17,7 +17,6 @@ define void @test() {
 ; CHECK:  LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = 
load float, ptr %in0, align 4
 ; CHECK:  LV: Found an estimated cost of 22 for VF 32 For instruction: %v0 = 
load float, ptr %in0, align 4
 ; CHECK:  LV: Found an estimated cost of 92 for VF 64 For instruction: %v0 = 
load float, ptr %in0, align 4
-; CHECK:  LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = 
load float, ptr %in0, align 4
 ;
 entry:
   br label %for.body



[llvm-branch-commits] [llvm] 183e8ec - [LV] Disable VPlan-based cost model for 19.x release.

2024-07-23 Thread Tobias Hieta via llvm-branch-commits

Author: Florian Hahn
Date: 2024-07-23T17:02:03+02:00
New Revision: 183e8ecc97a996c24e920e7e9668bc65a0d19439

URL: 
https://github.com/llvm/llvm-project/commit/183e8ecc97a996c24e920e7e9668bc65a0d19439
DIFF: 
https://github.com/llvm/llvm-project/commit/183e8ecc97a996c24e920e7e9668bc65a0d19439.diff

LOG: [LV] Disable VPlan-based cost model for 19.x release.

As discussed in  https://github.com/llvm/llvm-project/pull/92555 flip
the default for the option added in
https://github.com/llvm/llvm-project/pull/99536 to true.

This restores the original behavior for the release branch to give the
VPlan-based cost model more time to mature on main.

Added: 


Modified: 
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll

llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected

Removed: 




diff  --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp 
b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 6d28b8fabe42e..68363abdb817a 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -206,7 +206,7 @@ static cl::opt VectorizeMemoryCheckThreshold(
 cl::desc("The maximum allowed number of runtime memory checks"));
 
 static cl::opt<bool> UseLegacyCostModel(
-"vectorize-use-legacy-cost-model", cl::init(false), cl::Hidden,
+"vectorize-use-legacy-cost-model", cl::init(true), cl::Hidden,
 cl::desc("Use the legacy cost model instead of the VPlan-based cost model. "
  "This option will be removed in the future."));
 

diff  --git a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll 
b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
index fc310f4163082..1a78eaf644723 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@@ -135,7 +135,6 @@ define void @vector_reverse_i64(ptr nocapture noundef 
writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF
@@ -339,7 +338,6 @@ define void @vector_reverse_f32(ptr nocapture noundef 
writeonly %A, ptr nocaptur
 ; CHECK-NEXT:  LV: Interleaving is not beneficial.
 ; CHECK-NEXT:  LV: Found a vectorizable loop (vscale x 4) in 
 ; CHECK-NEXT:  LEV: Epilogue vectorization is not profitable for this loop
-; CHECK-NEXT:  VF picked by VPlan cost model: vscale x 4
 ; CHECK-NEXT:  Executing best plan with VF=vscale x 4, UF=1
 ; CHECK-NEXT:  VPlan 'Final VPlan for VF={vscale x 4},UF>=1' {
 ; CHECK-NEXT:  Live-in vp<%0> = VF * UF

diff  --git 
a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
 
b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
index 5aa270e76f4c8..e862bf87d265c 100644
--- 
a/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
+++ 
b/llvm/test/tools/UpdateTestChecks/update_analyze_test_checks/Inputs/x86-loopvectorize-costmodel.ll.expected
@@ -17,7 +17,6 @@ define void @test() {
 ; CHECK:  LV: Found an estimated cost of 5 for VF 16 For instruction: %v0 = 
load float, ptr %in0, align 4
 ; CHECK:  LV: Found an estimated cost of 22 for VF 32 For instruction: %v0 = 
load float, ptr %in0, align 4
 ; CHECK:  LV: Found an estimated cost of 92 for VF 64 For instruction: %v0 = 
load float, ptr %in0, align 4
-; CHECK:  LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = 
load float, ptr %in0, align 4
 ;
 entry:
   br label %for.body





[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [llvm] [LV] Disable VPlan-based cost model for 19.x release. (PR #100097)

2024-07-23 Thread Tobias Hieta via llvm-branch-commits

tru wrote:

Merged manually as 183e8ecc97a996c24e920e7e9668bc65a0d19439 since I messed it 
up with a merge commit instead of a rebase. Sorry, learning the new flow.

https://github.com/llvm/llvm-project/pull/100097


[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100141


[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@ldionne What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100141


[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100141

Backport 9628777

Requested by: @ldionne

From f281cb2886edb46067606b62163e0c3d6cdfd965 Mon Sep 17 00:00:00 2001
From: PaulXiCao 
Date: Tue, 23 Jul 2024 15:11:44 +
Subject: [PATCH] [libc++][math] Fix undue overflowing of `std::hypot(x,y,z)`
 (#93350)

The 3-dimensional `std::hypot(x,y,z)` was sub-optimally implemented.
This led to possible over-/underflows in (intermediate) results which
can be circumvented by this proposed change.

The idea is to scale the arguments (see linked issue for full
discussion).

Tests have been added for problematic over- and underflows.
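
The scaling idea described above can be sketched as follows. This is an illustrative standalone version for `double` only, not the libc++ implementation: `scaled_hypot3` is a hypothetical name, the thresholds 2^510 and 2^-600 mirror the `double` case shown in the diff, and NaN or infinite inputs get no special treatment here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: three-argument hypotenuse that rescales its inputs by a
// power of two when the intermediate squares could overflow or underflow.
inline double scaled_hypot3(double x, double y, double z) {
  const double overflow_threshold = std::ldexp(1.0, 510);  // 2^510
  const double overflow_scale = std::ldexp(1.0, -600);     // 2^-600

  const double max_abs = std::max({std::fabs(x), std::fabs(y), std::fabs(z)});

  double scale = 1.0;
  if (max_abs > overflow_threshold)              // x*x + y*y + z*z might overflow
    scale = overflow_scale;
  else if (max_abs < 1.0 / overflow_threshold)   // x*x + y*y + z*z might underflow
    scale = 1.0 / overflow_scale;
  assert(scale > 0.0 && "scale is always a positive power of two");

  x *= scale;
  y *= scale;
  z *= scale;
  // Undo the scaling after taking the square root.
  return std::sqrt(x * x + y * y + z * z) / scale;
}
```

Because the scale factors are powers of two, multiplying by them does not round the mantissa for values in the normal range, so the rescaling itself should not add error there. With inputs such as `1e300` in every argument, the unscaled sum of squares would exceed the range of `double`, while the scaled version stays finite.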

Closes #92782

(cherry picked from commit 9628777479a970db5d0c2d0b456dac6633864760)
---
 libcxx/include/__math/hypot.h | 89 ++
 libcxx/include/cmath  | 25 +
 .../test/libcxx/transitive_includes/cxx17.csv |  3 +
 .../test/libcxx/transitive_includes/cxx20.csv |  3 +
 .../test/libcxx/transitive_includes/cxx23.csv |  3 +
 .../test/libcxx/transitive_includes/cxx26.csv |  3 +
 .../test/std/numerics/c.math/cmath.pass.cpp   | 91 +++
 libcxx/test/support/fp_compare.h  | 45 -
 8 files changed, 197 insertions(+), 65 deletions(-)

diff --git a/libcxx/include/__math/hypot.h b/libcxx/include/__math/hypot.h
index 1bf193a9ab7ee..61fd260c59409 100644
--- a/libcxx/include/__math/hypot.h
+++ b/libcxx/include/__math/hypot.h
@@ -15,10 +15,21 @@
 #include <__type_traits/is_same.h>
 #include <__type_traits/promote.h>
 
+#if _LIBCPP_STD_VER >= 17
+#  include <__algorithm/max.h>
+#  include <__math/abs.h>
+#  include <__math/roots.h>
+#  include <__utility/pair.h>
+#  include <limits>
+#endif
+
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
 #endif
 
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
 _LIBCPP_BEGIN_NAMESPACE_STD
 
 namespace __math {
@@ -41,8 +52,86 @@ inline _LIBCPP_HIDE_FROM_ABI typename __promote<_A1, 
_A2>::type hypot(_A1 __x, _
   return __math::hypot((__result_type)__x, (__result_type)__y);
 }
 
+#if _LIBCPP_STD_VER >= 17
+// Factors needed to determine if over-/underflow might happen for 
`std::hypot(x,y,z)`.
+// returns [overflow_threshold, overflow_scale]
+template <class _Real>
+_LIBCPP_HIDE_FROM_ABI std::pair<_Real, _Real> __hypot_factors() {
+  static_assert(std::numeric_limits<_Real>::is_iec559);
+
+  if constexpr (std::is_same_v<_Real, float>) {
+static_assert(-125 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+128 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+62f, 0x1.0p-70f};
+  } else if constexpr (std::is_same_v<_Real, double>) {
+static_assert(-1021 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+1024 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+510, 0x1.0p-600};
+  } else { // long double
+static_assert(std::is_same_v<_Real, long double>);
+
+// preprocessor guard necessary, otherwise literals (e.g. `0x1.0p+8'190l`) 
throw warnings even when shielded by `if
+// constexpr`
+#  if __DBL_MAX_EXP__ == __LDBL_MAX_EXP__
+static_assert(sizeof(_Real) == sizeof(double));
+return static_cast<std::pair<_Real, _Real>>(__math::__hypot_factors<double>());
+#  else
+static_assert(sizeof(_Real) > sizeof(double));
+static_assert(-16381 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+16384 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+8190l, 0x1.0p-9000l};
+#  endif
+  }
+}
+
+// Computes the three-dimensional hypotenuse: `std::hypot(x,y,z)`.
+// The naive implementation might over-/underflow which is why this 
implementation is more involved:
+//If the square of an argument might run into issues, we scale the 
arguments appropriately.
+// See https://github.com/llvm/llvm-project/issues/92782 for a detailed 
discussion and summary.
+template <class _Real>
+_LIBCPP_HIDE_FROM_ABI _Real __hypot(_Real __x, _Real __y, _Real __z) {
+  const _Real __max_abs = std::max(__math::fabs(__x), 
std::max(__math::fabs(__y), __math::fabs(__z)));
+  const auto [__overflow_threshold, __overflow_scale] = 
__math::__hypot_factors<_Real>();
+  _Real __scale;
+  if (__max_abs > __overflow_threshold) { // x*x + y*y + z*z might overflow
+__scale = __overflow_scale;
+__x *= __scale;
+__y *= __scale;
+__z *= __scale;
+  } else if (__max_abs < 1 / __overflow_threshold) { // x*x + y*y + z*z might 
underflow
+__scale = 1 / __overflow_scale;
+__x *= __scale;
+__y *= __scale;
+__z *= __scale;
+  } else
+__scale = 1;
+  return __math::sqrt(__x * __x + __y * __y + __z * __z) / __scale;
+}
+
+inline _LIBCPP_HIDE_FROM_ABI float hypot(float __x, float __y, float __z) { 
return __math::__hypot(__x, __y, __z); }
+
+inline _LIBCPP_HIDE_FROM_ABI double hypot(double __x, double __y, double __z) 
{ return __math::__hypot(__x, __y, __z); }
+
+inline _LIBCPP_HIDE_FROM_ABI long double hypot(long double __x, long double 
__y, long double __z) {
+  

[llvm-branch-commits] [libcxx] release/19.x: [libc++][math] Fix undue overflowing of `std::hypot(x, y, z)` (#93350) (PR #100141)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport 9628777

Requested by: @ldionne

---
Full diff: https://github.com/llvm/llvm-project/pull/100141.diff


8 Files Affected:

- (modified) libcxx/include/__math/hypot.h (+89) 
- (modified) libcxx/include/cmath (+1-24) 
- (modified) libcxx/test/libcxx/transitive_includes/cxx17.csv (+3) 
- (modified) libcxx/test/libcxx/transitive_includes/cxx20.csv (+3) 
- (modified) libcxx/test/libcxx/transitive_includes/cxx23.csv (+3) 
- (modified) libcxx/test/libcxx/transitive_includes/cxx26.csv (+3) 
- (modified) libcxx/test/std/numerics/c.math/cmath.pass.cpp (+75-16) 
- (modified) libcxx/test/support/fp_compare.h (+20-25) 


``diff
diff --git a/libcxx/include/__math/hypot.h b/libcxx/include/__math/hypot.h
index 1bf193a9ab7ee..61fd260c59409 100644
--- a/libcxx/include/__math/hypot.h
+++ b/libcxx/include/__math/hypot.h
@@ -15,10 +15,21 @@
 #include <__type_traits/is_same.h>
 #include <__type_traits/promote.h>
 
+#if _LIBCPP_STD_VER >= 17
+#  include <__algorithm/max.h>
+#  include <__math/abs.h>
+#  include <__math/roots.h>
+#  include <__utility/pair.h>
+#  include <limits>
+#endif
+
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
 #  pragma GCC system_header
 #endif
 
+_LIBCPP_PUSH_MACROS
+#include <__undef_macros>
+
 _LIBCPP_BEGIN_NAMESPACE_STD
 
 namespace __math {
@@ -41,8 +52,86 @@ inline _LIBCPP_HIDE_FROM_ABI typename __promote<_A1, 
_A2>::type hypot(_A1 __x, _
   return __math::hypot((__result_type)__x, (__result_type)__y);
 }
 
+#if _LIBCPP_STD_VER >= 17
+// Factors needed to determine if over-/underflow might happen for 
`std::hypot(x,y,z)`.
+// returns [overflow_threshold, overflow_scale]
+template <class _Real>
+_LIBCPP_HIDE_FROM_ABI std::pair<_Real, _Real> __hypot_factors() {
+  static_assert(std::numeric_limits<_Real>::is_iec559);
+
+  if constexpr (std::is_same_v<_Real, float>) {
+static_assert(-125 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+128 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+62f, 0x1.0p-70f};
+  } else if constexpr (std::is_same_v<_Real, double>) {
+static_assert(-1021 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+1024 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+510, 0x1.0p-600};
+  } else { // long double
+static_assert(std::is_same_v<_Real, long double>);
+
+// preprocessor guard necessary, otherwise literals (e.g. `0x1.0p+8'190l`) 
throw warnings even when shielded by `if
+// constexpr`
+#  if __DBL_MAX_EXP__ == __LDBL_MAX_EXP__
+static_assert(sizeof(_Real) == sizeof(double));
+return static_cast<std::pair<_Real, _Real>>(__math::__hypot_factors<double>());
+#  else
+static_assert(sizeof(_Real) > sizeof(double));
+static_assert(-16381 == std::numeric_limits<_Real>::min_exponent);
+static_assert(+16384 == std::numeric_limits<_Real>::max_exponent);
+return {0x1.0p+8190l, 0x1.0p-9000l};
+#  endif
+  }
+}
+
+// Computes the three-dimensional hypotenuse: `std::hypot(x,y,z)`.
+// The naive implementation might over-/underflow which is why this 
implementation is more involved:
+//If the square of an argument might run into issues, we scale the 
arguments appropriately.
+// See https://github.com/llvm/llvm-project/issues/92782 for a detailed 
discussion and summary.
+template <class _Real>
+_LIBCPP_HIDE_FROM_ABI _Real __hypot(_Real __x, _Real __y, _Real __z) {
+  const _Real __max_abs = std::max(__math::fabs(__x), 
std::max(__math::fabs(__y), __math::fabs(__z)));
+  const auto [__overflow_threshold, __overflow_scale] = 
__math::__hypot_factors<_Real>();
+  _Real __scale;
+  if (__max_abs > __overflow_threshold) { // x*x + y*y + z*z might overflow
+__scale = __overflow_scale;
+__x *= __scale;
+__y *= __scale;
+__z *= __scale;
+  } else if (__max_abs < 1 / __overflow_threshold) { // x*x + y*y + z*z might 
underflow
+__scale = 1 / __overflow_scale;
+__x *= __scale;
+__y *= __scale;
+__z *= __scale;
+  } else
+__scale = 1;
+  return __math::sqrt(__x * __x + __y * __y + __z * __z) / __scale;
+}
+
+inline _LIBCPP_HIDE_FROM_ABI float hypot(float __x, float __y, float __z) { 
return __math::__hypot(__x, __y, __z); }
+
+inline _LIBCPP_HIDE_FROM_ABI double hypot(double __x, double __y, double __z) 
{ return __math::__hypot(__x, __y, __z); }
+
+inline _LIBCPP_HIDE_FROM_ABI long double hypot(long double __x, long double 
__y, long double __z) {
+  return __math::__hypot(__x, __y, __z);
+}
+
+template <class _A1, class _A2, class _A3, std::enable_if_t< is_arithmetic_v<_A1> && is_arithmetic_v<_A2> && is_arithmetic_v<_A3>, int> = 0 >
+_LIBCPP_HIDE_FROM_ABI typename __promote<_A1, _A2, _A3>::type hypot(_A1 __x, 
_A2 __y, _A3 __z) _NOEXCEPT {
+  using __result_type = typename __promote<_A1, _A2, _A3>::type;
+  static_assert(!(
+  std::is_same_v<_A1, __result_type> && std::is_same_v<_A2, __result_type> 
&& std::is_same_v<_A3, __result_type>));
+  return __math::__hypot(
+  static_cast<__result_type>(__x), static_cast<__result_type>(__y), 
static_cast<__result_type>(__z));
+}
+

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96872

From ef284fddade0ad779fbbd4bad48a4d63667d3d65 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
 __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction,
most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl|  2 +-
 .../builtins-fp-atomics-gfx12.cl  |  4 +-
 .../builtins-fp-atomics-gfx90a.cl |  4 +-
 .../builtins-fp-atomics-gfx940.cl |  4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5639239359ab8..0fb45f0288d46 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18743,8 +18744,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18756,18 +18755,11 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getHalfTy(getLLVMContext()), 2);
   IID = Intrinsic::amdgcn_global_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -19190,7 +19182,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19206,6 +19200,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19240,8 +19236,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
   AO = AtomicOrdering::SequentiallyConsistent;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19256,6 +19257,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
 if (Volatile)
   RMW->setVolatile(true);
+
+unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+  // Most targets require "amdgpu.no.fine.grained.memory" to emit the 
nativ

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96873

From c4cc064cad9a5921b52e00b5a19ca834f5262772 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
 {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 20 ++-
 .../builtins-fp-atomics-gfx12.cl  |  9 ++---
 .../builtins-fp-atomics-gfx90a.cl |  2 +-
 .../builtins-fp-atomics-gfx940.cl |  3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 77dadeb1f22fa..baf68c7e81569 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18744,22 +18744,15 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -18779,11 +18772,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgTy = llvm::Type::getFloatTy(getLLVMContext());
   IID = Intrinsic::amdgcn_flat_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19184,7 +19172,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19202,6 +19192,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr 
%{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr 
addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> 
%{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX12-LABEL:  test_global_add_half2
 // GFX12:  global_atomic_pk_add_f16 v2, v[0:1], v2, off

[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100145

Backport c2e4386

Requested by: @ldionne

From 8325799d41659bd1ff72ed3d628732f2a72a5dc8 Mon Sep 17 00:00:00 2001
From: Mark de Wever 
Date: Tue, 23 Jul 2024 18:03:28 +0200
Subject: [PATCH] [libc++][vector] Tests shrink_to_fit requirement.
 (#98009)

`vector<bool>`'s shrink_to_fit implementation is using the
"swap-to-free-container-resources-trick" which only shrinks when the
input vector is empty. Since the request to shrink_to_fit is
non-binding, this is a valid implementation. It is not a high-quality
implementation. Since `vector<bool>` is not a very popular container, the
implementation has not been changed and only a test to validate the
non-growing property has been added.

This was discovered while investigating #95161.
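
The non-growing property described above can be sketched as a small standalone check. This is an illustration using the default allocator, not the test added by the patch; `shrink_does_not_grow` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true when shrink_to_fit() kept the element
// count and did not increase the capacity of the vector<bool> it was given.
inline bool shrink_does_not_grow(std::vector<bool> v) {
  const std::size_t size_before = v.size();
  const std::size_t capacity_before = v.capacity();

  // The request is non-binding: capacity may stay the same or get smaller.
  v.shrink_to_fit();

  return v.size() == size_before && v.capacity() <= capacity_before;
}
```

The helper takes its argument by value so the caller's container is left untouched. The test in the patch goes further by using an allocator that hands out more memory on every call, which is the situation in which a careless reallocation could end up with a larger capacity.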

(cherry picked from commit c2e438675754b83c31d7d5ba40cb13fe77e795de)
---
 .../vector.bool/shrink_to_fit.pass.cpp| 45 ++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git 
a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp 
b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
index b39245cab7bf4..f8bcee31964bb 100644
--- a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
@@ -39,11 +39,54 @@ TEST_CONSTEXPR_CXX20 bool tests()
 return true;
 }
 
+#if TEST_STD_VER >= 23
+template <typename T>
+struct increasing_allocator {
+  using value_type = T;
+  std::size_t min_elements = 1000;
+  increasing_allocator()   = default;
+
+  template <typename U>
+  constexpr increasing_allocator(const increasing_allocator<U>& other) noexcept : min_elements(other.min_elements) {}
+
+  constexpr std::allocation_result<T*> allocate_at_least(std::size_t n) {
+if (n < min_elements)
+  n = min_elements;
+min_elements += 1000;
+return std::allocator<T>{}.allocate_at_least(n);
+  }
+  constexpr T* allocate(std::size_t n) { return allocate_at_least(n).ptr; }
+  constexpr void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
+};
+
+template <typename T, typename U>
+bool operator==(increasing_allocator<T>, increasing_allocator<U>) {
+  return true;
+}
+
+// https://github.com/llvm/llvm-project/issues/95161
+constexpr bool test_increasing_allocator() {
+  std::vector<bool, increasing_allocator<bool>> v;
+  v.push_back(1);
+  std::size_t capacity = v.capacity();
+  v.shrink_to_fit();
+  assert(v.capacity() <= capacity);
+  assert(v.size() == 1);
+
+  return true;
+}
+#endif // TEST_STD_VER >= 23
+
 int main(int, char**)
 {
-tests();
+  tests();
 #if TEST_STD_VER > 17
 static_assert(tests());
 #endif
+#if TEST_STD_VER >= 23
+test_increasing_allocator();
+static_assert(test_increasing_allocator());
+#endif // TEST_STD_VER >= 23
+
 return 0;
 }
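
The non-growing property that the new test validates can be sketched with the default allocator as follows. This is a minimal illustration, not the libc++ test itself: the helper name `shrink_keeps_capacity_bound` is hypothetical, and it only checks the two postconditions the patch asserts (capacity does not grow, size is unchanged).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: checks the non-growing property of shrink_to_fit()
// for a vector<bool> holding `count` bits.
bool shrink_keeps_capacity_bound(std::size_t count) {
  std::vector<bool> v(count, true);
  const std::size_t capacity_before = v.capacity();
  v.shrink_to_fit(); // non-binding request: the container may do nothing
  assert(v.capacity() <= capacity_before);
  assert(v.size() == count);
  return true;
}
```

Because the request is non-binding, the sketch deliberately does not assert that the capacity actually decreases.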



[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100145


[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@ldionne What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100145


[llvm-branch-commits] [libcxx] release/19.x: [libc++][vector] Tests shrink_to_fit requirement. (#98009) (PR #100145)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport c2e4386

Requested by: @ldionne

---
Full diff: https://github.com/llvm/llvm-project/pull/100145.diff


1 Files Affected:

- (modified) 
libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp (+44-1) 


``diff
diff --git 
a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp 
b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
index b39245cab7bf4..f8bcee31964bb 100644
--- a/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
+++ b/libcxx/test/std/containers/sequences/vector.bool/shrink_to_fit.pass.cpp
@@ -39,11 +39,54 @@ TEST_CONSTEXPR_CXX20 bool tests()
 return true;
 }
 
+#if TEST_STD_VER >= 23
+template <typename T>
+struct increasing_allocator {
+  using value_type         = T;
+  std::size_t min_elements = 1000;
+  increasing_allocator()   = default;
+
+  template <typename U>
+  constexpr increasing_allocator(const increasing_allocator<U>& other) noexcept : min_elements(other.min_elements) {}
+
+  constexpr std::allocation_result<T*> allocate_at_least(std::size_t n) {
+    if (n < min_elements)
+      n = min_elements;
+    min_elements += 1000;
+    return std::allocator<T>{}.allocate_at_least(n);
+  }
+  constexpr T* allocate(std::size_t n) { return allocate_at_least(n).ptr; }
+  constexpr void deallocate(T* p, std::size_t n) noexcept { std::allocator<T>{}.deallocate(p, n); }
+};
+
+template <typename T, typename U>
+bool operator==(increasing_allocator<T>, increasing_allocator<U>) {
+  return true;
+}
+
+// https://github.com/llvm/llvm-project/issues/95161
+constexpr bool test_increasing_allocator() {
+  std::vector<bool, increasing_allocator<bool>> v;
+  v.push_back(1);
+  std::size_t capacity = v.capacity();
+  v.shrink_to_fit();
+  assert(v.capacity() <= capacity);
+  assert(v.size() == 1);
+
+  return true;
+}
+#endif // TEST_STD_VER >= 23
+
 int main(int, char**)
 {
-tests();
+  tests();
 #if TEST_STD_VER > 17
 static_assert(tests());
 #endif
+#if TEST_STD_VER >= 23
+test_increasing_allocator();
+static_assert(test_increasing_allocator());
+#endif // TEST_STD_VER >= 23
+
 return 0;
 }

``




https://github.com/llvm/llvm-project/pull/100145


[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100149


[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100149

Backport d0ca9f2

Requested by: @ldionne

From c264db19ef5ac5de02596ebb8ff3774394c871b4 Mon Sep 17 00:00:00 2001
From: Mark de Wever 
Date: Tue, 23 Jul 2024 18:13:22 +0200
Subject: [PATCH] [libc++][string] Fixes shrink_to_fit. (#97961)

This ensures that shrink_to_fit does not increase the allocated size.

Partly addresses #95161

(cherry picked from commit d0ca9f23e8f25b0509c3ff34ed215508b39ea6e7)
---
 libcxx/include/string | 17 ++--
 .../string.capacity/shrink_to_fit.pass.cpp| 41 +++
 2 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/libcxx/include/string b/libcxx/include/string
index ba86a32090825..9fa979e3a5178 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -3358,23 +3358,34 @@ basic_string<_CharT, _Traits, 
_Allocator>::__shrink_or_extend(size_type __target
 __p= __get_long_pointer();
   } else {
 if (__target_capacity > __cap) {
+  // Extend
+  // - called from reserve should propagate the exception thrown.
   auto __allocation = std::__allocate_at_least(__alloc(), 
__target_capacity + 1);
   __new_data= __allocation.ptr;
   __target_capacity = __allocation.count - 1;
 } else {
+  // Shrink
+  // - called from shrink_to_fit should not throw.
+  // - called from reserve may throw but is not required to.
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
   try {
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
 auto __allocation = std::__allocate_at_least(__alloc(), 
__target_capacity + 1);
+
+// The Standard mandates shrink_to_fit() does not increase the 
capacity.
+// With equal capacity keep the existing buffer. This avoids extra work
+// due to swapping the elements.
+if (__allocation.count - 1 > __target_capacity) {
+  __alloc_traits::deallocate(__alloc(), __allocation.ptr, 
__allocation.count);
+  __annotate_new(__sz); // Undoes the __annotate_delete()
+  return;
+}
 __new_data= __allocation.ptr;
 __target_capacity = __allocation.count - 1;
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
   } catch (...) {
 return;
   }
-#else  // _LIBCPP_HAS_NO_EXCEPTIONS
-  if (__new_data == nullptr)
-return;
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
 }
 __begin_lifetime(__new_data, __target_capacity + 1);
diff --git 
a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp 
b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
index 057050cdcf7fa..6f5e43d1341f5 100644
--- 
a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
+++ 
b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
@@ -63,8 +63,49 @@ TEST_CONSTEXPR_CXX20 bool test() {
   return true;
 }
 
+#if TEST_STD_VER >= 23
+std::size_t min_bytes = 1000;
+
+template <typename T>
+struct increasing_allocator {
+  using value_type       = T;
+  increasing_allocator() = default;
+  template <typename U>
+  increasing_allocator(const increasing_allocator<U>&) noexcept {}
+  std::allocation_result<T*> allocate_at_least(std::size_t n) {
+    std::size_t allocation_amount = n * sizeof(T);
+    if (allocation_amount < min_bytes)
+      allocation_amount = min_bytes;
+    min_bytes += 1000;
+    return {static_cast<T*>(::operator new(allocation_amount)), allocation_amount / sizeof(T)};
+  }
+  T* allocate(std::size_t n) { return allocate_at_least(n).ptr; }
+  void deallocate(T* p, std::size_t) noexcept { ::operator delete(static_cast<void*>(p)); }
+};
+
+template <typename T, typename U>
+bool operator==(increasing_allocator<T>, increasing_allocator<U>) {
+  return true;
+}
+
+// https://github.com/llvm/llvm-project/issues/95161
+void test_increasing_allocator() {
+  std::basic_string<char, std::char_traits<char>, increasing_allocator<char>> s{
+      "String does not fit in the internal buffer"};
+  std::size_t capacity = s.capacity();
+  std::size_t size = s.size();
+  s.shrink_to_fit();
+  assert(s.capacity() <= capacity);
+  assert(s.size() == size);
+  LIBCPP_ASSERT(is_string_asan_correct(s));
+}
+#endif // TEST_STD_VER >= 23
+
 int main(int, char**) {
   test();
+#if TEST_STD_VER >= 23
+  test_increasing_allocator();
+#endif
 #if TEST_STD_VER > 17
   static_assert(test());
 #endif
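
The shrink path in the patch above (allocate, then discard the new buffer if the allocator handed back more than was requested) can be sketched outside libc++ roughly as follows. This is a simplified model under stated assumptions: `ToyString`, `OverAllocator`, `Allocation`, `make_toy`, and the free function `shrink_to_fit` are all hypothetical names invented for illustration, and the sketch ignores exceptions, ASan annotations, and the short-string representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical stand-in for std::allocation_result<char*>.
struct Allocation {
  char* ptr;
  std::size_t count;
};

// Hypothetical allocator that never grants fewer than `floor` bytes.
struct OverAllocator {
  std::size_t floor;
  Allocation allocate_at_least(std::size_t n) {
    const std::size_t granted = n < floor ? floor : n;
    return {static_cast<char*>(::operator new(granted)), granted};
  }
  void deallocate(char* p) noexcept { ::operator delete(p); }
};

// Toy heap-only string; `capacity` excludes the null terminator.
struct ToyString {
  char* data;
  std::size_t size;
  std::size_t capacity;
};

ToyString make_toy(const char* text, std::size_t capacity) {
  const std::size_t len = std::strlen(text);
  assert(capacity >= len);
  char* p = static_cast<char*>(::operator new(capacity + 1));
  std::memcpy(p, text, len + 1);
  return {p, len, capacity};
}

// Returns true if the buffer was replaced by a smaller one.
bool shrink_to_fit(ToyString& s, OverAllocator& alloc) {
  const std::size_t target = s.size;
  if (target >= s.capacity)
    return false; // nothing to shrink
  Allocation a = alloc.allocate_at_least(target + 1);
  if (a.count - 1 > target) {
    // The allocator over-delivered: keep the existing buffer instead.
    alloc.deallocate(a.ptr);
    return false;
  }
  std::memcpy(a.ptr, s.data, s.size + 1);
  ::operator delete(s.data);
  s.data     = a.ptr;
  s.capacity = a.count - 1;
  return true;
}
```

With an `OverAllocator{1000}` the call leaves a 64-character-capacity string untouched, whereas `OverAllocator{0}` grants exactly what is asked and the capacity drops to the string's size.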



[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@ldionne What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100149


[llvm-branch-commits] [libcxx] release/19.x: [libc++][string] Fixes shrink_to_fit. (#97961) (PR #100149)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport d0ca9f2

Requested by: @ldionne

---
Full diff: https://github.com/llvm/llvm-project/pull/100149.diff


2 Files Affected:

- (modified) libcxx/include/string (+14-3) 
- (modified) 
libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp 
(+41) 


``diff
diff --git a/libcxx/include/string b/libcxx/include/string
index ba86a32090825..9fa979e3a5178 100644
--- a/libcxx/include/string
+++ b/libcxx/include/string
@@ -3358,23 +3358,34 @@ basic_string<_CharT, _Traits, 
_Allocator>::__shrink_or_extend(size_type __target
 __p= __get_long_pointer();
   } else {
 if (__target_capacity > __cap) {
+  // Extend
+  // - called from reserve should propagate the exception thrown.
   auto __allocation = std::__allocate_at_least(__alloc(), 
__target_capacity + 1);
   __new_data= __allocation.ptr;
   __target_capacity = __allocation.count - 1;
 } else {
+  // Shrink
+  // - called from shrink_to_fit should not throw.
+  // - called from reserve may throw but is not required to.
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
   try {
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
 auto __allocation = std::__allocate_at_least(__alloc(), 
__target_capacity + 1);
+
+// The Standard mandates shrink_to_fit() does not increase the 
capacity.
+// With equal capacity keep the existing buffer. This avoids extra work
+// due to swapping the elements.
+if (__allocation.count - 1 > __target_capacity) {
+  __alloc_traits::deallocate(__alloc(), __allocation.ptr, 
__allocation.count);
+  __annotate_new(__sz); // Undoes the __annotate_delete()
+  return;
+}
 __new_data= __allocation.ptr;
 __target_capacity = __allocation.count - 1;
 #ifndef _LIBCPP_HAS_NO_EXCEPTIONS
   } catch (...) {
 return;
   }
-#else  // _LIBCPP_HAS_NO_EXCEPTIONS
-  if (__new_data == nullptr)
-return;
 #endif // _LIBCPP_HAS_NO_EXCEPTIONS
 }
 __begin_lifetime(__new_data, __target_capacity + 1);
diff --git 
a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp 
b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
index 057050cdcf7fa..6f5e43d1341f5 100644
--- 
a/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
+++ 
b/libcxx/test/std/strings/basic.string/string.capacity/shrink_to_fit.pass.cpp
@@ -63,8 +63,49 @@ TEST_CONSTEXPR_CXX20 bool test() {
   return true;
 }
 
+#if TEST_STD_VER >= 23
+std::size_t min_bytes = 1000;
+
+template <typename T>
+struct increasing_allocator {
+  using value_type       = T;
+  increasing_allocator() = default;
+  template <typename U>
+  increasing_allocator(const increasing_allocator<U>&) noexcept {}
+  std::allocation_result<T*> allocate_at_least(std::size_t n) {
+    std::size_t allocation_amount = n * sizeof(T);
+    if (allocation_amount < min_bytes)
+      allocation_amount = min_bytes;
+    min_bytes += 1000;
+    return {static_cast<T*>(::operator new(allocation_amount)), allocation_amount / sizeof(T)};
+  }
+  T* allocate(std::size_t n) { return allocate_at_least(n).ptr; }
+  void deallocate(T* p, std::size_t) noexcept { ::operator delete(static_cast<void*>(p)); }
+};
+
+template <typename T, typename U>
+bool operator==(increasing_allocator<T>, increasing_allocator<U>) {
+  return true;
+}
+
+// https://github.com/llvm/llvm-project/issues/95161
+void test_increasing_allocator() {
+  std::basic_string<char, std::char_traits<char>, increasing_allocator<char>> s{
+      "String does not fit in the internal buffer"};
+  std::size_t capacity = s.capacity();
+  std::size_t size = s.size();
+  s.shrink_to_fit();
+  assert(s.capacity() <= capacity);
+  assert(s.size() == size);
+  LIBCPP_ASSERT(is_string_asan_correct(s));
+}
+#endif // TEST_STD_VER >= 23
+
 int main(int, char**) {
   test();
+#if TEST_STD_VER >= 23
+  test_increasing_allocator();
+#endif
 #if TEST_STD_VER > 17
   static_assert(test());
 #endif

``




https://github.com/llvm/llvm-project/pull/100149


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100151

Backport 1df4d86

Requested by: @daltenty

From 79ddb123bdbf8300c49e4b2abc74b664af833ea9 Mon Sep 17 00:00:00 2001
From: azhan92 
Date: Tue, 23 Jul 2024 09:49:41 -0400
Subject: [PATCH] [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511)

This PR adds support for -mcpu=pwr11/power11 and -mtune=pwr11/power11 in
clang and llvm.

(cherry picked from commit 1df4d866cca51eeab8f012a97cc50957b45971fe)
---
 clang/lib/Basic/Targets/PPC.cpp   | 39 ---
 clang/lib/Basic/Targets/PPC.h | 19 ++---
 clang/lib/Driver/ToolChains/Arch/PPC.cpp  |  3 ++
 clang/test/Misc/target-invalid-cpu-note.c |  2 +-
 clang/test/Preprocessor/init-ppc64.c  | 22 +++
 llvm/lib/Target/PowerPC/PPC.td| 20 --
 llvm/lib/Target/PowerPC/PPCISelLowering.cpp   |  3 ++
 llvm/lib/Target/PowerPC/PPCInstrInfo.cpp  |  1 +
 llvm/lib/Target/PowerPC/PPCSubtarget.h|  1 +
 .../Target/PowerPC/PPCTargetTransformInfo.cpp |  4 +-
 llvm/lib/TargetParser/Host.cpp|  7 
 llvm/test/CodeGen/PowerPC/check-cpu.ll|  6 ++-
 llvm/test/CodeGen/PowerPC/mma-acc-spill.ll|  7 
 ...{p10-constants.ll => p10-p11-constants.ll} | 12 +-
 llvm/unittests/TargetParser/Host.cpp  |  1 +
 15 files changed, 120 insertions(+), 27 deletions(-)
 rename llvm/test/CodeGen/PowerPC/{p10-constants.ll => p10-p11-constants.ll} 
(94%)

diff --git a/clang/lib/Basic/Targets/PPC.cpp b/clang/lib/Basic/Targets/PPC.cpp
index 4ba4a49311d36..9ff54083c923b 100644
--- a/clang/lib/Basic/Targets/PPC.cpp
+++ b/clang/lib/Basic/Targets/PPC.cpp
@@ -385,6 +385,8 @@ void PPCTargetInfo::getTargetDefines(const LangOptions 
&Opts,
 Builder.defineMacro("_ARCH_PWR9");
   if (ArchDefs & ArchDefinePwr10)
 Builder.defineMacro("_ARCH_PWR10");
+  if (ArchDefs & ArchDefinePwr11)
+Builder.defineMacro("_ARCH_PWR11");
   if (ArchDefs & ArchDefineA2)
 Builder.defineMacro("_ARCH_A2");
   if (ArchDefs & ArchDefineE500)
@@ -622,10 +624,17 @@ bool PPCTargetInfo::initFeatureMap(
 addP10SpecificFeatures(Features);
   }
 
-  // Future CPU should include all of the features of Power 10 as well as any
+  // Power11 includes all the same features as Power10 plus any features
+  // specific to the Power11 core.
+  if (CPU == "pwr11" || CPU == "power11") {
+initFeatureMap(Features, Diags, "pwr10", FeaturesVec);
+addP11SpecificFeatures(Features);
+  }
+
+  // Future CPU should include all of the features of Power 11 as well as any
   // additional features (yet to be determined) specific to it.
   if (CPU == "future") {
-initFeatureMap(Features, Diags, "pwr10", FeaturesVec);
+initFeatureMap(Features, Diags, "pwr11", FeaturesVec);
 addFutureSpecificFeatures(Features);
   }
 
@@ -696,6 +705,10 @@ void PPCTargetInfo::addP10SpecificFeatures(
   Features["isa-v31-instructions"] = true;
 }
 
+// Add any Power11 specific features.
+void PPCTargetInfo::addP11SpecificFeatures(
+    llvm::StringMap<bool> &Features) const {}
+
 // Add features specific to the "Future" CPU.
 void PPCTargetInfo::addFutureSpecificFeatures(
     llvm::StringMap<bool> &Features) const {}
@@ -870,17 +883,17 @@ ArrayRef<TargetInfo::AddlRegName> PPCTargetInfo::getGCCAddlRegNames() const {
 }
 
 static constexpr llvm::StringLiteral ValidCPUNames[] = {
-{"generic"}, {"440"}, {"450"},{"601"},   {"602"},
-{"603"}, {"603e"},{"603ev"},  {"604"},   {"604e"},
-{"620"}, {"630"}, {"g3"}, {"7400"},  {"g4"},
-{"7450"},{"g4+"}, {"750"},{"8548"},  {"970"},
-{"g5"},  {"a2"},  {"e500"},   {"e500mc"},{"e5500"},
-{"power3"},  {"pwr3"},{"power4"}, {"pwr4"},  {"power5"},
-{"pwr5"},{"power5x"}, {"pwr5x"},  {"power6"},{"pwr6"},
-{"power6x"}, {"pwr6x"},   {"power7"}, {"pwr7"},  {"power8"},
-{"pwr8"},{"power9"},  {"pwr9"},   {"power10"},   {"pwr10"},
-{"powerpc"}, {"ppc"}, {"ppc32"},  {"powerpc64"}, {"ppc64"},
-{"powerpc64le"}, {"ppc64le"}, {"future"}};
+{"generic"},   {"440"}, {"450"}, {"601"}, {"602"},
+{"603"},   {"603e"},{"603ev"},   {"604"}, {"604e"},
+{"620"},   {"630"}, {"g3"},  {"7400"},{"g4"},
+{"7450"},  {"g4+"}, {"750"}, {"8548"},{"970"},
+{"g5"},{"a2"},  {"e500"},{"e500mc"},  {"e5500"},
+{"power3"},{"pwr3"},{"power4"},  {"pwr4"},{"power5"},
+{"pwr5"},  {"power5x"}, {"pwr5x"},   {"power6"},  {"pwr6"},
+{"power6x"},   {"pwr6x"},   {"power7"},  {"pwr7"},{"power8"},
+{"pwr8"},  {"power9"},  {"pwr9"},{"power10"}, {"pwr10"},
+{"power11"},   {"pwr11"},   {"powerpc"}, {"ppc"}, {"ppc32"},
+{"powerpc64"}, {"ppc64"},   {"powerpc64le"}, {"ppc64le"}, {"future"}};
 
 bool PPCTargetInfo::isValidCPUNa
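
The feature inheritance introduced in `initFeatureMap` above (Power11 starts from the Power10 feature set, and "future" now starts from Power11) can be sketched as a small standalone model. This is an illustration only: `FeatureMap` and `init_feature_map` are hypothetical names, the only feature shown is `isa-v31-instructions` from the Power10 context lines of the patch, and the real function takes diagnostics and a feature vector that are omitted here.

```cpp
#include <cassert>
#include <map>
#include <string>

using FeatureMap = std::map<std::string, bool>;

// Hypothetical, heavily simplified model of the CPU feature chain.
void init_feature_map(FeatureMap& features, const std::string& cpu) {
  assert(!cpu.empty() && "CPU name must not be empty");
  if (cpu == "pwr10" || cpu == "power10") {
    features["isa-v31-instructions"] = true;
  }
  if (cpu == "pwr11" || cpu == "power11") {
    // Power11 inherits everything from Power10; the patch adds no
    // Power11-specific features yet, so nothing else is set here.
    init_feature_map(features, "pwr10");
  }
  if (cpu == "future") {
    // "future" now builds on Power11 instead of Power10.
    init_feature_map(features, "pwr11");
  }
}
```

Chaining through recursion means that any feature later added for Power11 would automatically reach "future" as well.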

[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100151


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@azhan92 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100151


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-powerpc

@llvm/pr-subscribers-clang-driver

Author: None (llvmbot)


Changes

Backport 1df4d86

Requested by: @daltenty

---

Patch is 20.39 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/100151.diff


15 Files Affected:

- (modified) clang/lib/Basic/Targets/PPC.cpp (+26-13) 
- (modified) clang/lib/Basic/Targets/PPC.h (+13-6) 
- (modified) clang/lib/Driver/ToolChains/Arch/PPC.cpp (+3) 
- (modified) clang/test/Misc/target-invalid-cpu-note.c (+1-1) 
- (modified) clang/test/Preprocessor/init-ppc64.c (+22) 
- (modified) llvm/lib/Target/PowerPC/PPC.td (+17-3) 
- (modified) llvm/lib/Target/PowerPC/PPCISelLowering.cpp (+3) 
- (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+1) 
- (modified) llvm/lib/Target/PowerPC/PPCSubtarget.h (+1) 
- (modified) llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp (+2-2) 
- (modified) llvm/lib/TargetParser/Host.cpp (+7) 
- (modified) llvm/test/CodeGen/PowerPC/check-cpu.ll (+5-1) 
- (modified) llvm/test/CodeGen/PowerPC/mma-acc-spill.ll (+7) 
- (renamed) llvm/test/CodeGen/PowerPC/p10-p11-constants.ll (+11-1) 
- (modified) llvm/unittests/TargetParser/Host.cpp (+1) 


``diff
diff --git a/clang/lib/Basic/Targets/PPC.cpp b/clang/lib/Basic/Targets/PPC.cpp
index 4ba4a49311d36..9ff54083c923b 100644
--- a/clang/lib/Basic/Targets/PPC.cpp
+++ b/clang/lib/Basic/Targets/PPC.cpp
@@ -385,6 +385,8 @@ void PPCTargetInfo::getTargetDefines(const LangOptions 
&Opts,
 Builder.defineMacro("_ARCH_PWR9");
   if (ArchDefs & ArchDefinePwr10)
 Builder.defineMacro("_ARCH_PWR10");
+  if (ArchDefs & ArchDefinePwr11)
+Builder.defineMacro("_ARCH_PWR11");
   if (ArchDefs & ArchDefineA2)
 Builder.defineMacro("_ARCH_A2");
   if (ArchDefs & ArchDefineE500)
@@ -622,10 +624,17 @@ bool PPCTargetInfo::initFeatureMap(
 addP10SpecificFeatures(Features);
   }
 
-  // Future CPU should include all of the features of Power 10 as well as any
+  // Power11 includes all the same features as Power10 plus any features
+  // specific to the Power11 core.
+  if (CPU == "pwr11" || CPU == "power11") {
+initFeatureMap(Features, Diags, "pwr10", FeaturesVec);
+addP11SpecificFeatures(Features);
+  }
+
+  // Future CPU should include all of the features of Power 11 as well as any
   // additional features (yet to be determined) specific to it.
   if (CPU == "future") {
-initFeatureMap(Features, Diags, "pwr10", FeaturesVec);
+initFeatureMap(Features, Diags, "pwr11", FeaturesVec);
 addFutureSpecificFeatures(Features);
   }
 
@@ -696,6 +705,10 @@ void PPCTargetInfo::addP10SpecificFeatures(
   Features["isa-v31-instructions"] = true;
 }
 
+// Add any Power11 specific features.
+void PPCTargetInfo::addP11SpecificFeatures(
+    llvm::StringMap<bool> &Features) const {}
+
 // Add features specific to the "Future" CPU.
 void PPCTargetInfo::addFutureSpecificFeatures(
     llvm::StringMap<bool> &Features) const {}
@@ -870,17 +883,17 @@ ArrayRef<TargetInfo::AddlRegName> PPCTargetInfo::getGCCAddlRegNames() const {
 }
 
 static constexpr llvm::StringLiteral ValidCPUNames[] = {
-{"generic"}, {"440"}, {"450"},{"601"},   {"602"},
-{"603"}, {"603e"},{"603ev"},  {"604"},   {"604e"},
-{"620"}, {"630"}, {"g3"}, {"7400"},  {"g4"},
-{"7450"},{"g4+"}, {"750"},{"8548"},  {"970"},
-{"g5"},  {"a2"},  {"e500"},   {"e500mc"},{"e5500"},
-{"power3"},  {"pwr3"},{"power4"}, {"pwr4"},  {"power5"},
-{"pwr5"},{"power5x"}, {"pwr5x"},  {"power6"},{"pwr6"},
-{"power6x"}, {"pwr6x"},   {"power7"}, {"pwr7"},  {"power8"},
-{"pwr8"},{"power9"},  {"pwr9"},   {"power10"},   {"pwr10"},
-{"powerpc"}, {"ppc"}, {"ppc32"},  {"powerpc64"}, {"ppc64"},
-{"powerpc64le"}, {"ppc64le"}, {"future"}};
+{"generic"},   {"440"}, {"450"}, {"601"}, {"602"},
+{"603"},   {"603e"},{"603ev"},   {"604"}, {"604e"},
+{"620"},   {"630"}, {"g3"},  {"7400"},{"g4"},
+{"7450"},  {"g4+"}, {"750"}, {"8548"},{"970"},
+{"g5"},{"a2"},  {"e500"},{"e500mc"},  {"e5500"},
+{"power3"},{"pwr3"},{"power4"},  {"pwr4"},{"power5"},
+{"pwr5"},  {"power5x"}, {"pwr5x"},   {"power6"},  {"pwr6"},
+{"power6x"},   {"pwr6x"},   {"power7"},  {"pwr7"},{"power8"},
+{"pwr8"},  {"power9"},  {"pwr9"},{"power10"}, {"pwr10"},
+{"power11"},   {"pwr11"},   {"powerpc"}, {"ppc"}, {"ppc32"},
+{"powerpc64"}, {"ppc64"},   {"powerpc64le"}, {"ppc64le"}, {"future"}};
 
 bool PPCTargetInfo::isValidCPUName(StringRef Name) const {
   return llvm::is_contained(ValidCPUNames, Name);
diff --git a/clang/lib/Basic/Targets/PPC.h b/clang/lib/Basic/Targets/PPC.h
index b15ab6fbcf492..6d5d8dd54d013 100644
--- a/clang/lib/Basic/Targets/PPC.h
+++ b/clang/lib/Basic/Targets/PPC.h
@@ -44,8 +44,9 @@ cla

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add omp.target_triples attribute to the OffloadModuleInterface (PR #100154)

2024-07-23 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak created 
https://github.com/llvm/llvm-project/pull/100154

The `OffloadModuleInterface` holds getter/setter methods to access OpenMP 
dialect module-level discardable attributes used to hold general OpenMP 
compilation information.

This patch adds the `omp.target_triples` attribute, which is intended to hold 
the list of offloading target triples linked to the host module in which it 
appears. This attribute should be empty when `omp.is_target_device=true`.
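
The getter/setter semantics described above (return an empty list when the attribute is absent, otherwise return the stored triples) can be modelled without MLIR roughly like this. It is a sketch under assumptions: `AttrDict`, `get_target_triples`, and `set_target_triples` are hypothetical stand-ins for the operation's attribute dictionary and the generated interface methods, and plain strings replace `mlir::StringAttr`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an operation's discardable attribute dictionary.
using AttrDict = std::map<std::string, std::vector<std::string>>;

// Returns the stored triples, or an empty list if the attribute is missing.
std::vector<std::string> get_target_triples(const AttrDict& attrs) {
  auto it = attrs.find("omp.target_triples");
  return it == attrs.end() ? std::vector<std::string>{} : it->second;
}

// Stores (or overwrites) the list of offloading target triples.
void set_target_triples(AttrDict& attrs, const std::vector<std::string>& triples) {
  attrs["omp.target_triples"] = triples;
}
```

Defaulting to an empty list lets callers iterate over the result without first checking whether the attribute was ever set.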

From 3dbb22595bcf691a619483ea51b3620a9de87263 Mon Sep 17 00:00:00 2001
From: Sergio Afonso 
Date: Tue, 23 Jul 2024 16:32:16 +0100
Subject: [PATCH] [MLIR][OpenMP] Add omp.target_triples attribute to the
 OffloadModuleInterface

The `OffloadModuleInterface` holds getter/setter methods to access OpenMP
dialect module-level discardable attributes used to hold general OpenMP
compilation information.

This patch adds the `omp.target_triples` attribute, which is intended to hold
the list of offloading target triples linked to the host module in which it
appears. This attribute should be empty when `omp.is_target_device=true`.
---
 .../Dialect/OpenMP/OpenMPOpsInterfaces.td | 28 +++
 1 file changed, 28 insertions(+)

diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td 
b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
index 385aa8b1b016a..9e62dcd9253d6 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOpsInterfaces.td
@@ -351,6 +351,34 @@ def OffloadModuleInterface : 
OpInterface<"OffloadModuleInterface"> {
   (ins "::mlir::omp::ClauseRequires":$clauses), [{}], [{
 $_op->setAttr(mlir::StringAttr::get($_op->getContext(), 
"omp.requires"),
   mlir::omp::ClauseRequiresAttr::get($_op->getContext(), clauses));
+  }]>,
+InterfaceMethod<
+  /*description=*/[{
+Get the omp.target_triples attribute on the operator if it's present 
and
+return its value. If it doesn't exist, return an empty array by 
default.
+  }],
+  /*retTy=*/"::llvm::ArrayRef<::mlir::Attribute>",
+  /*methodName=*/"getTargetTriples",
+  (ins), [{}], [{
+if (Attribute triplesAttr = $_op->getAttr("omp.target_triples"))
+  if (auto triples = ::llvm::dyn_cast<::mlir::ArrayAttr>(triplesAttr))
+return triples.getValue();
+return {};
+  }]>,
+InterfaceMethod<
+  /*description=*/[{
+Set the omp.target_triples attribute on the operation.
+  }],
+  /*retTy=*/"void",
+  /*methodName=*/"setTargetTriples",
+  (ins "::llvm::ArrayRef<::std::string>":$targetTriples), [{}], [{
+auto names = ::llvm::to_vector(::llvm::map_range(
+targetTriples, [&](::std::string str) -> ::mlir::Attribute {
+  return mlir::StringAttr::get($_op->getContext(), str);
+}));
+$_op->setAttr(
+::mlir::StringAttr::get($_op->getContext(), "omp.target_triples"),
+::mlir::ArrayAttr::get($_op->getContext(), names));
   }]>
   ];
 }



[llvm-branch-commits] [flang] [Flang][OpenMP] Add frontend support for -fopenmp-targets (PR #100155)

2024-07-23 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak created 
https://github.com/llvm/llvm-project/pull/100155

This patch adds support for the `-fopenmp-targets` option to the `bbc` and 
`flang -fc1` tools. It adds an `OMPTargetTriples` property to the `LangOptions` 
structure, which is filled with the triples represented by the compiler option.

This is used to initialize the `omp.target_triples` module attribute for later 
use by lowering stages.
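
As a rough illustration of how an option value could be turned into the list of triples, the sketch below splits a comma-separated string. It assumes the value follows the comma-separated form clang accepts for `-fopenmp-targets`; the helper name `split_targets` is hypothetical, and the sketch returns plain strings rather than constructing and normalizing `llvm::Triple` objects as the patch does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits "a,b,c" into {"a", "b", "c"}, skipping
// empty entries such as those produced by a trailing comma.
std::vector<std::string> split_targets(const std::string& value) {
  std::vector<std::string> triples;
  std::size_t start = 0;
  while (start <= value.size()) {
    std::size_t comma = value.find(',', start);
    if (comma == std::string::npos)
      comma = value.size();
    if (comma > start)
      triples.push_back(value.substr(start, comma - start));
    start = comma + 1;
  }
  return triples;
}
```

An empty option value yields an empty list, which matches the default state of the `omp.target_triples` attribute.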

From 54e52e8a37fd725976e157cd0f9e0221a355dead Mon Sep 17 00:00:00 2001
From: Sergio Afonso 
Date: Tue, 23 Jul 2024 16:40:18 +0100
Subject: [PATCH] [Flang][OpenMP] Add frontend support for -fopenmp-targets

This patch adds support for the `-fopenmp-targets` option to the `bbc` and
`flang -fc1` tools. It adds an `OMPTargetTriples` property to the `LangOptions`
structure, which is filled with the triples represented by the compiler option.

This is used to initialize the `omp.target_triples` module attribute for later
use by lowering stages.
---
 flang/include/flang/Frontend/LangOptions.h   |  6 
 flang/include/flang/Tools/CrossToolHelpers.h | 14 ++--
 flang/lib/Frontend/CompilerInvocation.cpp| 35 
 flang/test/Lower/OpenMP/offload-targets.f90  | 10 ++
 flang/tools/bbc/bbc.cpp  | 13 +++-
 5 files changed, 74 insertions(+), 4 deletions(-)
 create mode 100644 flang/test/Lower/OpenMP/offload-targets.f90

diff --git a/flang/include/flang/Frontend/LangOptions.h 
b/flang/include/flang/Frontend/LangOptions.h
index 7ab2195818863..57d86d46df5ab 100644
--- a/flang/include/flang/Frontend/LangOptions.h
+++ b/flang/include/flang/Frontend/LangOptions.h
@@ -16,6 +16,9 @@
 #define FORTRAN_FRONTEND_LANGOPTIONS_H
 
 #include <string>
+#include <vector>
+
+#include "llvm/TargetParser/Triple.h"
 
 namespace Fortran::frontend {
 
@@ -58,6 +61,9 @@ class LangOptions : public LangOptionsBase {
   /// host code generation.
   std::string OMPHostIRFile;
 
+  /// List of triples passed in using -fopenmp-targets.
+  std::vector<llvm::Triple> OMPTargetTriples;
+
   LangOptions();
 };
 
diff --git a/flang/include/flang/Tools/CrossToolHelpers.h 
b/flang/include/flang/Tools/CrossToolHelpers.h
index 1d890fd8e1f6f..75fd783af237d 100644
--- a/flang/include/flang/Tools/CrossToolHelpers.h
+++ b/flang/include/flang/Tools/CrossToolHelpers.h
@@ -131,7 +131,9 @@ struct OffloadModuleOpts {
   bool OpenMPThreadSubscription, bool OpenMPNoThreadState,
   bool OpenMPNoNestedParallelism, bool OpenMPIsTargetDevice,
   bool OpenMPIsGPU, bool OpenMPForceUSM, uint32_t OpenMPVersion,
-  std::string OMPHostIRFile = {}, bool NoGPULib = false)
+  std::string OMPHostIRFile = {},
+  const std::vector<llvm::Triple> &OMPTargetTriples = {},
+  bool NoGPULib = false)
   : OpenMPTargetDebug(OpenMPTargetDebug),
 OpenMPTeamSubscription(OpenMPTeamSubscription),
 OpenMPThreadSubscription(OpenMPThreadSubscription),
@@ -139,7 +141,9 @@ struct OffloadModuleOpts {
 OpenMPNoNestedParallelism(OpenMPNoNestedParallelism),
 OpenMPIsTargetDevice(OpenMPIsTargetDevice), OpenMPIsGPU(OpenMPIsGPU),
 OpenMPForceUSM(OpenMPForceUSM), OpenMPVersion(OpenMPVersion),
-OMPHostIRFile(OMPHostIRFile), NoGPULib(NoGPULib) {}
+OMPHostIRFile(OMPHostIRFile),
+OMPTargetTriples(OMPTargetTriples.begin(), OMPTargetTriples.end()),
+NoGPULib(NoGPULib) {}
 
   OffloadModuleOpts(Fortran::frontend::LangOptions &Opts)
   : OpenMPTargetDebug(Opts.OpenMPTargetDebug),
@@ -150,7 +154,7 @@ struct OffloadModuleOpts {
 OpenMPIsTargetDevice(Opts.OpenMPIsTargetDevice),
 OpenMPIsGPU(Opts.OpenMPIsGPU), OpenMPForceUSM(Opts.OpenMPForceUSM),
 OpenMPVersion(Opts.OpenMPVersion), OMPHostIRFile(Opts.OMPHostIRFile),
-NoGPULib(Opts.NoGPULib) {}
+OMPTargetTriples(Opts.OMPTargetTriples), NoGPULib(Opts.NoGPULib) {}
 
   uint32_t OpenMPTargetDebug = 0;
   bool OpenMPTeamSubscription = false;
@@ -162,6 +166,7 @@ struct OffloadModuleOpts {
   bool OpenMPForceUSM = false;
   uint32_t OpenMPVersion = 11;
   std::string OMPHostIRFile = {};
+  std::vector<llvm::Triple> OMPTargetTriples = {};
   bool NoGPULib = false;
 };
 
@@ -185,6 +190,9 @@ struct OffloadModuleOpts {
   if (!Opts.OMPHostIRFile.empty())
 offloadMod.setHostIRFilePath(Opts.OMPHostIRFile);
 }
+auto strTriples = llvm::to_vector(llvm::map_range(Opts.OMPTargetTriples,
+[](llvm::Triple triple) { return triple.normalize(); }));
+offloadMod.setTargetTriples(strTriples);
   }
 }
 
diff --git a/flang/lib/Frontend/CompilerInvocation.cpp 
b/flang/lib/Frontend/CompilerInvocation.cpp
index 8c892d9d032e1..19f067a135dd6 100644
--- a/flang/lib/Frontend/CompilerInvocation.cpp
+++ b/flang/lib/Frontend/CompilerInvocation.cpp
@@ -894,6 +894,7 @@ static bool parseDiagArgs(CompilerInvocation &res, 
llvm::opt::ArgList &args,
 /// options accordingly. Returns false if new errors are generated.
 static bool parseDialectArgs(CompilerInvocation &res, llvm::opt::ArgList &args,
   

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96872

From 8e3dfc335301d978d3d22110a6db8f98fc636b4d Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
 __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction,
most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl|  2 +-
 .../builtins-fp-atomics-gfx12.cl  |  4 +-
 .../builtins-fp-atomics-gfx90a.cl |  4 +-
 .../builtins-fp-atomics-gfx940.cl |  4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index c199976956085..00f581dced900 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18790,8 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18803,18 +18802,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getHalfTy(getLLVMContext()), 2);
   IID = Intrinsic::amdgcn_global_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -19237,7 +19229,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19253,6 +19247,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19287,8 +19283,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
   AO = AtomicOrdering::SequentiallyConsistent;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19303,6 +19304,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
 if (Volatile)
   RMW->setVolatile(true);
+
+unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+  // Most targets require "amdgpu.no.fine.grained.memory" to emit the 
nativ
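
(Archive note, not part of the patch.) The code comment in this patch contrasts the native atomic instruction with a "cmpxchg expansion". As a rough illustration of what such an expansion amounts to, here is a minimal host-side sketch using `std::atomic`. The helper name `atomicFAddViaCas` is hypothetical and the code is not what the AMDGPU backend emits; it only shows the retry-loop shape that the agent scope and the metadata are meant to avoid.

```cpp
#include <atomic>

// Hypothetical helper, not taken from the patch: a floating-point atomic add
// written as a compare-exchange retry loop. Like atomicrmw fadd, it returns
// the value that was stored before the addition.
float atomicFAddViaCas(std::atomic<float> &Target, float Operand) {
  float Expected = Target.load(std::memory_order_relaxed);
  // On failure compare_exchange_weak refreshes Expected with the value it
  // observed, so each retry recomputes the sum from up-to-date data.
  while (!Target.compare_exchange_weak(Expected, Expected + Operand,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return Expected;
}
```

A native instruction performs the read-modify-write in one step, whereas the loop above may retry under contention, which is presumably why the patch prefers the native form.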

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96873

>From ab196e6375bfa6cda5977102d733c501271cb684 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
 {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 20 ++-
 .../builtins-fp-atomics-gfx12.cl  |  9 ++---
 .../builtins-fp-atomics-gfx90a.cl |  2 +-
 .../builtins-fp-atomics-gfx940.cl |  3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 010fafde0714e..fec4fc4be562d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,22 +18791,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -18826,11 +18819,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   ArgTy = llvm::Type::getFloatTy(getLLVMContext());
   IID = Intrinsic::amdgcn_flat_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19231,7 +19219,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19249,6 +19239,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_global_add_half2
 // GFX12:  global_atomic_pk_add_f16 v2, v[0:1], v2, off

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96873

>From 37f162186d0d30a0c286efb582af86264c576b5c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
 {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 20 ++-
 .../builtins-fp-atomics-gfx12.cl  |  9 ++---
 .../builtins-fp-atomics-gfx90a.cl |  2 +-
 .../builtins-fp-atomics-gfx940.cl |  3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 010fafde0714e..fec4fc4be562d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,22 +18791,15 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -18826,11 +18819,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   ArgTy = llvm::Type::getFloatTy(getLLVMContext());
   IID = Intrinsic::amdgcn_flat_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19231,7 +19219,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19249,6 +19239,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_global_add_half2
 // GFX12:  global_atomic_pk_add_f16 v2, v[0:1], v2, off

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96872

>From f5747ae0c6eb1cb40d13cd99244734996777c65b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
 __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction,
most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl|  2 +-
 .../builtins-fp-atomics-gfx12.cl  |  4 +-
 .../builtins-fp-atomics-gfx90a.cl |  4 +-
 .../builtins-fp-atomics-gfx940.cl |  4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index c199976956085..00f581dced900 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18790,8 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18803,18 +18802,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getHalfTy(getLLVMContext()), 2);
   IID = Intrinsic::amdgcn_global_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -19237,7 +19229,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19253,6 +19247,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19287,8 +19283,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
   AO = AtomicOrdering::SequentiallyConsistent;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19303,6 +19304,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
 if (Volatile)
   RMW->setVolatile(true);
+
+unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+  // Most targets require "amdgpu.no.fine.grained.memory" to emit the 
nativ

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (PR #96874)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96874

>From 5944dea0c3f7207ce62a56c2b8806ecf5d53b527 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:15:26 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64}
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 17 ++---
 .../CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl |  6 --
 .../CodeGenOpenCL/builtins-fp-atomics-gfx940.cl |  3 ++-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index fec4fc4be562d..309c069d44738 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18793,10 +18793,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   }
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
@@ -18806,19 +18804,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmin;
   break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19221,7 +19212,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19241,6 +19234,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index cd10777dbe079..02e289427238f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -45,7 +45,8 @@ void test_global_max_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_add_local_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p3.f64(ptr addrspace(3) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(3) %{{.+}}, double %{{.+}} syncscope("agent") seq_cst, align 8{{$}}
+
 // GFX90A-LABEL:  test_flat_add_local_f64$local
 // GFX90A:  ds_add_rtn_f64
 void test_flat_add_local_f64(__local double *addr, double x){
@@ -54,7 +55,8 @@ void test_flat_add_local_f64(__local double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_global_add_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") seq_cst, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_flat_global_add_f64$local
 // GFX90A:  global_atomic_add_f64
 void test_flat_global_add_f64(__global double *addr, double x){
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
index 589dcd406630d..bd9b8c7268e06 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
@@ -10,7 +10,8 @@ typedef half  _

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96875

>From a15cfba94245201cbb963ab76c15018c2bc42a61 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:34:43 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 26 ++-
 .../builtins-fp-atomics-gfx12.cl  | 24 -
 .../builtins-fp-atomics-gfx90a.cl |  6 ++---
 .../builtins-fp-atomics-gfx940.cl | 14 +++---
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 309c069d44738..d98fd0012e15a 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18817,22 +18817,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
-Intrinsic::ID IID;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_global_atomic_fadd_v2bf16;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd_v2bf16;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19214,7 +19198,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19236,6 +19222,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19280,7 +19268,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   AO = AtomicOrdering::Monotonic;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
-  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16) {
+  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16) {
 llvm::Type *V2BF16Ty = FixedVectorType::get(
 llvm::Type::getBFloatTy(Builder.getContext()), 2);
 Val = Builder.CreateBitCast(Val, V2BF16Ty);
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 07e63a8711c7f..e8b6eb57c38d7 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -11,7 +11,7 @@ typedef short __attribute__((ext_vector_type(2))) short2;
 
 // CHECK-LABEL: test_local_add_2bf16
 // CHECK: [[BC0:%.+]] = bitcast <2 x i16> {{.+}} to <2 x bfloat>
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> [[BC0]] syncscope("agent") monotonic, align 4
+// CHECK-NEXT: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> [[BC0]] syncscope("agent") monotonic, align 4
 // CHECK-NEXT: bitcast <2 x bfloat> [[RMW]] to <2 x i16>
 
 // GFX12-LABEL:  test_local_add_2bf16
@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
 
 // GFX12-LABEL:  test_flat_add_2f
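
(Archive note, not part of the patch.) The patch bitcasts the `<2 x i16>` operand to `<2 x bfloat>` because the v2bf16 builtins take `short2` rather than a native bfloat vector. A bitcast reinterprets the bits and converts nothing. The sketch below shows, for a single lane, what those 16 bits denote: a bfloat16 pattern is the upper half of an IEEE-754 binary32 value. The helper name `bf16BitsToFloat` is hypothetical, and the code assumes `float` is binary32 on the host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not taken from the patch: widen one bfloat16 bit
// pattern to float. bfloat16 keeps the sign, the 8 exponent bits and the top
// 7 mantissa bits of binary32, so widening is a 16-bit left shift.
float bf16BitsToFloat(std::uint16_t Bits) {
  std::uint32_t Wide = static_cast<std::uint32_t>(Bits) << 16;
  float Result;
  // memcpy is the portable way to reinterpret the bits without undefined
  // behaviour; it plays the role the IR bitcast plays in the patch.
  std::memcpy(&Result, &Wide, sizeof(Result));
  return Result;
}
```

For instance, the `short` value `0x3F80` passed in one lane stands for the bfloat value 1.0, not for the integer 16256.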

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (PR #96876)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96876

>From 06fb3add7a2292f40b54849c768e20ac76fd1605 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 23:18:32 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max
 f64 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 36 +--
 .../builtins-fp-atomics-gfx90a.cl | 18 ++
 2 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index d98fd0012e15a..675561bd14ad4 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18791,32 +18791,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmax;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmax;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F =
-CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19200,7 +19174,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19227,8 +19205,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   BinOp = llvm::AtomicRMWInst::FMin;
   break;
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
 case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   BinOp = llvm::AtomicRMWInst::FMax;
   break;
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index 9381ce951df3e..556e553903d1a 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -27,7 +27,8 @@ void test_global_add_half2(__global half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_global_global_min_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmin.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmin ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_global_min_f64$local
 // GFX90A:  global_atomic_min_f64
 void test_global_global_min_f64(__global double *addr, double x){
@@ -36,7 +37,8 @@ void test_global_global_min_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_global_max_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmax.f64.p1.f64(ptr addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmax ptr addrspace(1) {{.+}}, double %{{.+}} syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_max_f64$local
 // GFX90A:  global_atomic_max_f64
 void test_global_max_f64(__global double *addr, double x){
@@ -65,7 +67,8 @@ void test_flat_global_add_f64(__global double *addr, doub

[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)

2024-07-23 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/97050

>From fea266d72c82212f8b020614da367908640d3d34 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 27 Jun 2024 16:32:48 +0200
Subject: [PATCH] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics

These are now fully covered by atomicrmw.
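
For readers following the auto-upgrade side, here is a minimal, self-contained sketch of the prefix-to-operation mapping that the AutoUpgrade.cpp hunk further down extends. `RMWOp` and `legacyNameToRMWOp` are hypothetical stand-ins rather than LLVM APIs (the real code uses `StringSwitch` and `AtomicRMWInst::BinOp`), and only the prefixes visible in this diff are listed, so treat it as an illustration and not as the actual implementation.

```cpp
#include <string>

// Stand-in for llvm::AtomicRMWInst::BinOp; only the operations that appear
// in the diff are modeled. `None` means "no atomicrmw replacement known".
enum class RMWOp { None, FAdd, FMin, FMax, UIncWrap, UDecWrap };

// Hypothetical helper: maps a legacy intrinsic name (with the "llvm.amdgcn."
// prefix already stripped) to the atomicrmw operation that replaces it.
RMWOp legacyNameToRMWOp(const std::string &Name) {
  struct Entry {
    const char *Prefix;
    RMWOp Op;
  };
  // Prefixes as they appear in the patch; the last two are the new ones.
  static const Entry Table[] = {
      {"ds.fmin", RMWOp::FMin},
      {"ds.fmax", RMWOp::FMax},
      {"atomic.inc.", RMWOp::UIncWrap},
      {"atomic.dec.", RMWOp::UDecWrap},
      {"global.atomic.fadd", RMWOp::FAdd},
      {"flat.atomic.fadd", RMWOp::FAdd},
  };
  for (const Entry &E : Table) {
    const std::string Prefix(E.Prefix);
    // compare() looks only at the first Prefix.size() characters of Name.
    if (Name.compare(0, Prefix.size(), Prefix) == 0)
      return E.Op;
  }
  return RMWOp::None;
}
```

With this table a name such as `global.atomic.fadd.v2bf16.p1` maps to `FAdd`, while a name with no listed prefix falls through to `None`.
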
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   4 -
 llvm/lib/IR/AutoUpgrade.cpp   |  14 +-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |   2 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   2 -
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 -
 llvm/lib/Target/AMDGPU/FLATInstructions.td|   2 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   6 +-
 llvm/test/Bitcode/amdgcn-atomic.ll|  22 ++
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll| 106 -
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 218 --
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll | 193 
 11 files changed, 33 insertions(+), 538 deletions(-)

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index ab2620fdcf6b3..119281ca6103a 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2955,10 +2955,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
 def NAME#"_"#kind : AMDGPUMFp8SmfmacIntrinsic;
 }
 
-// bf16 atomics use v2i16 argument since there is no bf16 data type in the llvm.
-def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
-def int_amdgcn_flat_atomic_fadd_v2bf16   : AMDGPUAtomicRtn;
-
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
 def int_amdgcn_mfma_i32_32x32x16_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 53de9eef516b3..f566a0e3c3043 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1034,7 +1034,9 @@ static bool upgradeIntrinsicFunction1(Function *F, Function *&NewFn,
   }
 
   if (Name.starts_with("ds.fadd") || Name.starts_with("ds.fmin") ||
-  Name.starts_with("ds.fmax")) {
+  Name.starts_with("ds.fmax") ||
+  Name.starts_with("global.atomic.fadd.v2bf16") ||
+  Name.starts_with("flat.atomic.fadd.v2bf16")) {
 // Replaced with atomicrmw fadd/fmin/fmax, so there's no new
 // declaration.
 NewFn = nullptr;
@@ -4042,7 +4044,9 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
   .StartsWith("ds.fmin", AtomicRMWInst::FMin)
   .StartsWith("ds.fmax", AtomicRMWInst::FMax)
   .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
-  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap)
+  .StartsWith("global.atomic.fadd", AtomicRMWInst::FAdd)
+  .StartsWith("flat.atomic.fadd", AtomicRMWInst::FAdd);
 
   unsigned NumOperands = CI->getNumOperands();
   if (NumOperands < 3) // Malformed bitcode.
@@ -4097,8 +4101,10 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, CallBase *CI,
   Builder.CreateAtomicRMW(RMWOp, Ptr, Val, std::nullopt, Order, SSID);
 
   if (PtrTy->getAddressSpace() != 3) {
-RMW->setMetadata("amdgpu.no.fine.grained.memory",
- MDNode::get(F->getContext(), {}));
+MDNode *EmptyMD = MDNode::get(F->getContext(), {});
+RMW->setMetadata("amdgpu.no.fine.grained.memory", EmptyMD);
+if (RMWOp == AtomicRMWInst::FAdd && RetTy->isFloatTy())
+  RMW->setMetadata("amdgpu.ignore.denormal.mode", EmptyMD);
   }
 
   if (IsVolatile)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index c6dbc58395e48..db8b44149cf47 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -620,12 +620,10 @@ multiclass local_addr_space_atomic_op {
 
 defm int_amdgcn_flat_atomic_fadd : noret_op;
 defm int_amdgcn_flat_atomic_fadd : flat_addr_space_atomic_op;
-defm int_amdgcn_flat_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_flat_atomic_fmin : noret_op;
 defm int_amdgcn_flat_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_fadd : global_addr_space_atomic_op;
 defm int_amdgcn_flat_atomic_fadd : global_addr_space_atomic_op;
-defm int_amdgcn_global_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_global_atomic_fmin : noret_op;
 defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index aa329a58547f3..546c0a238e430 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4898,8 +4898,6 @@ AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
 case Intrinsic::amdgcn_flat_atomic_fmax:
 case Intrinsic

[llvm-branch-commits] [libcxx] [libc++][doc] Update the release notes for LLVM 19. (PR #100167)

2024-07-23 Thread Louis Dionne via llvm-branch-commits

https://github.com/ldionne approved this pull request.


https://github.com/llvm/llvm-project/pull/100167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@jhuber6 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100174


[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100174


[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100174

Backport e0649a5dfc6b859d652318f578bc3d49674787a4

Requested by: @jhuber6

>From 62f7338ac4509a71ce149ab879ed35cc13f5f00f Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 23 Jul 2024 12:54:00 -0500
Subject: [PATCH] [NVPTX] Fix internal indirect call prototypes not obeying the
 ABI (#100131)

Summary:
The NVPTX backend optimizes the ABI for functions that are internal,
however, this is not legal for indirect call prototypes. Previously, we
would modify the ABI on an aggregate byval type passed to an indirect
call prototype, which would make PTXAS error. This patch just passes the
function as a nullptr to force strict ABI compliance without
modification in the helper function.
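
A minimal sketch of the idea, under a deliberately simplified model of the by-value alignment helper. `FunctionInfo`, `byValParamAlign`, and the 16-byte bump are illustrative assumptions and not the actual NVPTX backend code; the only point is that passing no function leaves the incoming alignment untouched.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the properties of llvm::Function that matter
// here. Both fields are assumptions made for illustration.
struct FunctionInfo {
  bool HasLocalLinkage; // internal/private, so its ABI may be adjusted
  bool AddressTaken;    // may be reached through an indirect call
};

// Simplified model: if a suitable function is supplied, the alignment of a
// byval parameter may be raised (16 is an assumed value). With no function,
// the incoming alignment is returned unchanged.
uint64_t byValParamAlign(const FunctionInfo *F, uint64_t InitialAlign) {
  uint64_t Align = InitialAlign;
  if (F && F->HasLocalLinkage && !F->AddressTaken)
    Align = std::max<uint64_t>(Align, 16);
  return Align;
}
```

In this model an indirect call prototype is built with `byValParamAlign(nullptr, 1)`, which stays at 1 and is consistent with the `.param .align 1 .b8 _[1]` prototype expected by the new test.
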

Fixes https://github.com/llvm/llvm-project/issues/100055

(cherry picked from commit e0649a5dfc6b859d652318f578bc3d49674787a4)
---
 libc/config/gpu/entrypoints.txt | 15 +---
 llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp |  5 +-
 llvm/test/CodeGen/NVPTX/indirect_byval.ll   | 94 +
 3 files changed, 101 insertions(+), 13 deletions(-)
 create mode 100644 llvm/test/CodeGen/NVPTX/indirect_byval.ll

diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 42909cec55890..fa878d8999227 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -1,13 +1,3 @@
-if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)
-  set(extra_entrypoints
-  # stdio.h entrypoints
-  libc.src.stdio.snprintf
-  libc.src.stdio.sprintf
-  libc.src.stdio.vsnprintf
-  libc.src.stdio.vsprintf
-  )
-endif()
-
 set(TARGET_LIBC_ENTRYPOINTS
 # assert.h entrypoints
 libc.src.assert.__assert_fail
@@ -186,13 +176,16 @@ set(TARGET_LIBC_ENTRYPOINTS
 libc.src.errno.errno
 
 # stdio.h entrypoints
-${extra_entrypoints}
 libc.src.stdio.clearerr
 libc.src.stdio.fclose
 libc.src.stdio.printf
 libc.src.stdio.vprintf
 libc.src.stdio.fprintf
 libc.src.stdio.vfprintf
+libc.src.stdio.snprintf
+libc.src.stdio.sprintf
+libc.src.stdio.vsnprintf
+libc.src.stdio.vsprintf
 libc.src.stdio.feof
 libc.src.stdio.ferror
 libc.src.stdio.fflush
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 44c1a2e50486c..6975412ce5d35 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -1429,7 +1429,6 @@ std::string NVPTXTargetLowering::getPrototype(
 
   bool first = true;
 
-  const Function *F = CB.getFunction();
   unsigned NumArgs = VAInfo ? VAInfo->first : Args.size();
   for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) {
 Type *Ty = Args[i].Ty;
@@ -1471,10 +1470,12 @@ std::string NVPTXTargetLowering::getPrototype(
   continue;
 }
 
+// Indirect calls need strict ABI alignment so we disable optimizations by
+// not providing a function to optimize.
 Type *ETy = Args[i].IndirectType;
 Align InitialAlign = Outs[OIdx].Flags.getNonZeroByValAlign();
 Align ParamByValAlign =
-getFunctionByValParamAlign(F, ETy, InitialAlign, DL);
+getFunctionByValParamAlign(/*F=*/nullptr, ETy, InitialAlign, DL);
 
 O << ".param .align " << ParamByValAlign.value() << " .b8 ";
 O << "_";
diff --git a/llvm/test/CodeGen/NVPTX/indirect_byval.ll b/llvm/test/CodeGen/NVPTX/indirect_byval.ll
new file mode 100644
index 0..ac6c4e262fd60
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/indirect_byval.ll
@@ -0,0 +1,94 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+%struct.S = type { i8 }
+%struct.U = type { i64 }
+
+@ptr = external global ptr, align 8
+
+define internal i32 @foo() {
+; CHECK-LABEL: foo(
+; CHECK:   {
+; CHECK-NEXT:.local .align 1 .b8 __local_depot0[2];
+; CHECK-NEXT:.reg .b64 %SP;
+; CHECK-NEXT:.reg .b64 %SPL;
+; CHECK-NEXT:.reg .b16 %rs<2>;
+; CHECK-NEXT:.reg .b32 %r<3>;
+; CHECK-NEXT:.reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0: // %entry
+; CHECK-NEXT:mov.u64 %SPL, __local_depot0;
+; CHECK-NEXT:cvta.local.u64 %SP, %SPL;
+; CHECK-NEXT:ld.global.u64 %rd1, [ptr];
+; CHECK-NEXT:ld.u8 %rs1, [%SP+1];
+; CHECK-NEXT:add.u64 %rd2, %SP, 0;
+; CHECK-NEXT:{ // callseq 0, 0
+; CHECK-NEXT:.param .align 1 .b8 param0[1];
+; CHECK-NEXT:st.param.b8 [param0+0], %rs1;
+; CHECK-NEXT:.param .b64 param1;
+; CHECK-NEXT:st.param.b64 [param1+0], %rd2;
+; CHECK-NEXT:.param .b32 retval0;
+; CHECK-NEXT:prototype_0 : .callprototype (.param .b32 _) _ (.param .align 1 .b8 _[1], .param .b64 _);
+; CHECK-NEXT:call (retval0),
+; CHECK-NEXT:%rd1,
+; CHECK-NEXT:   

[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libc

Author: None (llvmbot)


Changes

Backport e0649a5dfc6b859d652318f578bc3d49674787a4

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/100174.diff


3 Files Affected:

- (modified) libc/config/gpu/entrypoints.txt (+4-11) 
- (modified) llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp (+3-2) 
- (added) llvm/test/CodeGen/NVPTX/indirect_byval.ll (+94) 


``diff
diff --git a/libc/config/gpu/entrypoints.txt b/libc/config/gpu/entrypoints.txt
index 42909cec55890..fa878d8999227 100644
--- a/libc/config/gpu/entrypoints.txt
+++ b/libc/config/gpu/entrypoints.txt
@@ -1,13 +1,3 @@
-if(LIBC_TARGET_ARCHITECTURE_IS_AMDGPU)
-  set(extra_entrypoints
-  # stdio.h entrypoints
-  libc.src.stdio.snprintf
-  libc.src.stdio.sprintf
-  libc.src.stdio.vsnprintf
-  libc.src.stdio.vsprintf
-  )
-endif()
-
 set(TARGET_LIBC_ENTRYPOINTS
 # assert.h entrypoints
 libc.src.assert.__assert_fail
@@ -186,13 +176,16 @@ set(TARGET_LIBC_ENTRYPOINTS
 libc.src.errno.errno
 
 # stdio.h entrypoints
-${extra_entrypoints}
 libc.src.stdio.clearerr
 libc.src.stdio.fclose
 libc.src.stdio.printf
 libc.src.stdio.vprintf
 libc.src.stdio.fprintf
 libc.src.stdio.vfprintf
+libc.src.stdio.snprintf
+libc.src.stdio.sprintf
+libc.src.stdio.vsnprintf
+libc.src.stdio.vsprintf
 libc.src.stdio.feof
 libc.src.stdio.ferror
 libc.src.stdio.fflush
diff --git a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
index 44c1a2e50486c..6975412ce5d35 100644
--- a/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -1429,7 +1429,6 @@ std::string NVPTXTargetLowering::getPrototype(
 
   bool first = true;
 
-  const Function *F = CB.getFunction();
   unsigned NumArgs = VAInfo ? VAInfo->first : Args.size();
   for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) {
 Type *Ty = Args[i].Ty;
@@ -1471,10 +1470,12 @@ std::string NVPTXTargetLowering::getPrototype(
   continue;
 }
 
+// Indirect calls need strict ABI alignment so we disable optimizations by
+// not providing a function to optimize.
 Type *ETy = Args[i].IndirectType;
 Align InitialAlign = Outs[OIdx].Flags.getNonZeroByValAlign();
 Align ParamByValAlign =
-getFunctionByValParamAlign(F, ETy, InitialAlign, DL);
+getFunctionByValParamAlign(/*F=*/nullptr, ETy, InitialAlign, DL);
 
 O << ".param .align " << ParamByValAlign.value() << " .b8 ";
 O << "_";
diff --git a/llvm/test/CodeGen/NVPTX/indirect_byval.ll b/llvm/test/CodeGen/NVPTX/indirect_byval.ll
new file mode 100644
index 0..ac6c4e262fd60
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/indirect_byval.ll
@@ -0,0 +1,94 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | FileCheck %s
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_52 -mattr=+ptx64 | %ptxas-verify %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+%struct.S = type { i8 }
+%struct.U = type { i64 }
+
+@ptr = external global ptr, align 8
+
+define internal i32 @foo() {
+; CHECK-LABEL: foo(
+; CHECK:   {
+; CHECK-NEXT:.local .align 1 .b8 __local_depot0[2];
+; CHECK-NEXT:.reg .b64 %SP;
+; CHECK-NEXT:.reg .b64 %SPL;
+; CHECK-NEXT:.reg .b16 %rs<2>;
+; CHECK-NEXT:.reg .b32 %r<3>;
+; CHECK-NEXT:.reg .b64 %rd<3>;
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0: // %entry
+; CHECK-NEXT:mov.u64 %SPL, __local_depot0;
+; CHECK-NEXT:cvta.local.u64 %SP, %SPL;
+; CHECK-NEXT:ld.global.u64 %rd1, [ptr];
+; CHECK-NEXT:ld.u8 %rs1, [%SP+1];
+; CHECK-NEXT:add.u64 %rd2, %SP, 0;
+; CHECK-NEXT:{ // callseq 0, 0
+; CHECK-NEXT:.param .align 1 .b8 param0[1];
+; CHECK-NEXT:st.param.b8 [param0+0], %rs1;
+; CHECK-NEXT:.param .b64 param1;
+; CHECK-NEXT:st.param.b64 [param1+0], %rd2;
+; CHECK-NEXT:.param .b32 retval0;
+; CHECK-NEXT:prototype_0 : .callprototype (.param .b32 _) _ (.param .align 1 .b8 _[1], .param .b64 _);
+; CHECK-NEXT:call (retval0),
+; CHECK-NEXT:%rd1,
+; CHECK-NEXT:(
+; CHECK-NEXT:param0,
+; CHECK-NEXT:param1
+; CHECK-NEXT:)
+; CHECK-NEXT:, prototype_0;
+; CHECK-NEXT:ld.param.b32 %r1, [retval0+0];
+; CHECK-NEXT:} // callseq 0
+; CHECK-NEXT:st.param.b32 [func_retval0+0], %r1;
+; CHECK-NEXT:ret;
+entry:
+  %s = alloca %struct.S, align 1
+  %agg.tmp = alloca %struct.S, align 1
+  %0 = load ptr, ptr @ptr, align 8
+  %call = call i32 %0(ptr byval(%struct.S) align 1 %agg.tmp, ptr noundef %s)
+  ret i32 %call
+}
+
+define internal i32 @bar() {
+; CHECK-LABEL: bar(
+; CHECK: // @bar
+; CHECK-NEXT:  {
+; CHECK-NEXT:.local .align 8 .b8 __local_depot1[16];
+; CHECK-NEXT:.reg .b64 %SP;
+; CHECK-NEXT:.reg .b64 %SPL;
+; CHECK-NEXT:.reg .b32 %r<3>;
+; CHECK-NEXT:.reg .b6

[llvm-branch-commits] [libc] [llvm] release/19.x: [NVPTX] Fix internal indirect call prototypes not obeying the ABI (#100131) (PR #100174)

2024-07-23 Thread Joseph Huber via llvm-branch-commits

jhuber6 wrote:

This should be merged

https://github.com/llvm/llvm-project/pull/100174


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/99891

>From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 22 Jul 2024 11:56:23 -0700
Subject: [PATCH 1/3] Changed assignment of profiles with pseudo probe index

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++
 .../X86/match-blocks-with-pseudo-probes.test  | 25 ++
 2 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 4105f626fb5b6..c135ee5ff4837 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -195,11 +195,15 @@ class StaleMatcher {
   void init(const std::vector &Blocks,
 const std::vector &Hashes,
 const std::vector &CallHashes,
-std::optional YamlBFGUID) {
+const std::unordered_map>
+IndexToBinaryPseudoProbes,
+const std::unordered_map
+BinaryPseudoProbeToBlock,
+const uint64_t YamlBFGUID) {
 assert(Blocks.size() == Hashes.size() &&
Hashes.size() == CallHashes.size() &&
"incorrect matcher initialization");
-
 for (size_t I = 0; I < Blocks.size(); I++) {
   FlowBlock *Block = Blocks[I];
   uint16_t OpHash = Hashes[I].OpcodeHash;
@@ -209,6 +213,8 @@ class StaleMatcher {
 std::make_pair(Hashes[I], Block));
   this->Blocks.push_back(Block);
 }
+this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes;
+this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock;
 this->YamlBFGUID = YamlBFGUID;
   }
 
@@ -234,10 +240,14 @@ class StaleMatcher {
   using HashBlockPairType = std::pair;
   std::unordered_map> OpHashToBlocks;
   std::unordered_map> CallHashToBlocks;
-  std::vector Blocks;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  std::vector Blocks;
   // If the pseudo probe checksums of the profiled and binary functions are
   // equal, then the YamlBF's GUID is defined and used to match blocks.
-  std::optional YamlBFGUID;
+  uint64_t YamlBFGUID;
 
   // Uses OpcodeHash to find the most similar block for a given hash.
   const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const {
@@ -284,7 +294,7 @@ class StaleMatcher {
 // Searches for the pseudo probe attached to the matched function's block,
 // ignoring pseudo probes attached to function calls and inlined functions'
 // blocks.
-outs() << "match with pseudo probes\n";
+std::vector BlockPseudoProbes;
 for (const auto &PseudoProbe : PseudoProbes) {
   // Ensures that pseudo probe information belongs to the appropriate
   // function and not an inlined function.
@@ -293,11 +303,30 @@ class StaleMatcher {
   // Skips pseudo probes attached to function calls.
   if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
 continue;
-  assert(PseudoProbe.Index < Blocks.size() &&
- "pseudo probe index out of range");
-  return Blocks[PseudoProbe.Index];
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
 }
-return nullptr;
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index < Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+   "All blocks should have a pseudo probe");
+if (It->second.size() > 1)
+  return nullptr;
+
+const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0];
+auto BinaryPseudoProbeIt = BinaryPseudoProbeToBlock.find(BinaryPseudoProbe);
+assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() &&
+   "All binary pseudo probes should belong a binary basic block");
+
+return BinaryPseudoProbeIt->second;
   }
 };
 
@@ -491,6 +520,11 @@ size_t matchWeightsByHashes(
   std::vector CallHashes;
   std::vector Blocks;
   std::vector BlendedHashes;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder();
   for (uint64_t I = 0; I < BlockOrder.size(); I++) {
 const BinaryBasicBlock *BB = BlockOrder[I];
 assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock");
@@ -510,9 +544,27 @@ size_t matchWeightsByHashes(
 Blocks.push_back(&Func.Blocks[I + 1]);
 BlendedBlockHash BlendedHash(BB->getHash());
 BlendedHashes.push_back(BlendedHash);
+if (PseudoProbeDecoder) {
+  const AddressProbesMap &ProbeMap =
+  PseudoProbeDecoder->getAd

[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Amir Ayupov via llvm-branch-commits


@@ -306,26 +310,41 @@ class StaleMatcher {
 
   BlockPseudoProbes.push_back(&PseudoProbe);
 }
-
 // Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
 // probe and binary pseudo probe.
-if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+if (BlockPseudoProbes.size() == 0) {
+  if (opts::Verbosity >= 2)
+errs() << "BOLT-WARNING: no pseudo probes in profile block\n";

aaupov wrote:

Bump verbosity for this logging to >=3.
Add aggregated counters – those could be printed for BF at v>=2.
BC-level aggregated counters can be printed at v>=1.
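
One way the tiered reporting could look, as a standalone sketch. `ProbeMatchStats`, `noteBlockWithoutProbes`, `functionSummary`, and `binarySummary` are hypothetical names, the message texts other than the existing warning are made up, and the functions return strings instead of writing to `errs()`/`outs()` so the sketch does not depend on BOLT.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tally of pseudo probe matching outcomes.
struct ProbeMatchStats {
  uint64_t BlocksWithoutProbes = 0;

  // Folds another tally into this one, e.g. per-function into per-binary.
  void merge(const ProbeMatchStats &Other) {
    BlocksWithoutProbes += Other.BlocksWithoutProbes;
  }
};

// Per-block event: always counted, but only reported at verbosity >= 3.
std::string noteBlockWithoutProbes(ProbeMatchStats &Stats, unsigned Verbosity) {
  ++Stats.BlocksWithoutProbes;
  if (Verbosity >= 3)
    return "BOLT-WARNING: no pseudo probes in profile block\n";
  return "";
}

// Per-function (BF) summary: reported at verbosity >= 2.
std::string functionSummary(const std::string &Name,
                            const ProbeMatchStats &Stats, unsigned Verbosity) {
  if (Verbosity < 2)
    return "";
  return "BOLT-INFO: " + Name + ": " +
         std::to_string(Stats.BlocksWithoutProbes) +
         " block(s) without pseudo probes\n";
}

// Whole-binary (BC) summary: reported at verbosity >= 1.
std::string binarySummary(const ProbeMatchStats &Total, unsigned Verbosity) {
  if (Verbosity < 1)
    return "";
  return "BOLT-INFO: " + std::to_string(Total.BlocksWithoutProbes) +
         " block(s) without pseudo probes in total\n";
}
```

Counting unconditionally and gating only the printing keeps the aggregated numbers correct at every verbosity level.
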

https://github.com/llvm/llvm-project/pull/99891


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Amir Ayupov via llvm-branch-commits


@@ -555,6 +574,10 @@ size_t matchWeightsByHashes(
ProbeMap.lower_bound(FuncAddr + BlockRange.second));
   for (const auto &[_, Probes] : BlockProbes) {
 for (const MCDecodedPseudoProbe &Probe : Probes) {
+  if (Probe.getInlineTreeNode()->hasInlineSite())

aaupov wrote:

What do we prune with this check? Don't we discard valid probes belonging to 
the current function?

https://github.com/llvm/llvm-project/pull/99891
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100183

Backport e64e745e8fb8

Requested by: @ldionne

>From 0a44617ee7a29a5a7758285c5c367b66d68051a3 Mon Sep 17 00:00:00 2001
From: Louis Dionne 
Date: Tue, 23 Jul 2024 13:04:54 -0500
Subject: [PATCH] [libc++][libc++abi] Minor follow-up changes after ptrauth
 upstreaming (#87481)

This patch applies the comments provided on #84573. This is done as a
separate PR to avoid merge conflicts with downstreams that already had
ptrauth support.

(cherry picked from commit e64e745e8fb802ffb06259b1a5ba3db713a17087)
---
 libcxx/include/typeinfo   |  9 ---
 libcxx/src/include/overridable_function.h |  6 ++---
 libcxxabi/src/private_typeinfo.cpp| 33 +--
 3 files changed, 21 insertions(+), 27 deletions(-)

diff --git a/libcxx/include/typeinfo b/libcxx/include/typeinfo
index d1c0de3c1bfdd..2727cad02fa99 100644
--- a/libcxx/include/typeinfo
+++ b/libcxx/include/typeinfo
@@ -275,13 +275,14 @@ struct __type_info_implementations {
   __impl;
 };
 
-#if defined(__arm64__) && __has_cpp_attribute(clang::ptrauth_vtable_pointer)
-#  if __has_feature(ptrauth_type_info_discriminated_vtable_pointer)
+#if __has_cpp_attribute(_Clang::__ptrauth_vtable_pointer__)
+#  if __has_feature(ptrauth_type_info_vtable_pointer_discrimination)
 #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \
-  [[clang::ptrauth_vtable_pointer(process_independent, address_discrimination, type_discrimination)]]
+  [[_Clang::__ptrauth_vtable_pointer__(process_independent, address_discrimination, type_discrimination)]]
 #  else
 #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \
-  [[clang::ptrauth_vtable_pointer(process_independent, no_address_discrimination, no_extra_discrimination)]]
+  [[_Clang::__ptrauth_vtable_pointer__( \
+  process_independent, no_address_discrimination, no_extra_discrimination)]]
 #  endif
 #else
 #  define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH
diff --git a/libcxx/src/include/overridable_function.h b/libcxx/src/include/overridable_function.h
index e71e4f104b290..c7639f56eee26 100644
--- a/libcxx/src/include/overridable_function.h
+++ b/libcxx/src/include/overridable_function.h
@@ -13,7 +13,7 @@
 #include <__config>
 #include 
 
-#if defined(__arm64e__) && __has_feature(ptrauth_calls)
+#if __has_feature(ptrauth_calls)
 #  include 
 #endif
 
@@ -83,13 +83,13 @@ _LIBCPP_HIDE_FROM_ABI bool __is_function_overridden(_Ret (*__fptr)(_Args...)) no
   uintptr_t __end   = reinterpret_cast(&__lcxx_override_end);
   uintptr_t __ptr   = reinterpret_cast(__fptr);
 
-#if defined(__arm64e__) && __has_feature(ptrauth_calls)
+#  if __has_feature(ptrauth_calls)
   // We must pass a void* to ptrauth_strip since it only accepts a pointer type. Also, in particular,
   // we must NOT pass a function pointer, otherwise we will strip the function pointer, and then attempt
   // to authenticate and re-sign it when casting it to a uintptr_t again, which will fail because we just
   // stripped the function pointer. See rdar://122927845.
   __ptr = reinterpret_cast(ptrauth_strip(reinterpret_cast(__ptr), ptrauth_key_function_pointer));
-#endif
+#  endif
 
   // Finally, the function was overridden if it falls outside of the section's bounds.
   return __ptr < __start || __ptr > __end;
diff --git a/libcxxabi/src/private_typeinfo.cpp b/libcxxabi/src/private_typeinfo.cpp
index 9e58501a55934..9dba91e1985e3 100644
--- a/libcxxabi/src/private_typeinfo.cpp
+++ b/libcxxabi/src/private_typeinfo.cpp
@@ -55,15 +55,12 @@
 #include 
 #endif
 
-
-template
-static inline
-T *
-get_vtable(T *vtable) {
+template 
+static inline T* strip_vtable(T* vtable) {
 #if __has_feature(ptrauth_calls)
-vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer);
+  vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer);
 #endif
-return vtable;
+  return vtable;
 }
 
 static inline
@@ -117,11 +114,10 @@ void dyn_cast_get_derived_info(derived_object_info* info, const void* static_ptr
 reinterpret_cast(vtable) + offset_to_ti_proxy;
 info->dynamic_type = *(reinterpret_cast(ptr_to_ti_proxy));
 #else
-void **vtable = *static_cast(static_ptr);
-vtable = get_vtable(vtable);
-info->offset_to_derived = reinterpret_cast(vtable[-2]);
-info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived;
-info->dynamic_type = static_cast(vtable[-1]);
+  void** vtable = strip_vtable(*static_cast(static_ptr));
+  info->offset_to_derived = reinterpret_cast(vtable[-2]);
+  info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived;
+  info->dynamic_type = static_cast(vtable[-1]);
 #endif
 }
 
@@ -576,8 +572,7 @@ 
__base

[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100183


[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@ahmedbougacha What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100183


[llvm-branch-commits] [libcxx] [libcxxabi] release/19.x: [libc++][libc++abi] Minor follow-up changes after ptrauth upstreaming (#87481) (PR #100183)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxxabi

Author: None (llvmbot)


Changes

Backport e64e745e8fb8

Requested by: @ldionne

---
Full diff: https://github.com/llvm/llvm-project/pull/100183.diff


3 Files Affected:

- (modified) libcxx/include/typeinfo (+5-4) 
- (modified) libcxx/src/include/overridable_function.h (+3-3) 
- (modified) libcxxabi/src/private_typeinfo.cpp (+13-20) 


``diff
diff --git a/libcxx/include/typeinfo b/libcxx/include/typeinfo
index d1c0de3c1bfdd..2727cad02fa99 100644
--- a/libcxx/include/typeinfo
+++ b/libcxx/include/typeinfo
@@ -275,13 +275,14 @@ struct __type_info_implementations {
   __impl;
 };
 
-#if defined(__arm64__) && __has_cpp_attribute(clang::ptrauth_vtable_pointer)
-#  if __has_feature(ptrauth_type_info_discriminated_vtable_pointer)
+#if __has_cpp_attribute(_Clang::__ptrauth_vtable_pointer__)
+#  if __has_feature(ptrauth_type_info_vtable_pointer_discrimination)
 #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \
-  [[clang::ptrauth_vtable_pointer(process_independent, address_discrimination, type_discrimination)]]
+  [[_Clang::__ptrauth_vtable_pointer__(process_independent, address_discrimination, type_discrimination)]]
 #  else
 #define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH \
-  [[clang::ptrauth_vtable_pointer(process_independent, no_address_discrimination, no_extra_discrimination)]]
+  [[_Clang::__ptrauth_vtable_pointer__( \
+  process_independent, no_address_discrimination, no_extra_discrimination)]]
 #  endif
 #else
 #  define _LIBCPP_TYPE_INFO_VTABLE_POINTER_AUTH
diff --git a/libcxx/src/include/overridable_function.h b/libcxx/src/include/overridable_function.h
index e71e4f104b290..c7639f56eee26 100644
--- a/libcxx/src/include/overridable_function.h
+++ b/libcxx/src/include/overridable_function.h
@@ -13,7 +13,7 @@
 #include <__config>
 #include 
 
-#if defined(__arm64e__) && __has_feature(ptrauth_calls)
+#if __has_feature(ptrauth_calls)
 #  include <ptrauth.h>
 #endif
 
@@ -83,13 +83,13 @@ _LIBCPP_HIDE_FROM_ABI bool __is_function_overridden(_Ret (*__fptr)(_Args...)) no
   uintptr_t __end   = reinterpret_cast<uintptr_t>(&__lcxx_override_end);
   uintptr_t __ptr   = reinterpret_cast<uintptr_t>(__fptr);
 
-#if defined(__arm64e__) && __has_feature(ptrauth_calls)
+#  if __has_feature(ptrauth_calls)
   // We must pass a void* to ptrauth_strip since it only accepts a pointer type. Also, in particular,
   // we must NOT pass a function pointer, otherwise we will strip the function pointer, and then attempt
   // to authenticate and re-sign it when casting it to a uintptr_t again, which will fail because we just
   // stripped the function pointer. See rdar://122927845.
   __ptr = reinterpret_cast<uintptr_t>(ptrauth_strip(reinterpret_cast<void*>(__ptr), ptrauth_key_function_pointer));
-#endif
+#  endif
 
   // Finally, the function was overridden if it falls outside of the section's bounds.
   return __ptr < __start || __ptr > __end;
diff --git a/libcxxabi/src/private_typeinfo.cpp b/libcxxabi/src/private_typeinfo.cpp
index 9e58501a55934..9dba91e1985e3 100644
--- a/libcxxabi/src/private_typeinfo.cpp
+++ b/libcxxabi/src/private_typeinfo.cpp
@@ -55,15 +55,12 @@
 #include 
 #endif
 
-
-template
-static inline
-T *
-get_vtable(T *vtable) {
+template 
+static inline T* strip_vtable(T* vtable) {
 #if __has_feature(ptrauth_calls)
-vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer);
+  vtable = ptrauth_strip(vtable, ptrauth_key_cxx_vtable_pointer);
 #endif
-return vtable;
+  return vtable;
 }
 
 static inline
@@ -117,11 +114,10 @@ void dyn_cast_get_derived_info(derived_object_info* info, const void* static_ptr
 reinterpret_cast(vtable) + offset_to_ti_proxy;
 info->dynamic_type = *(reinterpret_cast(ptr_to_ti_proxy));
 #else
-void **vtable = *static_cast(static_ptr);
-vtable = get_vtable(vtable);
-info->offset_to_derived = reinterpret_cast(vtable[-2]);
-info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived;
-info->dynamic_type = static_cast(vtable[-1]);
+  void** vtable = strip_vtable(*static_cast(static_ptr));
+  info->offset_to_derived = reinterpret_cast(vtable[-2]);
+  info->dynamic_ptr = static_cast(static_ptr) + info->offset_to_derived;
+  info->dynamic_type = static_cast(vtable[-1]);
 #endif
 }
 
@@ -576,8 +572,7 @@ __base_class_type_info::has_unambiguous_public_base(__dynamic_cast_info* info,
find the layout.  */
 offset_to_base = __offset_flags >> __offset_shift;
 if (is_virtual) {
-  const char* vtable = *static_cast(adjustedPtr);
-  vtable = get_vtable(vtable);
+  const char* vtable = strip_vtable(*static_cast(adjustedPtr));
   offset_to_base = update_offset_to_base(vtable, offset_to_base);
 }
   }

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add omp.target_triples attribute to the OffloadModuleInterface (PR #100154)

2024-07-23 Thread Pranav Bhandarkar via llvm-branch-commits

https://github.com/bhandarkar-pranav approved this pull request.


https://github.com/llvm/llvm-project/pull/100154


[llvm-branch-commits] [flang] [Flang][OpenMP] Add frontend support for -fopenmp-targets (PR #100155)

2024-07-23 Thread Pranav Bhandarkar via llvm-branch-commits

https://github.com/bhandarkar-pranav approved this pull request.


https://github.com/llvm/llvm-project/pull/100155


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Shaw Young via llvm-branch-commits


@@ -555,6 +574,10 @@ size_t matchWeightsByHashes(
ProbeMap.lower_bound(FuncAddr + BlockRange.second));
   for (const auto &[_, Probes] : BlockProbes) {
 for (const MCDecodedPseudoProbe &Probe : Probes) {
+  if (Probe.getInlineTreeNode()->hasInlineSite())

shawbyoung wrote:

This pruning resulted in a tangible increase in 1:1 mappings between profile
and binary pseudo probes. The PseudoProbeDecoder::ProbeMap interface contains
not only the probe attached to some block b, but also the block probes for b
that were inlined into other functions.
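
For illustration, the pruning described above can be sketched roughly as follows. This is a minimal model and not the BOLT code: `Probe`, `HasInlineSite`, and `topLevelProbes` are hypothetical stand-ins for `MCDecodedPseudoProbe` and the `getInlineTreeNode()->hasInlineSite()` check in the hunk.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCDecodedPseudoProbe; only the two properties
// needed for this sketch are modeled.
struct Probe {
  uint32_t Index;     // probe id within its function
  bool HasInlineSite; // true if the probe's inline tree node has an inline site
};

// Keep only probes whose inline tree node has no inline site, so that a block
// is matched against its own probes rather than inlined copies.
inline std::vector<Probe> topLevelProbes(const std::vector<Probe> &Probes) {
  std::vector<Probe> Result;
  for (const Probe &P : Probes) {
    if (P.HasInlineSite)
      continue; // prune probes that come from an inline site
    Result.push_back(P);
  }
  assert(Result.size() <= Probes.size() && "filtering never adds probes");
  return Result;
}
```

With the inlined copies dropped, each remaining probe index should correspond to at most one block probe, which is presumably why the number of 1:1 mappings goes up.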

https://github.com/llvm/llvm-project/pull/99891


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add support for -mcpu=pwr11 / -mtune=pwr11 (#99511) (PR #100151)

2024-07-23 Thread via llvm-branch-commits

https://github.com/azhan92 approved this pull request.

lgtm

https://github.com/llvm/llvm-project/pull/100151


[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100191

Backport 92a9d4831d5e40c286247c30fcd794563adbef6e

Requested by: @ian-twilightcoder

>From e3ec8d577ee97f496f7a27fc6099ca5ded220d3b Mon Sep 17 00:00:00 2001
From: Ian Anderson 
Date: Tue, 23 Jul 2024 13:02:59 -0700
Subject: [PATCH] [clang][headers] Including stddef.h always redefines NULL
 (#99727)

stddef.h always includes __stddef_null.h. This is fine in modules
because it's not possible to re-include the pcm, and it's necessary to
export the _Builtin_stddef.null submodule. However, without modules it
causes NULL to always get redefined which disrupts some C++ code. Rework
the inclusion of __stddef_null.h so that when not building with modules
it's only included if __need_NULL is set by the includer, or it's the
first time stddef.h is being included.
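
As a rough model of the rule described above, the decision can be written as a plain function. This is only a sketch: `Inclusion`, `includesStddefNull`, and `checkModel` are hypothetical names, and the real logic lives in preprocessor conditionals in stddef.h.

```cpp
#include <cassert>

// Hypothetical description of one inclusion of stddef.h.
struct Inclusion {
  bool IncluderSetNeedNULL;  // includer defined __need_NULL beforehand
  bool IncluderSetOtherNeed; // includer defined some other __need_* macro
  bool StddefSeenBefore;     // __STDDEF_H is already defined
  bool Modules;              // __has_feature(modules) is true
};

// Returns true if this inclusion would include __stddef_null.h under the
// rule in the commit message (assumption: this mirrors the #if structure).
constexpr bool includesStddefNull(const Inclusion &I) {
  return I.IncluderSetNeedNULL ||
         (!I.IncluderSetOtherNeed && (!I.StddefSeenBefore || I.Modules));
}

// A few scenarios from the description, checked at run time.
inline void checkModel() {
  // First plain inclusion: NULL is provided.
  assert(includesStddefNull({false, false, false, false}));
  // Second plain inclusion without modules: a client's NULL is left alone.
  assert(!includesStddefNull({false, false, true, false}));
  // Includer explicitly asks for NULL again (the glibc pattern).
  assert(includesStddefNull({true, false, true, false}));
  // With modules, __stddef_null.h is always included.
  assert(includesStddefNull({false, false, true, true}));
}
```

The second scenario is the one the patch changes: before it, a plain re-inclusion would have redefined NULL.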

(cherry picked from commit 92a9d4831d5e40c286247c30fcd794563adbef6e)
---
 clang/lib/Headers/stdarg.h |  4 +-
 clang/lib/Headers/stddef.h | 21 -
 clang/test/Headers/stddefneeds.cpp | 15 --
 clang/test/Modules/stddef.cpp  | 73 ++
 4 files changed, 105 insertions(+), 8 deletions(-)
 create mode 100644 clang/test/Modules/stddef.cpp


[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100191


[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@AaronBallman What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100191


[llvm-branch-commits] [clang] release/19.x: [clang][headers] Including stddef.h always redefines NULL (#99727) (PR #100191)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang-modules

Author: None (llvmbot)


Changes

Backport 92a9d4831d5e40c286247c30fcd794563adbef6e

Requested by: @ian-twilightcoder

---
Full diff: https://github.com/llvm/llvm-project/pull/100191.diff


4 Files Affected:

- (modified) clang/lib/Headers/stdarg.h (+2-2) 
- (modified) clang/lib/Headers/stddef.h (+19-2) 
- (modified) clang/test/Headers/stddefneeds.cpp (+11-4) 
- (added) clang/test/Modules/stddef.cpp (+73) 


``diff
diff --git a/clang/lib/Headers/stdarg.h b/clang/lib/Headers/stdarg.h
index 8292ab907becf..6203d7a600a23 100644
--- a/clang/lib/Headers/stdarg.h
+++ b/clang/lib/Headers/stdarg.h
@@ -20,19 +20,18 @@
  * modules.
  */
 #if defined(__MVS__) && __has_include_next(<stdarg.h>)
-#include <__stdarg_header_macro.h>
 #undef __need___va_list
 #undef __need_va_list
 #undef __need_va_arg
 #undef __need___va_copy
 #undef __need_va_copy
+#include <__stdarg_header_macro.h>
 #include_next <stdarg.h>
 
 #else
 #if !defined(__need___va_list) && !defined(__need_va_list) && \
 !defined(__need_va_arg) && !defined(__need___va_copy) && \
 !defined(__need_va_copy)
-#include <__stdarg_header_macro.h>
 #define __need___va_list
 #define __need_va_list
 #define __need_va_arg
@@ -45,6 +44,7 @@
 !defined(__STRICT_ANSI__)
 #define __need_va_copy
 #endif
+#include <__stdarg_header_macro.h>
 #endif
 
 #ifdef __need___va_list
diff --git a/clang/lib/Headers/stddef.h b/clang/lib/Headers/stddef.h
index 8985c526e8fc5..99b275aebf5aa 100644
--- a/clang/lib/Headers/stddef.h
+++ b/clang/lib/Headers/stddef.h
@@ -20,7 +20,6 @@
  * modules.
  */
 #if defined(__MVS__) && __has_include_next(<stddef.h>)
-#include <__stddef_header_macro.h>
 #undef __need_ptrdiff_t
 #undef __need_size_t
 #undef __need_rsize_t
@@ -31,6 +30,7 @@
 #undef __need_max_align_t
 #undef __need_offsetof
 #undef __need_wint_t
+#include <__stddef_header_macro.h>
 #include_next <stddef.h>
 
 #else
@@ -40,7 +40,6 @@
 !defined(__need_NULL) && !defined(__need_nullptr_t) && \
 !defined(__need_unreachable) && !defined(__need_max_align_t) && \
 !defined(__need_offsetof) && !defined(__need_wint_t)
-#include <__stddef_header_macro.h>
 #define __need_ptrdiff_t
 #define __need_size_t
 /* ISO9899:2011 7.20 (C11 Annex K): Define rsize_t if __STDC_WANT_LIB_EXT1__ is
@@ -49,7 +48,24 @@
 #define __need_rsize_t
 #endif
 #define __need_wchar_t
+#if !defined(__STDDEF_H) || __has_feature(modules)
+/*
+ * __stddef_null.h is special when building without modules: if __need_NULL is
+ * set, then it will unconditionally redefine NULL. To avoid stepping on client
+ * definitions of NULL, __need_NULL should only be set the first time this
+ * header is included, that is when __STDDEF_H is not defined. However, when
+ * building with modules, this header is a textual header and needs to
+ * unconditionally include __stdef_null.h to support multiple submodules
+ * exporting _Builtin_stddef.null. Take module SM with submodules A and B, whose
+ * headers both include stddef.h When SM.A builds, __STDDEF_H will be defined.
+ * When SM.B builds, the definition from SM.A will leak when building without
+ * local submodule visibility. stddef.h wouldn't include __stddef_null.h, and
+ * SM.B wouldn't import _Builtin_stddef.null, and SM.B's `export *` wouldn't
+ * export NULL as expected. When building with modules, always include
+ * __stddef_null.h so that everything works as expected.
+ */
 #define __need_NULL
+#endif
 #if (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L) || \
 defined(__cplusplus)
 #define __need_nullptr_t
@@ -65,6 +81,7 @@
 /* wint_t is provided by <wchar.h> and not <stddef.h>. It's here
  * for compatibility, but must be explicitly requested. Therefore
  * __need_wint_t is intentionally not defined here. */
+#include <__stddef_header_macro.h>
 #endif
 
 #if defined(__need_ptrdiff_t)
diff --git a/clang/test/Headers/stddefneeds.cpp b/clang/test/Headers/stddefneeds.cpp
index 0763bbdee13ae..0282e8afa600d 100644
--- a/clang/test/Headers/stddefneeds.cpp
+++ b/clang/test/Headers/stddefneeds.cpp
@@ -56,14 +56,21 @@ max_align_t m5;
 #undef NULL
 #define NULL 0
 
-// glibc (and other) headers then define __need_NULL and rely on stddef.h
-// to redefine NULL to the correct value again.
-#define __need_NULL
+// Including stddef.h again shouldn't redefine NULL
 #include <stddef.h>
 
 // gtk headers then use __attribute__((sentinel)), which doesn't work if NULL
 // is 0.
-void f(const char* c, ...) __attribute__((sentinel));
+void f(const char* c, ...) __attribute__((sentinel)); // expected-note{{function has been explicitly marked sentinel here}}
 void g() {
+  f("", NULL); // expected-warning{{missing sentinel in function call}}
+}
+
+// glibc (and other) headers then define __need_NULL and rely on stddef.h
+// to redefine NULL to the correct value again.
+#define __need_NULL
+#include <stddef.h>
+
+void h() {
   f("", NULL);  // Shouldn't warn.
 }
diff --git a/

[llvm-branch-commits] [libc] 4c07e7f - Revert "[libc][RISCV] Add naked attribute to setjmp/longjmp (#100036)"

2024-07-23 Thread via llvm-branch-commits

Author: Paul Kirth
Date: 2024-07-23T13:15:47-07:00
New Revision: 4c07e7f659ab91c22c1b0440080902d0b931195d

URL: https://github.com/llvm/llvm-project/commit/4c07e7f659ab91c22c1b0440080902d0b931195d
DIFF: https://github.com/llvm/llvm-project/commit/4c07e7f659ab91c22c1b0440080902d0b931195d.diff

LOG: Revert "[libc][RISCV] Add naked attribute to setjmp/longjmp (#100036)"

This reverts commit 05b586be3d70cd51c809c52a67d36517fb4b8f6f.

Added: 


Modified: 
libc/src/setjmp/riscv/longjmp.cpp
libc/src/setjmp/riscv/setjmp.cpp

Removed: 




diff --git a/libc/src/setjmp/riscv/longjmp.cpp b/libc/src/setjmp/riscv/longjmp.cpp
index b14f636659ac3..0f9537ccc4151 100644
--- a/libc/src/setjmp/riscv/longjmp.cpp
+++ b/libc/src/setjmp/riscv/longjmp.cpp
@@ -30,7 +30,6 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-[[gnu::naked]]
 LLVM_LIBC_FUNCTION(void, longjmp, (__jmp_buf * buf, int val)) {
   LOAD(ra, buf->__pc);
   LOAD(s0, buf->__regs[0]);

diff --git a/libc/src/setjmp/riscv/setjmp.cpp b/libc/src/setjmp/riscv/setjmp.cpp
index 92982cc9d74d4..12def578b56f3 100644
--- a/libc/src/setjmp/riscv/setjmp.cpp
+++ b/libc/src/setjmp/riscv/setjmp.cpp
@@ -29,7 +29,6 @@
 
 namespace LIBC_NAMESPACE_DECL {
 
-[[gnu::naked]]
 LLVM_LIBC_FUNCTION(int, setjmp, (__jmp_buf * buf)) {
   STORE(ra, buf->__pc);
   STORE(s0, buf->__regs[0]);





[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/100195

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

>From 772a44ca77676be636cd7027c8703e8467bc38ad Mon Sep 17 00:00:00 2001
From: Wesley Wiser 
Date: Tue, 23 Jul 2024 11:43:30 -0500
Subject: [PATCH] [LLVM] [MC] Update frame layout & CFI generation to handle
 frames larger than 2gb (#99263)

Rebase of #84114. I've only included the core changes to frame layout
calculation & CFI generation which sidesteps the regressions found after
merging #84114. Since these changes are a necessary precursor to the
overall fix and are themselves slightly beneficial as CFI is now
generated correctly, I think it is reasonable to merge this first step.

---

For very large stack frames, the offset from the stack pointer to a
local can be more than 2^31 which overflows various `int` offsets in the
frame lowering code.

This patch updates the frame lowering code to calculate the offsets as
64-bit values and fixes CFI to use the corrected sizes.

After this patch, additional work is needed to fix offset truncations in
each target's codegen.
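
To make the overflow concrete, here is a small sketch of why a 32-bit `int` offset is not enough for such frames. It is a simplified model under stated assumptions, not LLVM code: `fitsInInt32` and `offsetFromSP` are hypothetical helpers, and the frame layout is reduced to a single subtraction.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not an LLVM API): does a 64-bit frame offset fit in
// the 32-bit signed `int` that the old code stored it in?
constexpr bool fitsInInt32(int64_t Offset) {
  return Offset >= std::numeric_limits<int32_t>::min() &&
         Offset <= std::numeric_limits<int32_t>::max();
}

// Hypothetical model of an SP-relative offset to a local that sits
// `BytesBelowTop` bytes below the top of a frame of `FrameSize` bytes.
// The result is computed and returned as a 64-bit value.
inline int64_t offsetFromSP(uint64_t FrameSize, uint64_t BytesBelowTop) {
  assert(BytesBelowTop <= FrameSize && "local must lie inside the frame");
  return static_cast<int64_t>(FrameSize - BytesBelowTop);
}
```

For a 4 KiB frame the offset fits comfortably in an `int`. For a 4 GiB frame, `offsetFromSP(UINT64_C(1) << 32, 16)` is 4294967280, which `fitsInInt32` rejects, so any `int` field holding it would be truncated. That is the case the `int` to `int64_t` changes in the patch are meant to cover.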

(cherry picked from commit ca076f7a63f6a80e2e38315ec462be354b196b8d)
---
 llvm/include/llvm/CodeGen/MachineFrameInfo.h  | 14 +++---
 .../llvm/CodeGen/TargetFrameLowering.h|  4 +-
 llvm/include/llvm/MC/MCAsmBackend.h   |  2 +-
 llvm/include/llvm/MC/MCDwarf.h| 44 +--
 llvm/lib/CodeGen/CFIInstrInserter.cpp | 10 ++---
 llvm/lib/CodeGen/MachineFrameInfo.cpp |  2 +-
 llvm/lib/CodeGen/PrologEpilogInserter.cpp |  4 +-
 llvm/lib/MC/MCDwarf.cpp   |  6 +--
 .../MCTargetDesc/AArch64AsmBackend.cpp|  8 ++--
 llvm/lib/Target/ARM/ARMFrameLowering.cpp  |  4 +-
 .../Target/ARM/MCTargetDesc/ARMAsmBackend.cpp |  2 +-
 .../ARM/MCTargetDesc/ARMAsmBackendDarwin.h|  2 +-
 .../Target/Hexagon/HexagonFrameLowering.cpp   |  4 +-
 .../lib/Target/MSP430/MSP430FrameLowering.cpp |  2 +-
 .../Target/X86/MCTargetDesc/X86AsmBackend.cpp | 12 ++---
 llvm/lib/Target/X86/X86FrameLowering.cpp  |  4 +-
 llvm/test/CodeGen/PowerPC/huge-frame-size.ll  |  2 +-
 llvm/test/CodeGen/RISCV/pr88365.ll|  2 +-
 llvm/test/CodeGen/X86/huge-stack.ll   |  2 +-
 19 files changed, 65 insertions(+), 65 deletions(-)


[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/100195


[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@wesleywiser What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100195


[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-msp430
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-debuginfo

Author: None (llvmbot)


Changes

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

---

Patch is 27.11 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/100195.diff


19 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7-7) 
- (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+2-2) 
- (modified) llvm/include/llvm/MC/MCAsmBackend.h (+1-1) 
- (modified) llvm/include/llvm/MC/MCDwarf.h (+22-22) 
- (modified) llvm/lib/CodeGen/CFIInstrInserter.cpp (+5-5) 
- (modified) llvm/lib/CodeGen/MachineFrameInfo.cpp (+1-1) 
- (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+2-2) 
- (modified) llvm/lib/MC/MCDwarf.cpp (+3-3) 
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp (+4-4) 
- (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+2-2) 
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+1-1) 
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h (+1-1) 
- (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+2-2) 
- (modified) llvm/lib/Target/MSP430/MSP430FrameLowering.cpp (+1-1) 
- (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+6-6) 
- (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+2-2) 
- (modified) llvm/test/CodeGen/PowerPC/huge-frame-size.ll (+1-1) 
- (modified) llvm/test/CodeGen/RISCV/pr88365.ll (+1-1) 
- (modified) llvm/test/CodeGen/X86/huge-stack.ll (+1-1) 


``diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 466fed7fb3a29..213b7ec6b3fbf 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
 
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class).  This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  uint64_t MaxCallFrameSize = ~UINT64_C(0);
 
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -593,10 +593,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
 
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
 
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
 
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -663,7 +663,7 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  uint64_t getMaxCallFrameSize() const {
 // TODO: Enable this assert when targets are fixed.
 //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
 if (!isMaxCallFrameSizeComputed())
@@ -671,9 +671,9 @@ class MachineFrameInfo {
 return MaxCallFrameSize;
   }
   bool isMaxCallFrameSizeComputed() const {
-return MaxCallFrameSize != ~0u;
+return MaxCallFrameSize != ~UINT64_C(0);
   }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
 
   /// Returns how many bytes of callee-saved registers the target pushed in the
   /// prologue. Only used for debug info.
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..72978b2f746d7 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -51,7 +51,7 @@ class TargetFrameLowering {
   // Maps a callee saved register to a stack slot with a fixed offset.
   struct SpillSlot {
 unsigned Reg;
-int Offset; // Offset relative to stack pointer on function entry.
+int64_t Offset; // Offset relative to stack pointer on function entry.
   };
 
   struct DwarfFrameBase {
@@ -66,7 +

[llvm-branch-commits] [llvm] release/19.x: [LLVM] [MC] Update frame layout & CFI generation to handle frames larger than 2gb (#99263) (PR #100195)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)


Changes

Backport ca076f7a63f6a80e2e38315ec462be354b196b8d

Requested by: @MaskRay

---

Patch is 27.11 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/100195.diff


19 Files Affected:

- (modified) llvm/include/llvm/CodeGen/MachineFrameInfo.h (+7-7) 
- (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+2-2) 
- (modified) llvm/include/llvm/MC/MCAsmBackend.h (+1-1) 
- (modified) llvm/include/llvm/MC/MCDwarf.h (+22-22) 
- (modified) llvm/lib/CodeGen/CFIInstrInserter.cpp (+5-5) 
- (modified) llvm/lib/CodeGen/MachineFrameInfo.cpp (+1-1) 
- (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+2-2) 
- (modified) llvm/lib/MC/MCDwarf.cpp (+3-3) 
- (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64AsmBackend.cpp (+4-4) 
- (modified) llvm/lib/Target/ARM/ARMFrameLowering.cpp (+2-2) 
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+1-1) 
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackendDarwin.h (+1-1) 
- (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+2-2) 
- (modified) llvm/lib/Target/MSP430/MSP430FrameLowering.cpp (+1-1) 
- (modified) llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp (+6-6) 
- (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+2-2) 
- (modified) llvm/test/CodeGen/PowerPC/huge-frame-size.ll (+1-1) 
- (modified) llvm/test/CodeGen/RISCV/pr88365.ll (+1-1) 
- (modified) llvm/test/CodeGen/X86/huge-stack.ll (+1-1) 


``diff
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 466fed7fb3a29..213b7ec6b3fbf 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -251,7 +251,7 @@ class MachineFrameInfo {
   /// targets, this value is only used when generating debug info (via
   /// TargetRegisterInfo::getFrameIndexReference); when generating code, the
   /// corresponding adjustments are performed directly.
-  int OffsetAdjustment = 0;
+  int64_t OffsetAdjustment = 0;
 
   /// The prolog/epilog code inserter may process objects that require greater
   /// alignment than the default alignment the target provides.
@@ -280,7 +280,7 @@ class MachineFrameInfo {
   /// setup/destroy pseudo instructions (as defined in the TargetFrameInfo
   /// class).  This information is important for frame pointer elimination.
   /// It is only valid during and after prolog/epilog code insertion.
-  unsigned MaxCallFrameSize = ~0u;
+  uint64_t MaxCallFrameSize = ~UINT64_C(0);
 
   /// The number of bytes of callee saved registers that the target wants to
   /// report for the current function in the CodeView S_FRAMEPROC record.
@@ -593,10 +593,10 @@ class MachineFrameInfo {
   uint64_t estimateStackSize(const MachineFunction &MF) const;
 
   /// Return the correction for frame offsets.
-  int getOffsetAdjustment() const { return OffsetAdjustment; }
+  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
 
   /// Set the correction for frame offsets.
-  void setOffsetAdjustment(int Adj) { OffsetAdjustment = Adj; }
+  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
 
   /// Return the alignment in bytes that this function must be aligned to,
   /// which is greater than the default stack alignment provided by the target.
@@ -663,7 +663,7 @@ class MachineFrameInfo {
   /// CallFrameSetup/Destroy pseudo instructions are used by the target, and
   /// then only during or after prolog/epilog code insertion.
   ///
-  unsigned getMaxCallFrameSize() const {
+  uint64_t getMaxCallFrameSize() const {
 // TODO: Enable this assert when targets are fixed.
 //assert(isMaxCallFrameSizeComputed() && "MaxCallFrameSize not computed yet");
 if (!isMaxCallFrameSizeComputed())
@@ -671,9 +671,9 @@ class MachineFrameInfo {
 return MaxCallFrameSize;
   }
   bool isMaxCallFrameSizeComputed() const {
-return MaxCallFrameSize != ~0u;
+return MaxCallFrameSize != ~UINT64_C(0);
   }
-  void setMaxCallFrameSize(unsigned S) { MaxCallFrameSize = S; }
+  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
 
   /// Returns how many bytes of callee-saved registers the target pushed in the
   /// prologue. Only used for debug info.
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0b9cacecc7cbe..72978b2f746d7 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -51,7 +51,7 @@ class TargetFrameLowering {
   // Maps a callee saved register to a stack slot with a fixed offset.
   struct SpillSlot {
 unsigned Reg;
-int Offset; // Offset relative to stack pointer on function entry.
+int64_t Offset; // Offset relative to stack pointer on function entry.
   };
 
   struct DwarfFrameBase {
@@ -66,7 +66,7 @@ class TargetFrameLowering {
   // Used with FrameBa
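
The sentinel change in the hunks above (`~0u` to `~UINT64_C(0)`) matters once the field is widened. The following is a minimal sketch of why, not the LLVM implementation: `FrameInfoSketch`, `NotComputed`, and `widenedOldSentinel` are hypothetical names, and returning 0 for a not-yet-computed size is an assumption, since that part of the hunk is not shown in the quoted diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the patched fields; this is NOT the real
// llvm::MachineFrameInfo class.
class FrameInfoSketch {
  // All-ones 64-bit value used to mean "not computed yet".
  static constexpr uint64_t NotComputed = ~UINT64_C(0);

  uint64_t MaxCallFrameSize = NotComputed;
  int64_t OffsetAdjustment = 0;

public:
  bool isMaxCallFrameSizeComputed() const {
    return MaxCallFrameSize != NotComputed;
  }
  void setMaxCallFrameSize(uint64_t S) { MaxCallFrameSize = S; }
  // Assumption: report 0 while the size has not been computed.
  uint64_t getMaxCallFrameSize() const {
    return isMaxCallFrameSizeComputed() ? MaxCallFrameSize : 0;
  }

  int64_t getOffsetAdjustment() const { return OffsetAdjustment; }
  void setOffsetAdjustment(int64_t Adj) { OffsetAdjustment = Adj; }
};

// The old 32-bit sentinel zero-extends when stored in a 64-bit field, so it
// becomes 0x00000000FFFFFFFF instead of all ones.
constexpr uint64_t widenedOldSentinel() { return ~0u; }
```

With a 64-bit field, keeping `~0u` as the sentinel would make a legitimate frame size of 0xFFFFFFFF look "not computed", while sizes of 4 GiB and above only fit because the field itself is now 64 bits wide.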

[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100207

Backport 63b382bbde5994e8f2cec75883320e3ad9fd618f

Requested by: @azhan92

>From 33bfe961c6c6bd8601a68d0d6b58cfce2310518c Mon Sep 17 00:00:00 2001
From: azhan92 
Date: Tue, 23 Jul 2024 09:51:13 -0400
Subject: [PATCH] [PowerPC] Add builtin_cpu_is P11 support (#99550)

This PR adds support for __builtin_cpu_is ("power11")

(cherry picked from commit 63b382bbde5994e8f2cec75883320e3ad9fd618f)
---
 clang/test/CodeGen/aix-builtin-cpu-is.c   |  4 ++
 clang/test/CodeGen/builtin-cpu-supports.c | 72 ---
 .../llvm/TargetParser/PPCTargetParser.def |  3 +
 3 files changed, 69 insertions(+), 10 deletions(-)

diff --git a/clang/test/CodeGen/aix-builtin-cpu-is.c b/clang/test/CodeGen/aix-builtin-cpu-is.c
index e17cf7353511a..04644dd7020e0 100644
--- a/clang/test/CodeGen/aix-builtin-cpu-is.c
+++ b/clang/test/CodeGen/aix-builtin-cpu-is.c
@@ -50,6 +50,10 @@
 // RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=262144 \
 // RUN:   --check-prefix=CHECKOP
 
+// RUN: echo "int main() { return __builtin_cpu_is(\"power11\");}" > %t.c
+// RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=524288 \
+// RUN:   --check-prefix=CHECKOP
+
 // CHECK: define i32 @main() #0 {
 // CHECK-NEXT: entry:
 // CHECK-NEXT:   %retval = alloca i32, align 4
diff --git a/clang/test/CodeGen/builtin-cpu-supports.c b/clang/test/CodeGen/builtin-cpu-supports.c
index 88eb7b0fa786e..f960040ab094b 100644
--- a/clang/test/CodeGen/builtin-cpu-supports.c
+++ b/clang/test/CodeGen/builtin-cpu-supports.c
@@ -129,25 +129,69 @@ int v4() { return __builtin_cpu_supports("x86-64-v4"); }
 // CHECK-PPC:   if.else3:
 // CHECK-PPC-NEXT:[[CPU_IS:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
 // CHECK-PPC-NEXT:[[TMP6:%.*]] = icmp eq i32 [[CPU_IS]], 39
-// CHECK-PPC-NEXT:br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC-NEXT:br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_ELSE5:%.*]]
 // CHECK-PPC:   if.then4:
 // CHECK-PPC-NEXT:[[TMP7:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT:[[TMP8:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT:[[ADD:%.*]] = add nsw i32 [[TMP7]], [[TMP8]]
 // CHECK-PPC-NEXT:store i32 [[ADD]], ptr [[RETVAL]], align 4
 // CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else5:
+// CHECK-PPC-NEXT:[[CPU_IS6:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP9:%.*]] = icmp eq i32 [[CPU_IS6]], 45
+// CHECK-PPC-NEXT:br i1 [[TMP9]], label [[IF_THEN7:%.*]], label [[IF_ELSE9:%.*]]
+// CHECK-PPC:   if.then7:
+// CHECK-PPC-NEXT:[[TMP10:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[ADD8:%.*]] = add nsw i32 [[TMP10]], 3
+// CHECK-PPC-NEXT:store i32 [[ADD8]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else9:
+// CHECK-PPC-NEXT:[[CPU_IS10:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP11:%.*]] = icmp eq i32 [[CPU_IS10]], 46
+// CHECK-PPC-NEXT:br i1 [[TMP11]], label [[IF_THEN11:%.*]], label [[IF_ELSE13:%.*]]
+// CHECK-PPC:   if.then11:
+// CHECK-PPC-NEXT:[[TMP12:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[SUB12:%.*]] = sub nsw i32 [[TMP12]], 3
+// CHECK-PPC-NEXT:store i32 [[SUB12]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else13:
+// CHECK-PPC-NEXT:[[CPU_IS14:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP13:%.*]] = icmp eq i32 [[CPU_IS14]], 47
+// CHECK-PPC-NEXT:br i1 [[TMP13]], label [[IF_THEN15:%.*]], label [[IF_ELSE17:%.*]]
+// CHECK-PPC:   if.then15:
+// CHECK-PPC-NEXT:[[TMP14:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[ADD16:%.*]] = add nsw i32 [[TMP14]], 7
+// CHECK-PPC-NEXT:store i32 [[ADD16]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else17:
+// CHECK-PPC-NEXT:[[CPU_IS18:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP15:%.*]] = icmp eq i32 [[CPU_IS18]], 48
+// CHECK-PPC-NEXT:br i1 [[TMP15]], label [[IF_THEN19:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC:   if.then19:
+// CHECK-PPC-NEXT:[[TMP16:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[SUB20:%.*]] = sub nsw i32 [[TMP16]], 7
+// CHECK-PPC-NEXT:store i32 [[SUB20]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
 // CHECK-PPC:   if.end:
-// CHECK-PPC-NEXT:br label [[IF_END5:%.*]]
-// CHECK-PPC:   if.end5:
-// CHECK-PPC-NEXT:br label [[IF_END6:%.*]]
-// CHECK-PPC:   if.end6:
-// CHECK-PPC-NEXT:[[TMP9:%.*]] = load i32, ptr [[A_ADDR]], align 4
-// CHECK-PPC-NEXT:[[ADD7:%.*]] = add nsw i32 [[TMP9]], 5
-// CHECK-PPC-NEXT:store i32 
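
For readers unfamiliar with the builtin exercised by these tests, the following is a rough sketch of how `__builtin_cpu_is("power11")` might be used from source code. The function name `powerGeneration` and the returned strings are hypothetical, and the PowerPC branch assumes a compiler and runtime that already recognize the `"power11"` name (older toolchains reject it at compile time). On other targets the sketch falls back to a fixed string.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: names the POWER generation the program is running on.
// Only the PowerPC branch uses the builtin; elsewhere it is compiled out.
const char *powerGeneration() {
#if defined(__powerpc__) || defined(__powerpc64__) || defined(_ARCH_PPC)
  // Assumption: the toolchain accepts "power11" (the support added by this PR).
  if (__builtin_cpu_is("power11"))
    return "power11";
  if (__builtin_cpu_is("power10"))
    return "power10";
  return "older-or-unknown-power";
#else
  return "not-powerpc";
#endif
}
```

A caller would typically branch on the result once at startup to select a tuned code path.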

[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100207


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@daltenty What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100207


[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 63b382bbde5994e8f2cec75883320e3ad9fd618f

Requested by: @azhan92

---
Full diff: https://github.com/llvm/llvm-project/pull/100207.diff


3 Files Affected:

- (modified) clang/test/CodeGen/aix-builtin-cpu-is.c (+4) 
- (modified) clang/test/CodeGen/builtin-cpu-supports.c (+62-10) 
- (modified) llvm/include/llvm/TargetParser/PPCTargetParser.def (+3) 


``diff
diff --git a/clang/test/CodeGen/aix-builtin-cpu-is.c b/clang/test/CodeGen/aix-builtin-cpu-is.c
index e17cf7353511a..04644dd7020e0 100644
--- a/clang/test/CodeGen/aix-builtin-cpu-is.c
+++ b/clang/test/CodeGen/aix-builtin-cpu-is.c
@@ -50,6 +50,10 @@
 // RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=262144 \
 // RUN:   --check-prefix=CHECKOP
 
+// RUN: echo "int main() { return __builtin_cpu_is(\"power11\");}" > %t.c
+// RUN: %clang_cc1 -triple powerpc-ibm-aix7.2.0.0 -emit-llvm -o - %t.c | FileCheck %s -DVALUE=524288 \
+// RUN:   --check-prefix=CHECKOP
+
 // CHECK: define i32 @main() #0 {
 // CHECK-NEXT: entry:
 // CHECK-NEXT:   %retval = alloca i32, align 4
diff --git a/clang/test/CodeGen/builtin-cpu-supports.c b/clang/test/CodeGen/builtin-cpu-supports.c
index 88eb7b0fa786e..f960040ab094b 100644
--- a/clang/test/CodeGen/builtin-cpu-supports.c
+++ b/clang/test/CodeGen/builtin-cpu-supports.c
@@ -129,25 +129,69 @@ int v4() { return __builtin_cpu_supports("x86-64-v4"); }
 // CHECK-PPC:   if.else3:
 // CHECK-PPC-NEXT:[[CPU_IS:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
 // CHECK-PPC-NEXT:[[TMP6:%.*]] = icmp eq i32 [[CPU_IS]], 39
-// CHECK-PPC-NEXT:br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC-NEXT:br i1 [[TMP6]], label [[IF_THEN4:%.*]], label [[IF_ELSE5:%.*]]
 // CHECK-PPC:   if.then4:
 // CHECK-PPC-NEXT:[[TMP7:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT:[[TMP8:%.*]] = load i32, ptr [[A_ADDR]], align 4
 // CHECK-PPC-NEXT:[[ADD:%.*]] = add nsw i32 [[TMP7]], [[TMP8]]
 // CHECK-PPC-NEXT:store i32 [[ADD]], ptr [[RETVAL]], align 4
 // CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else5:
+// CHECK-PPC-NEXT:[[CPU_IS6:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP9:%.*]] = icmp eq i32 [[CPU_IS6]], 45
+// CHECK-PPC-NEXT:br i1 [[TMP9]], label [[IF_THEN7:%.*]], label [[IF_ELSE9:%.*]]
+// CHECK-PPC:   if.then7:
+// CHECK-PPC-NEXT:[[TMP10:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[ADD8:%.*]] = add nsw i32 [[TMP10]], 3
+// CHECK-PPC-NEXT:store i32 [[ADD8]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else9:
+// CHECK-PPC-NEXT:[[CPU_IS10:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP11:%.*]] = icmp eq i32 [[CPU_IS10]], 46
+// CHECK-PPC-NEXT:br i1 [[TMP11]], label [[IF_THEN11:%.*]], label [[IF_ELSE13:%.*]]
+// CHECK-PPC:   if.then11:
+// CHECK-PPC-NEXT:[[TMP12:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[SUB12:%.*]] = sub nsw i32 [[TMP12]], 3
+// CHECK-PPC-NEXT:store i32 [[SUB12]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else13:
+// CHECK-PPC-NEXT:[[CPU_IS14:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP13:%.*]] = icmp eq i32 [[CPU_IS14]], 47
+// CHECK-PPC-NEXT:br i1 [[TMP13]], label [[IF_THEN15:%.*]], label [[IF_ELSE17:%.*]]
+// CHECK-PPC:   if.then15:
+// CHECK-PPC-NEXT:[[TMP14:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[ADD16:%.*]] = add nsw i32 [[TMP14]], 7
+// CHECK-PPC-NEXT:store i32 [[ADD16]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
+// CHECK-PPC:   if.else17:
+// CHECK-PPC-NEXT:[[CPU_IS18:%.*]] = call i32 @llvm.ppc.fixed.addr.ld(i32 3)
+// CHECK-PPC-NEXT:[[TMP15:%.*]] = icmp eq i32 [[CPU_IS18]], 48
+// CHECK-PPC-NEXT:br i1 [[TMP15]], label [[IF_THEN19:%.*]], label [[IF_END:%.*]]
+// CHECK-PPC:   if.then19:
+// CHECK-PPC-NEXT:[[TMP16:%.*]] = load i32, ptr [[A_ADDR]], align 4
+// CHECK-PPC-NEXT:[[SUB20:%.*]] = sub nsw i32 [[TMP16]], 7
+// CHECK-PPC-NEXT:store i32 [[SUB20]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[RETURN]]
 // CHECK-PPC:   if.end:
-// CHECK-PPC-NEXT:br label [[IF_END5:%.*]]
-// CHECK-PPC:   if.end5:
-// CHECK-PPC-NEXT:br label [[IF_END6:%.*]]
-// CHECK-PPC:   if.end6:
-// CHECK-PPC-NEXT:[[TMP9:%.*]] = load i32, ptr [[A_ADDR]], align 4
-// CHECK-PPC-NEXT:[[ADD7:%.*]] = add nsw i32 [[TMP9]], 5
-// CHECK-PPC-NEXT:store i32 [[ADD7]], ptr [[RETVAL]], align 4
+// CHECK-PPC-NEXT:br label [[IF_END21:%.*]]
+// CHECK-PPC:   if.end21:
+// CHECK-PPC-NEXT:br label [[IF_END22:%.*]]
+// CHECK-PPC:   if.end22:
+// CHECK-PPC-NEXT:br label [[IF_END23:%.*]]
+// CHECK-PPC:  

[llvm-branch-commits] [clang] [llvm] release/19.x: [PowerPC] Add builtin_cpu_is P11 support (#99550) (PR #100207)

2024-07-23 Thread via llvm-branch-commits

https://github.com/azhan92 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/100207


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-23 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated https://github.com/llvm/llvm-project/pull/99891

>From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 22 Jul 2024 11:56:23 -0700
Subject: [PATCH 1/4] Changed assignment of profiles with pseudo probe index

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++
 .../X86/match-blocks-with-pseudo-probes.test  | 25 ++
 2 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp b/bolt/lib/Profile/StaleProfileMatching.cpp
index 4105f626fb5b6..c135ee5ff4837 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -195,11 +195,15 @@ class StaleMatcher {
   void init(const std::vector &Blocks,
 const std::vector &Hashes,
 const std::vector &CallHashes,
-std::optional YamlBFGUID) {
+const std::unordered_map>
+IndexToBinaryPseudoProbes,
+const std::unordered_map
+BinaryPseudoProbeToBlock,
+const uint64_t YamlBFGUID) {
 assert(Blocks.size() == Hashes.size() &&
Hashes.size() == CallHashes.size() &&
"incorrect matcher initialization");
-
 for (size_t I = 0; I < Blocks.size(); I++) {
   FlowBlock *Block = Blocks[I];
   uint16_t OpHash = Hashes[I].OpcodeHash;
@@ -209,6 +213,8 @@ class StaleMatcher {
 std::make_pair(Hashes[I], Block));
   this->Blocks.push_back(Block);
 }
+this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes;
+this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock;
 this->YamlBFGUID = YamlBFGUID;
   }
 
@@ -234,10 +240,14 @@ class StaleMatcher {
   using HashBlockPairType = std::pair;
   std::unordered_map> OpHashToBlocks;
   std::unordered_map> CallHashToBlocks;
-  std::vector Blocks;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  std::vector Blocks;
   // If the pseudo probe checksums of the profiled and binary functions are
   // equal, then the YamlBF's GUID is defined and used to match blocks.
-  std::optional YamlBFGUID;
+  uint64_t YamlBFGUID;
 
   // Uses OpcodeHash to find the most similar block for a given hash.
   const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const {
@@ -284,7 +294,7 @@ class StaleMatcher {
 // Searches for the pseudo probe attached to the matched function's block,
 // ignoring pseudo probes attached to function calls and inlined functions'
 // blocks.
-outs() << "match with pseudo probes\n";
+std::vector BlockPseudoProbes;
 for (const auto &PseudoProbe : PseudoProbes) {
   // Ensures that pseudo probe information belongs to the appropriate
   // function and not an inlined function.
@@ -293,11 +303,30 @@ class StaleMatcher {
   // Skips pseudo probes attached to function calls.
   if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
 continue;
-  assert(PseudoProbe.Index < Blocks.size() &&
- "pseudo probe index out of range");
-  return Blocks[PseudoProbe.Index];
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
 }
-return nullptr;
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index < Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+   "All blocks should have a pseudo probe");
+if (It->second.size() > 1)
+  return nullptr;
+
+const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0];
+auto BinaryPseudoProbeIt = BinaryPseudoProbeToBlock.find(BinaryPseudoProbe);
+assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() &&
+   "All binary pseudo probes should belong a binary basic block");
+
+return BinaryPseudoProbeIt->second;
   }
 };
 
@@ -491,6 +520,11 @@ size_t matchWeightsByHashes(
   std::vector CallHashes;
   std::vector Blocks;
   std::vector BlendedHashes;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder();
   for (uint64_t I = 0; I < BlockOrder.size(); I++) {
 const BinaryBasicBlock *BB = BlockOrder[I];
 assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock");
@@ -510,9 +544,27 @@ size_t matchWeightsByHashes(
 Blocks.push_back(&Func.Blocks[I + 1]);
 BlendedBlockHash BlendedHash(BB->getHash());
 BlendedHashes.push_back(BlendedHash);
+if (PseudoProbeDecoder) {
+  const AddressProbesMap &ProbeMap =
+  PseudoProbeDecoder->getAd
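
The 1:1 matching rule added to `matchWithPseudoProbes` above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the BOLT code: `Probe` and `Block` are hypothetical stand-ins for `MCDecodedPseudoProbe` and `FlowBlock`, `matchByProbeIndex` is an invented name, and missing map entries return `nullptr` here where the patch uses assertions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Probe { uint64_t Index; }; // hypothetical stand-in for MCDecodedPseudoProbe
struct Block { int Id; };         // hypothetical stand-in for FlowBlock

using IndexToProbes = std::unordered_map<uint64_t, std::vector<const Probe *>>;
using ProbeToBlock = std::unordered_map<const Probe *, const Block *>;

// Returns the binary block for a profile block only when the mapping is
// unambiguous: exactly one block probe in the profile, and exactly one binary
// probe with that index.
const Block *matchByProbeIndex(const std::vector<uint64_t> &ProfileProbeIndices,
                               const IndexToProbes &IndexToBinaryProbes,
                               const ProbeToBlock &BinaryProbeToBlock) {
  if (ProfileProbeIndices.size() != 1)
    return nullptr;

  auto It = IndexToBinaryProbes.find(ProfileProbeIndices[0]);
  // Sketch-only choice: treat a missing or empty entry as "no match".
  if (It == IndexToBinaryProbes.end() || It->second.size() != 1)
    return nullptr;

  auto BlockIt = BinaryProbeToBlock.find(It->second[0]);
  return BlockIt == BinaryProbeToBlock.end() ? nullptr : BlockIt->second;
}
```

Checking `size() != 1` rather than `size() > 1` also rejects an empty probe list before element 0 is read, which the quoted hunk leaves to its assertions.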

[llvm-branch-commits] [BOLT] Support more than two jump table parents (PR #99988)

2024-07-23 Thread Davide Italiano via llvm-branch-commits

https://github.com/dcci approved this pull request.


https://github.com/llvm/llvm-project/pull/99988


[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100215

Backport 8be1325cb1903797ba3dce67087e395f9e080576

Requested by: @asl

>From c8d9662b0542cc99a88acc35762dca7f0d09a22b Mon Sep 17 00:00:00 2001
From: Oliver Hunt 
Date: Tue, 23 Jul 2024 14:18:53 -0700
Subject: [PATCH] [clang][test] Add function type discrimination tests to
 static destructor tests (#99604)

I accidentally omitted tests for setting up the runtime calls when compiling with -fptrauth-function-pointer-type-discrimination.

(cherry picked from commit 8be1325cb1903797ba3dce67087e395f9e080576)
---
 .../CodeGenCXX/ptrauth-static-destructors.cpp | 37 ---
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp
index 1240f26d329da..634450bf62ea9 100644
--- a/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp
+++ b/clang/test/CodeGenCXX/ptrauth-static-destructors.cpp
@@ -2,13 +2,27 @@
 // RUN:  | FileCheck %s --check-prefix=CXAATEXIT
 
 // RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \
-// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,DARWIN
+// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_DARWIN
 
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \
 // RUN:  | FileCheck %s --check-prefix=CXAATEXIT
 
 // RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \
-// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ELF
+// RUN:-fno-use-cxa-atexit | FileCheck %s --check-prefixes=ATEXIT,ATEXIT_ELF
+
+// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s \
+// RUN:  -fptrauth-function-pointer-type-discrimination  -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC
+
+// RUN: %clang_cc1 -triple arm64-apple-ios -fptrauth-calls -emit-llvm -std=c++11 %s -o - \
+// RUN:   -fptrauth-function-pointer-type-discrimination  -fno-use-cxa-atexit \
+// RUN:  | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_DARWIN
+
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s \
+// RUN:  -fptrauth-function-pointer-type-discrimination  -o - | FileCheck %s --check-prefix=CXAATEXIT_DISC
+
+// RUN: %clang_cc1 -triple aarch64-linux-gnu -fptrauth-calls -emit-llvm -std=c++11 %s -o - \
+// RUN:   -fptrauth-function-pointer-type-discrimination -fno-use-cxa-atexit \
+// RUN:  | FileCheck %s --check-prefixes=ATEXIT_DISC,ATEXIT_DISC_ELF
 
 class Foo {
  public:
@@ -21,11 +35,22 @@ Foo global;
 // CXAATEXIT: define internal void @__cxx_global_var_init()
 // CXAATEXIT:   call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0), ptr @global, ptr @__dso_handle)
 
+// CXAATEXIT_DISC: define internal void @__cxx_global_var_init()
+// CXAATEXIT_DISC:   call i32 @__cxa_atexit(ptr ptrauth (ptr @_ZN3FooD1Ev, i32 0, i64 10942), ptr @global, ptr @__dso_handle)
 
 // ATEXIT: define internal void @__cxx_global_var_init()
 // ATEXIT:   %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0))
 
-// DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" {
-// ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" {
-// DARWIN:   %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global)
-// ELF:  call void @_ZN3FooD1Ev(ptr @global)
+// ATEXIT_DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" {
+// ATEXIT_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" {
+// ATEXIT_DARWIN:   %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global)
+// ATEXIT_ELF:  call void @_ZN3FooD1Ev(ptr @global)
+
+// ATEXIT_DISC: define internal void @__cxx_global_var_init()
+// ATEXIT_DISC:   %{{.*}} = call i32 @atexit(ptr ptrauth (ptr @__dtor_global, i32 0, i64 10942))
+
+
+// ATEXIT_DISC_DARWIN: define internal void @__dtor_global() {{.*}} section "__TEXT,__StaticInit,regular,pure_instructions" {
+// ATEXIT_DISC_ELF:define internal void @__dtor_global() {{.*}} section ".text.startup" {
+// ATEXIT_DISC_DARWIN:   %{{.*}} = call ptr @_ZN3FooD1Ev(ptr @global)
+// ATEXIT_DISC_ELF:  call void @_ZN3FooD1Ev(ptr @global)



[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100215


[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@kovdan01 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100215


[llvm-branch-commits] [clang] release/19.x: [clang][test] Add function type discrimination tests to static destructor tests (#99604) (PR #100215)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 8be1325cb1903797ba3dce67087e395f9e080576

Requested by: @asl

---
Full diff: https://github.com/llvm/llvm-project/pull/100215.diff


1 Files Affected:

- (modified) clang/test/CodeGenCXX/ptrauth-static-destructors.cpp (+31-6) 






https://github.com/llvm/llvm-project/pull/100215


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/100216

Backport 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43

Requested by: @jhuber6

>From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 23 Jul 2024 14:41:57 -0500
Subject: [PATCH] [Clang] Correctly forward `--cuda-path` to the nvlink wrapper
 (#100170)

Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168

(cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43)
---
 clang/lib/Driver/ToolChains/Cuda.cpp   |  4 
 clang/test/Driver/linker-wrapper-passes.c  | 10 +++---
 clang/test/Driver/nvlink-wrapper.c |  7 +++
 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td |  4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 59453c484ae4f..61d12b10dfb62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
 CmdArgs.push_back(Args.MakeArgString(
 "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
 
+  if (Args.hasArg(options::OPT_cuda_path_EQ))
+CmdArgs.push_back(Args.MakeArgString(
+"--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
index aadcf472e9b63..8c337ff906d17 100644
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ b/clang/test/Driver/linker-wrapper-passes.c
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
 
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-
 // Setup.
 // RUN: mkdir -p %t
 // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
@@ -23,14 +19,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
 
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-p="function(goodbye),module(inline)" \
@@ -43,7 +39,7 @@
 // RUN: -check-prefixes=YML %s
 
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \
 // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
 
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index fdda93f1f9cdc..318315ddaca34 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -63,3 +63,10 @@ int baz() { return y + x; }
 // RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
 // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
 // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
+
+//
+// Check that we don't forward some arguments.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
+// PATH-NOT: --cuda-path=/opt/cuda
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index e84b530f2787d..8c80a51b12a44 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">;
 def version : Flag<["--"], "version">,
   HelpText<"Display the version number and exit">;
 
-def cuda_path_EQ : Joined<["--"], "cuda-path=">,
+def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the system CUDA path">;
-def ptxas_path_EQ : Joined<["--"], "ptxas-path=">,
+def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the 'ptxas' path">;
 
 def o : JoinedOrSeparate<["-"],

[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:

@Artem-B What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang-driver

Author: None (llvmbot)


Changes

Backport 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43

Requested by: @jhuber6

---
Full diff: https://github.com/llvm/llvm-project/pull/100216.diff


4 Files Affected:

- (modified) clang/lib/Driver/ToolChains/Cuda.cpp (+4) 
- (modified) clang/test/Driver/linker-wrapper-passes.c (+3-7) 
- (modified) clang/test/Driver/nvlink-wrapper.c (+7) 
- (modified) clang/tools/clang-nvlink-wrapper/NVLinkOpts.td (+2-2) 


```diff
diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 59453c484ae4f..61d12b10dfb62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back(Args.MakeArgString(
         "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
 
+  if (Args.hasArg(options::OPT_cuda_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString(
+        "--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
index aadcf472e9b63..8c337ff906d17 100644
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ b/clang/test/Driver/linker-wrapper-passes.c
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
 
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-
 // Setup.
 // RUN: mkdir -p %t
 // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
@@ -23,14 +19,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
 
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-p="function(goodbye),module(inline)" \
@@ -43,7 +39,7 @@
 // RUN: -check-prefixes=YML %s
 
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \
 // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
 
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index fdda93f1f9cdc..318315ddaca34 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -63,3 +63,10 @@ int baz() { return y + x; }
 // RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
 // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
 // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
+
+//
+// Check that we don't forward some arguments.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
+// PATH-NOT: --cuda-path=/opt/cuda
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index e84b530f2787d..8c80a51b12a44 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">;
 def version : Flag<["--"], "version">,
   HelpText<"Display the version number and exit">;
 
-def cuda_path_EQ : Joined<["--"], "cuda-path=">,
+def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the system CUDA path">;
-def ptxas_path_EQ : Joined<["--"], "ptxas-path=">,
+def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the 'ptxas' path">;
 
 def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">,

```




https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits


@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
 

jhuber6 wrote:

```suggestion
// REQUIRES: llvm-plugins, llvm-examples
// REQUIRES: x86-registered-target
// REQUIRES: amdgpu-registered-target
```

https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 edited https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/100216

>From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 23 Jul 2024 14:41:57 -0500
Subject: [PATCH 1/2] [Clang] Correctly forward `--cuda-path` to the nvlink
 wrapper (#100170)

Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168

(cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43)
---
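
Illustration only, not part of the patch: the sketch below shows the idea behind marking `--cuda-path=` and `--ptxas-path=` with `Flags<[WrapperOnlyOption]>`, namely that the wrapper consumes such options itself and leaves them off the `nvlink` command line. `OptionSpec` and `filterForwardedArgs` are hypothetical names, and the prefix matching is an assumption made for brevity; the real wrapper relies on LLVM's option-parsing library rather than a loop like this.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one option-table entry (not the LLVM Option API).
struct OptionSpec {
  std::string Prefix; // e.g. "--cuda-path="
  bool WrapperOnly;   // plays the role of Flags<[WrapperOnlyOption]>
};

// Returns the arguments that would be handed on to the underlying linker,
// dropping every argument that starts with a wrapper-only prefix.
std::vector<std::string>
filterForwardedArgs(const std::vector<std::string> &Args,
                    const std::vector<OptionSpec> &Table) {
  std::vector<std::string> Forwarded;
  for (const std::string &Arg : Args) {
    bool Drop = false;
    for (const OptionSpec &Spec : Table) {
      // compare() returns 0 only when Arg begins with Spec.Prefix.
      if (Spec.WrapperOnly &&
          Arg.compare(0, Spec.Prefix.size(), Spec.Prefix) == 0)
        Drop = true;
    }
    if (!Drop)
      Forwarded.push_back(Arg);
  }
  return Forwarded;
}
```

With a table that marks `--cuda-path=` as wrapper-only, an input of `-arch sm_52 --cuda-path=/opt/cuda -o a.out` would come back without the `--cuda-path=/opt/cuda` entry, which is the behaviour the new `PATH-NOT` check is looking for.
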
 clang/lib/Driver/ToolChains/Cuda.cpp   |  4 
 clang/test/Driver/linker-wrapper-passes.c  | 10 +++---
 clang/test/Driver/nvlink-wrapper.c |  7 +++
 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td |  4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 59453c484ae4f..61d12b10dfb62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back(Args.MakeArgString(
         "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
 
+  if (Args.hasArg(options::OPT_cuda_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString(
+        "--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
index aadcf472e9b63..8c337ff906d17 100644
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ b/clang/test/Driver/linker-wrapper-passes.c
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
 
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-
 // Setup.
 // RUN: mkdir -p %t
 // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
@@ -23,14 +19,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
 
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-p="function(goodbye),module(inline)" \
@@ -43,7 +39,7 @@
 // RUN: -check-prefixes=YML %s
 
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \
 // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
 
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index fdda93f1f9cdc..318315ddaca34 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -63,3 +63,10 @@ int baz() { return y + x; }
 // RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
 // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
 // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
+
+//
+// Check that we don't forward some arguments.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
+// PATH-NOT: --cuda-path=/opt/cuda
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index e84b530f2787d..8c80a51b12a44 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">;
 def version : Flag<["--"], "version">,
   HelpText<"Display the version number and exit">;
 
-def cuda_path_EQ : Joined<["--"], "cuda-path=">,
+def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the system CUDA path">;
-def ptxas_path_EQ : Joined<["--"], "ptxas-path=">,
+def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the 'ptxas' path">;
 
 def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">,

>From e9ac0f0e5916236cb091179cfa7befd081b01355

[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits


@@ -23,14 +22,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
 
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \

jhuber6 wrote:

```suggestion
// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
```

https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits


@@ -23,14 +22,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \

jhuber6 wrote:

```suggestion
// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
```

https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits


@@ -43,7 +42,7 @@
 // RUN: -check-prefixes=YML %s
 
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \

jhuber6 wrote:

```suggestion
// RUN: not clang-linker-wrapper \
```

https://github.com/llvm/llvm-project/pull/100216


[llvm-branch-commits] [clang] release/19.x: [Clang] Correctly forward `--cuda-path` to the nvlink wrapper (#100170) (PR #100216)

2024-07-23 Thread Joseph Huber via llvm-branch-commits

https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/100216

>From d7f99606094fc1feb41b50de0b0eb6d07460 Mon Sep 17 00:00:00 2001
From: Joseph Huber 
Date: Tue, 23 Jul 2024 14:41:57 -0500
Subject: [PATCH 1/3] [Clang] Correctly forward `--cuda-path` to the nvlink
 wrapper (#100170)

Summary:
This was not forwarded properly as it would try to pass it to `nvlink`.

Fixes https://github.com/llvm/llvm-project/issues/100168

(cherry picked from commit 7e1fcf5dd657d465c3fc846f56c6f9d3a4560b43)
---
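
Illustration only, not part of the patch: a standalone mirror of the `Cuda.cpp` hunk below, in which the driver appends `--cuda-path=<value>` to the wrapper's arguments only when the user actually passed the option. `forwardCudaPath` is a hypothetical helper, and plain `std::string` values stand in for the `ArgList` and `ArgStringList` types that clang uses.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: CudaPath is null when the user did not pass
// --cuda-path=, which stands in for Args.hasArg(...) returning false.
void forwardCudaPath(const std::string *CudaPath,
                     std::vector<std::string> &CmdArgs) {
  if (CudaPath)
    CmdArgs.push_back("--cuda-path=" + *CudaPath);
}
```

Calling it with a null pointer leaves the argument list untouched, so no empty `--cuda-path=` ever reaches the wrapper.
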
 clang/lib/Driver/ToolChains/Cuda.cpp   |  4 
 clang/test/Driver/linker-wrapper-passes.c  | 10 +++---
 clang/test/Driver/nvlink-wrapper.c |  7 +++
 clang/tools/clang-nvlink-wrapper/NVLinkOpts.td |  4 ++--
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Cuda.cpp b/clang/lib/Driver/ToolChains/Cuda.cpp
index 59453c484ae4f..61d12b10dfb62 100644
--- a/clang/lib/Driver/ToolChains/Cuda.cpp
+++ b/clang/lib/Driver/ToolChains/Cuda.cpp
@@ -609,6 +609,10 @@ void NVPTX::Linker::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.push_back(Args.MakeArgString(
         "--pxtas-path=" + Args.getLastArgValue(options::OPT_ptxas_path_EQ)));
 
+  if (Args.hasArg(options::OPT_cuda_path_EQ))
+    CmdArgs.push_back(Args.MakeArgString(
+        "--cuda-path=" + Args.getLastArgValue(options::OPT_cuda_path_EQ)));
+
   // Add paths specified in LIBRARY_PATH environment variable as -L options.
   addDirectoryList(Args, CmdArgs, "-L", "LIBRARY_PATH");
 
diff --git a/clang/test/Driver/linker-wrapper-passes.c b/clang/test/Driver/linker-wrapper-passes.c
index aadcf472e9b63..8c337ff906d17 100644
--- a/clang/test/Driver/linker-wrapper-passes.c
+++ b/clang/test/Driver/linker-wrapper-passes.c
@@ -1,9 +1,5 @@
 // Check various clang-linker-wrapper pass options after -offload-opt.
 
-// REQUIRES: llvm-plugins, llvm-examples
-// REQUIRES: x86-registered-target
-// REQUIRES: amdgpu-registered-target
-
 // Setup.
 // RUN: mkdir -p %t
 // RUN: %clang -cc1 -emit-llvm-bc -o %t/host-x86_64-unknown-linux-gnu.bc \
@@ -23,14 +19,14 @@
 // RUN: %t/host-x86_64-unknown-linux-gnu.s
 
 // Check plugin, -passes, and no remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-passes="function(goodbye),module(inline)" 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=OUT %s
 
 // Check plugin, -p, and remarks.
-// RUN: clang-linker-wrapper -o a.out --embed-bitcode \
+// RUN: clang-linker-wrapper -o a.out --embed-bitcode --dry-run \
 // RUN: --linker-path=/usr/bin/true %t/host-x86_64-unknown-linux-gnu.o \
 // RUN: %offload-opt-loadbye --offload-opt=-wave-goodbye \
 // RUN: --offload-opt=-p="function(goodbye),module(inline)" \
@@ -43,7 +39,7 @@
 // RUN: -check-prefixes=YML %s
 
 // Check handling of bad plugin.
-// RUN: not clang-linker-wrapper \
+// RUN: not clang-linker-wrapper --dry-run \
 // RUN: --offload-opt=-load-pass-plugin=%t/nonexistent.so 2>&1 | \
 // RUN:   FileCheck -match-full-lines -check-prefixes=BAD-PLUGIN %s
 
diff --git a/clang/test/Driver/nvlink-wrapper.c b/clang/test/Driver/nvlink-wrapper.c
index fdda93f1f9cdc..318315ddaca34 100644
--- a/clang/test/Driver/nvlink-wrapper.c
+++ b/clang/test/Driver/nvlink-wrapper.c
@@ -63,3 +63,10 @@ int baz() { return y + x; }
 // RUN:   -arch sm_52 -o a.out 2>&1 | FileCheck %s --check-prefix=LTO
 // LTO: ptxas{{.*}} -m64 -c [[PTX:.+]].s -O3 -arch sm_52 -o [[CUBIN:.+]].cubin
 // LTO: nvlink{{.*}} -arch sm_52 -o a.out [[CUBIN]].cubin {{.*}}-u-{{.*}}.cubin {{.*}}-y-{{.*}}.cubin
+
+//
+// Check that we don't forward some arguments.
+//
+// RUN: clang-nvlink-wrapper --dry-run %t.o %t-u.o %t-y.a \
+// RUN:   -arch sm_52 --cuda-path/opt/cuda -o a.out 2>&1 | FileCheck %s --check-prefix=PATH
+// PATH-NOT: --cuda-path=/opt/cuda
diff --git a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
index e84b530f2787d..8c80a51b12a44 100644
--- a/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
+++ b/clang/tools/clang-nvlink-wrapper/NVLinkOpts.td
@@ -12,9 +12,9 @@ def verbose : Flag<["-"], "v">, HelpText<"Print verbose information">;
 def version : Flag<["--"], "version">,
   HelpText<"Display the version number and exit">;
 
-def cuda_path_EQ : Joined<["--"], "cuda-path=">,
+def cuda_path_EQ : Joined<["--"], "cuda-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the system CUDA path">;
-def ptxas_path_EQ : Joined<["--"], "ptxas-path=">,
+def ptxas_path_EQ : Joined<["--"], "ptxas-path=">, Flags<[WrapperOnlyOption]>,
   MetaVarName<"">, HelpText<"Set the 'ptxas' path">;
 
 def o : JoinedOrSeparate<["-"], "o">, MetaVarName<"">,

>From e9ac0f0e5916236cb091179cfa7befd081b01355
