[llvm-branch-commits] [llvm] ebf0fc9 - Revert "[AArch64] Lower scalable i1 vector add reduction to cntp (#99031)"

2024-07-22 Thread via llvm-branch-commits

Author: Max Beck-Jones
Date: 2024-07-22T10:24:54+01:00
New Revision: ebf0fc9ae845af15baed663d79a5e4e88542f1e4

URL: 
https://github.com/llvm/llvm-project/commit/ebf0fc9ae845af15baed663d79a5e4e88542f1e4
DIFF: 
https://github.com/llvm/llvm-project/commit/ebf0fc9ae845af15baed663d79a5e4e88542f1e4.diff

LOG: Revert "[AArch64] Lower scalable i1 vector add reduction to cntp (#99031)"

This reverts commit 4db11c1f6cd6cd12b51a3220a54697b90e2e8821.

Added: 


Modified: 
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Removed: 
llvm/test/CodeGen/AArch64/sve-i1-add-reduce.ll



diff  --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c11855da3fae0..bf205b1706a6c 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -27640,20 +27640,6 @@ SDValue 
AArch64TargetLowering::LowerReductionToSVE(unsigned Opcode,
 VecOp = convertToScalableVector(DAG, ContainerVT, VecOp);
   }
 
-  // Lower VECREDUCE_ADD of nxv2i1-nxv16i1 to CNTP rather than UADDV.
-  if (ScalarOp.getOpcode() == ISD::VECREDUCE_ADD &&
-  VecOp.getOpcode() == ISD::ZERO_EXTEND) {
-SDValue BoolVec = VecOp.getOperand(0);
-if (BoolVec.getValueType().getVectorElementType() == MVT::i1) {
-  // CNTP(BoolVec & BoolVec) <=> CNTP(BoolVec & PTRUE)
-  SDValue CntpOp = DAG.getNode(
-  ISD::INTRINSIC_WO_CHAIN, DL, MVT::i64,
-  DAG.getTargetConstant(Intrinsic::aarch64_sve_cntp, DL, MVT::i64),
-  BoolVec, BoolVec);
-  return DAG.getAnyExtOrTrunc(CntpOp, DL, ScalarOp.getValueType());
-}
-  }
-
   // UADDV always returns an i64 result.
   EVT ResVT = (Opcode == AArch64ISD::UADDV_PRED) ? MVT::i64 :

SrcVT.getVectorElementType();

diff  --git a/llvm/test/CodeGen/AArch64/sve-i1-add-reduce.ll 
b/llvm/test/CodeGen/AArch64/sve-i1-add-reduce.ll
deleted file mode 100644
index a748cf732e090..0
--- a/llvm/test/CodeGen/AArch64/sve-i1-add-reduce.ll
+++ /dev/null
@@ -1,132 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
-
-define i8 @uaddv_zexti8_nxv16i1( %v) {
-; CHECK-LABEL: uaddv_zexti8_nxv16i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.b
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i8 @llvm.vector.reduce.add.nxv16i8( %3)
-  ret i8 %4
-}
-
-define i8 @uaddv_zexti8_nxv8i1( %v) {
-; CHECK-LABEL: uaddv_zexti8_nxv8i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.h
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i8 @llvm.vector.reduce.add.nxv8i8( %3)
-  ret i8 %4
-}
-
-define i16 @uaddv_zexti16_nxv8i1( %v) {
-; CHECK-LABEL: uaddv_zexti16_nxv8i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.h
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i16 @llvm.vector.reduce.add.nxv8i16( %3)
-  ret i16 %4
-}
-
-define i8 @uaddv_zexti8_nxv4i1( %v) {
-; CHECK-LABEL: uaddv_zexti8_nxv4i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.s
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i8 @llvm.vector.reduce.add.nxv4i8( %3)
-  ret i8 %4
-}
-
-define i16 @uaddv_zexti16_nxv4i1( %v) {
-; CHECK-LABEL: uaddv_zexti16_nxv4i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.s
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i16 @llvm.vector.reduce.add.nxv4i16( %3)
-  ret i16 %4
-}
-
-define i32 @uaddv_zexti32_nxv4i1( %v) {
-; CHECK-LABEL: uaddv_zexti32_nxv4i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.s
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i32 @llvm.vector.reduce.add.nxv4i32( %3)
-  ret i32 %4
-}
-
-define i8 @uaddv_zexti8_nxv2i1( %v) {
-; CHECK-LABEL: uaddv_zexti8_nxv2i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.d
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i8 @llvm.vector.reduce.add.nxv2i8( %3)
-  ret i8 %4
-}
-
-define i16 @uaddv_zexti16_nxv2i1( %v) {
-; CHECK-LABEL: uaddv_zexti16_nxv2i1:
-; CHECK:   // %bb.0: // %entry
-; CHECK-NEXT:cntp x0, p0, p0.d
-; CHECK-NEXT:// kill: def $w0 killed $w0 killed $x0
-; CHECK-NEXT:ret
-entry:
-  %3 = zext  %v to 
-  %4 = tail call i16 @llvm.vector.reduce.add.nxv2i16( %3)
-  

[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)

2024-07-22 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah commented:

I think it would be best to add errors for these unsupported clauses to the 
OpenMPToLLVMIR translation, so that clauses in MLIR are not silently ignored.

https://github.com/llvm/llvm-project/pull/99507
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)

2024-07-22 Thread Tom Eccles via llvm-branch-commits


@@ -238,9 +237,9 @@ def SectionOp : OpenMP_Op<"section", 
[HasParent<"SectionsOp">],
 def SectionsOp : OpenMP_Op<"sections", traits = [
 AttrSizedOperandSegments
   ], clauses = [
-// TODO: Complete clause list (private).
 // TODO: Sort clauses alphabetically.

tblah wrote:

nit: might as well sort the clauses while you are changing the line

https://github.com/llvm/llvm-project/pull/99507
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)

2024-07-22 Thread Tom Eccles via llvm-branch-commits


@@ -432,9 +439,10 @@ def SimdOp : OpenMP_Op<"simd", traits = [
 AttrSizedOperandSegments, DeclareOpInterfaceMethods,
 RecursiveMemoryEffects, SingleBlock
   ], clauses = [
-// TODO: Complete clause list (linear, private, reduction).
+// TODO: Complete clause list (linear).

tblah wrote:

Why did you decide not to add linear while adding the other missing clauses?

https://github.com/llvm/llvm-project/pull/99507
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] [MLIR][OpenMP] Add missing clauses to OpenMP op definitions (PR #99507)

2024-07-22 Thread Tom Eccles via llvm-branch-commits

https://github.com/tblah edited https://github.com/llvm/llvm-project/pull/99507
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits


@@ -49,7 +49,7 @@ void test_s_wait_event_export_ready() {
 }
 
 // CHECK-LABEL: @test_global_add_f32
-// CHECK: {{.*}}call{{.*}} float 
@llvm.amdgcn.global.atomic.fadd.f32.p1.f32(ptr addrspace(1) %{{.*}}, float 
%{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) %addr, float %x syncscope("agent") 
seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+}}, 
!amdgpu.ignore.denormal.mode !{{[0-9]+$}}

arsenm wrote:

We have all the codegen tests in the backends. Clang tests checking ISA are 
discouraged. The set of atomic tests is hard enough to maintain as it is 

https://github.com/llvm/llvm-project/pull/96872
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 9fb427c - Revert "[Clang][NEON] Add neon target guard to intrinsics (#98624)"

2024-07-22 Thread via llvm-branch-commits

Author: Lukacma
Date: 2024-07-22T12:32:23+01:00
New Revision: 9fb427c61f9444e90bad89b14723f1c369f67aad

URL: 
https://github.com/llvm/llvm-project/commit/9fb427c61f9444e90bad89b14723f1c369f67aad
DIFF: 
https://github.com/llvm/llvm-project/commit/9fb427c61f9444e90bad89b14723f1c369f67aad.diff

LOG: Revert "[Clang][NEON] Add neon target guard to intrinsics (#98624)"

This reverts commit dc82c774a74ad7e94d4c555e4cae1025e30f3876.

Added: 


Modified: 
clang/include/clang/Basic/arm_neon.td
clang/include/clang/Basic/arm_neon_incl.td
clang/utils/TableGen/NeonEmitter.cpp

Removed: 
clang/test/Sema/aarch64-neon-without-target-feature.cpp



diff  --git a/clang/include/clang/Basic/arm_neon.td 
b/clang/include/clang/Basic/arm_neon.td
index 3098fa67e6a51..6390ba3f9fe5e 100644
--- a/clang/include/clang/Basic/arm_neon.td
+++ b/clang/include/clang/Basic/arm_neon.td
@@ -289,7 +289,7 @@ def SPLATQ : WInst<"splat_laneq", ".(!Q)I",

"UcUsUicsilPcPsfQUcQUsQUiQcQsQiQPcQPsQflUlQlQUlhdQhQdPlQPl"> {
   let isLaneQ = 1;
 }
-let TargetGuard = "bf16,neon" in {
+let TargetGuard = "bf16" in {
   def SPLAT_BF  : WInst<"splat_lane", ".(!q)I", "bQb">;
   def SPLATQ_BF : WInst<"splat_laneq", ".(!Q)I", "bQb"> {
 let isLaneQ = 1;
@@ -323,7 +323,7 @@ def VMLSL: SOpInst<"vmlsl", "(>Q)(>Q)..", "csiUcUsUi", 
OP_MLSL>;
 def VQDMULH  : SInst<"vqdmulh", "...", "siQsQi">;
 def VQRDMULH : SInst<"vqrdmulh", "...", "siQsQi">;
 
-let TargetGuard = "v8.1a,neon" in {
+let TargetGuard = "v8.1a" in {
 def VQRDMLAH : SInst<"vqrdmlah", "", "siQsQi">;
 def VQRDMLSH : SInst<"vqrdmlsh", "", "siQsQi">;
 }
@@ -614,7 +614,7 @@ def A64_VQDMULH_LANE  : SInst<"vqdmulh_lane", "..(!q)I", 
"siQsQi">;
 def A64_VQRDMULH_LANE : SInst<"vqrdmulh_lane", "..(!q)I", "siQsQi">;
 }
 
-let TargetGuard = "v8.1a,neon" in {
+let TargetGuard = "v8.1a" in {
 def VQRDMLAH_LANE : SOpInst<"vqrdmlah_lane", "...qI", "siQsQi", OP_QRDMLAH_LN>;
 def VQRDMLSH_LANE : SOpInst<"vqrdmlsh_lane", "...qI", "siQsQi", OP_QRDMLSH_LN>;
 }
@@ -957,7 +957,7 @@ def VQDMLAL_HIGH : SOpInst<"vqdmlal_high", "(>Q)(>Q)QQ", 
"si", OP_QDMLALHi>;
 def VQDMLAL_HIGH_N : SOpInst<"vqdmlal_high_n", "(>Q)(>Q)Q1", "si", 
OP_QDMLALHi_N>;
 def VQDMLSL_HIGH : SOpInst<"vqdmlsl_high", "(>Q)(>Q)QQ", "si", OP_QDMLSLHi>;
 def VQDMLSL_HIGH_N : SOpInst<"vqdmlsl_high_n", "(>Q)(>Q)Q1", "si", 
OP_QDMLSLHi_N>;
-let TargetGuard = "aes,neon" in {
+let TargetGuard = "aes" in {
   def VMULL_P64: SInst<"vmull", "(1>)11", "Pl">;
   def VMULL_HIGH_P64 : SOpInst<"vmull_high", "(1>)..", "HPl", OP_MULLHi_P64>;
 }
@@ -1091,7 +1091,7 @@ let isLaneQ = 1 in {
 def VQDMULH_LANEQ  : SInst<"vqdmulh_laneq", "..QI", "siQsQi">;
 def VQRDMULH_LANEQ : SInst<"vqrdmulh_laneq", "..QI", "siQsQi">;
 }
-let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"v8.1a,neon" in {
+let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"v8.1a" in {
 def VQRDMLAH_LANEQ : SOpInst<"vqrdmlah_laneq", "...QI", "siQsQi", 
OP_QRDMLAH_LN> {
   let isLaneQ = 1;
 }
@@ -1122,14 +1122,14 @@ def VEXT_A64 : WInst<"vext", "...I", "dQdPlQPl">;
 
 

 // Crypto
-let ArchGuard = "__ARM_ARCH >= 8", TargetGuard = "aes,neon" in {
+let ArchGuard = "__ARM_ARCH >= 8", TargetGuard = "aes" in {
 def AESE : SInst<"vaese", "...", "QUc">;
 def AESD : SInst<"vaesd", "...", "QUc">;
 def AESMC : SInst<"vaesmc", "..", "QUc">;
 def AESIMC : SInst<"vaesimc", "..", "QUc">;
 }
 
-let ArchGuard = "__ARM_ARCH >= 8", TargetGuard = "sha2,neon" in {
+let ArchGuard = "__ARM_ARCH >= 8", TargetGuard = "sha2" in {
 def SHA1H : SInst<"vsha1h", "11", "Ui">;
 def SHA1SU1 : SInst<"vsha1su1", "...", "QUi">;
 def SHA256SU0 : SInst<"vsha256su0", "...", "QUi">;
@@ -1143,7 +1143,7 @@ def SHA256H2 : SInst<"vsha256h2", "", "QUi">;
 def SHA256SU1 : SInst<"vsha256su1", "", "QUi">;
 }
 
-let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"sha3,neon" in {
+let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"sha3" in {
 def BCAX : SInst<"vbcax", "", "QUcQUsQUiQUlQcQsQiQl">;
 def EOR3 : SInst<"veor3", "", "QUcQUsQUiQUlQcQsQiQl">;
 def RAX1 : SInst<"vrax1", "...", "QUl">;
@@ -1153,14 +1153,14 @@ def XAR :  SInst<"vxar", "...I", "QUl">;
 }
 }
 
-let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"sha3,neon" in {
+let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"sha3" in {
 def SHA512SU0 : SInst<"vsha512su0", "...", "QUl">;
 def SHA512su1 : SInst<"vsha512su1", "", "QUl">;
 def SHA512H : SInst<"vsha512h", "", "QUl">;
 def SHA512H2 : SInst<"vsha512h2", "", "QUl">;
 }
 
-let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGuard = 
"sm4,neon" in {
+let ArchGuard = "defined(__aarch64__) || defined(__arm64ec__)", TargetGua

[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> > I still think it is terribly surprising all of the test diff shows up in 
> > this commit, and not the selection case
> 
> Because the selection support is done in the next PR of the review stack, 
> #96162. This patch takes care of choosing the right opcode while merging the 
> loads.

It's more the lack of vectorization of these loads in the IR that's surprising. 
Ideally we wouldn't have to rely on any of this post-isel load merging 

https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits


@@ -6,7 +6,7 @@ declare i32 @llvm.amdgcn.global.atomic.csub(ptr addrspace(1), 
i32)
 
 ; GCN-LABEL: {{^}}global_atomic_csub_rtn:
 ; PREGFX12: global_atomic_csub v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9:]+}}, 
s{{\[[0-9]+:[0-9]+\]}} glc
-; GFX12PLUS: global_atomic_sub_clamp_u32 v0, v0, v1, s[0:1] th:TH_ATOMIC_RETURN
+; GFX12PLUS: global_atomic_sub_clamp_u32 v{{[0-9]+}}, v{{[0-9]+}}, 
v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}} th:TH_ATOMIC_RETURN

jayfoad wrote:

You shouldn't need any changes in this file.

https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][SILoadStoreOptimizer] Merge constrained sloads (PR #96162)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.


https://github.com/llvm/llvm-project/pull/96162
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

ping 

https://github.com/llvm/llvm-project/pull/96875
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad edited 
https://github.com/llvm/llvm-project/pull/96163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits


@@ -34,18 +34,17 @@ entry:
 }
 
 define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16_dpp(
-; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
-; SDAG-GFX11:   ; %bb.0: ; %entry
-; SDAG-GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24
-; SDAG-GFX11-NEXT:s_waitcnt lgkmcnt(0)
-; SDAG-GFX11-NEXT:scratch_load_b32 v0, off, s2
-; SDAG-GFX11-NEXT:scratch_load_u16 v1, off, s3
-; SDAG-GFX11-NEXT:scratch_load_b32 v2, off, s1
-; SDAG-GFX11-NEXT:s_waitcnt vmcnt(0)
-; SDAG-GFX11-NEXT:v_dot2_bf16_bf16_e64_dpp v0, v2, v0, v1 
quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
-; SDAG-GFX11-NEXT:scratch_store_b16 off, v0, s0
-; SDAG-GFX11-NEXT:s_endpgm
-;
+; GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
+; GFX11:   ; %bb.0: ; %entry
+; GFX11-NEXT:s_load_b128 s[0:3], s[0:1], 0x24
+; GFX11-NEXT:s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:scratch_load_b32 v0, off, s2
+; GFX11-NEXT:scratch_load_u16 v1, off, s3
+; GFX11-NEXT:scratch_load_b32 v2, off, s1
+; GFX11-NEXT:s_waitcnt vmcnt(0)
+; GFX11-NEXT:v_dot2_bf16_bf16_e64_dpp v0, v2, v0, v1 quad_perm:[1,0,0,0] 
row_mask:0xf bank_mask:0xf bound_ctrl:1
+; GFX11-NEXT:scratch_store_b16 off, v0, s0
+; GFX11-NEXT:s_endpgm
 ; GISEL-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:

jayfoad wrote:

Should probably remove these GISEL-GFX11 checks since the corresponding RUN 
line is disabled.

https://github.com/llvm/llvm-project/pull/96163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Jay Foad via llvm-branch-commits

https://github.com/jayfoad approved this pull request.

LGTM.

https://github.com/llvm/llvm-project/pull/96163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Christudasan Devadasan via llvm-branch-commits

cdevadas wrote:

The latest patch optimizes the PatFrag and the patterns written further by 
using OtherPredicates. The lit test changes in the latest patch are a missed 
optimization I incorrectly introduced earlier in this PR for GFX7. It is now 
fixed and matches the default behavior with the current compiler.

https://github.com/llvm/llvm-project/pull/96163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits


@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}

shiltian wrote:

why was this changed?

https://github.com/llvm/llvm-project/pull/96875
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian approved this pull request.

LGTM with one question regarding the memory order

https://github.com/llvm/llvm-project/pull/96875
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [misexpect] Support diagnostics from frontend profile data (PR #96524)

2024-07-22 Thread Teresa Johnson via llvm-branch-commits


@@ -369,9 +369,21 @@ static bool lowerExpectIntrinsic(Function &F) {
 if (BranchInst *BI = dyn_cast(BB.getTerminator())) {
   if (handleBranchExpect(*BI))
 ExpectIntrinsicsHandled++;
+  else {
+SmallVector Weights;
+if (extractBranchWeights(*BI, Weights))
+  misexpect::checkMissingAnnotations(*BI, Weights,
+ /*IsFrontendInstr=*/false);

teresajohnson wrote:

Confused about how this relates to enabling checking for frontend 
instrumentation since all of these set that flag to false?

https://github.com/llvm/llvm-project/pull/96524
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [misexpect] Support diagnostics from frontend profile data (PR #96524)

2024-07-22 Thread Teresa Johnson via llvm-branch-commits


@@ -3,7 +3,7 @@
 
 ; RUN: llvm-profdata merge %S/Inputs/misexpect-branch-correct.proftext -o 
%t.profdata
 
-; RUN: opt < %s -passes="function(lower-expect),pgo-instr-use" 
-pgo-test-profile-file=%t.profdata -pgo-missing-annotations 
-pass-remarks=missing-annotation -S 2>&1 | FileCheck %s 
--check-prefix=MISSING_ANNOTATION
+; RUN: opt < %s -passes="function(lower-expect),pgo-instr-use" 
-pgo-test-profile-file=%t.profdata -pgo-missing-annotations 
-pass-remarks=missing-annotations -S 2>&1 | FileCheck %s 
--check-prefix=MISSING_ANNOTATION

teresajohnson wrote:

Can you clarify how this change relates to the new functionality in this PR?

Also, the description of this test seems wrong? There are also some "CHECK" 
labels further below that don't have a corresponding FileCheck invocation. 
Looks like this might be leftover from a different test this is based off of.

https://github.com/llvm/llvm-project/pull/96524
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits


@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}

arsenm wrote:

I'm not sure. My guess is this changed in the downstream patch change to 
seq_cst to monotonic, and this diff wasn't updated? Before this patch the 
atomicrmw wouldn't have been produced at all 

https://github.com/llvm/llvm-project/pull/96875
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU] Codegen support for constrained multi-dword sloads (PR #96163)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/96163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-22 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung created 
https://github.com/llvm/llvm-project/pull/99891

Implemented pseudo probe block matching. When matched functions have
equal pseudo probe checksums, the indices of block pseudo probes are
used to match blocks following opcode and call hash block matching.

Test Plan: Added match-blocks-with-pseudo-probes.test.



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Shilei Tian via llvm-branch-commits


@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}

shiltian wrote:

it passes BuildBot tests

https://github.com/llvm/llvm-project/pull/96875
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [misexpect] Support diagnostics from frontend profile data (PR #96524)

2024-07-22 Thread Paul Kirth via llvm-branch-commits


@@ -369,9 +369,21 @@ static bool lowerExpectIntrinsic(Function &F) {
 if (BranchInst *BI = dyn_cast(BB.getTerminator())) {
   if (handleBranchExpect(*BI))
 ExpectIntrinsicsHandled++;
+  else {
+SmallVector Weights;
+if (extractBranchWeights(*BI, Weights))
+  misexpect::checkMissingAnnotations(*BI, Weights,
+ /*IsFrontendInstr=*/false);

ilovepi wrote:

oh no! You're right, this is basically dead. Let me go through this and try to 
figure out why I thought this was working as intended.

https://github.com/llvm/llvm-project/pull/96524
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)

2024-07-22 Thread Mark de Wever via llvm-branch-commits

https://github.com/mordante updated 
https://github.com/llvm/llvm-project/pull/99375

>From be0f12219752be987ca6674fe384281e890520a2 Mon Sep 17 00:00:00 2001
From: Mark de Wever 
Date: Mon, 15 Jul 2024 07:45:02 +0200
Subject: [PATCH] [libc++][spaceship] Marks P1614 as complete.

Implements parts of:
- P1902R1 Missing feature-test macros 2017-2019

Completes:
- P1614R2 The Mothership has Landed
---
 libcxx/docs/FeatureTestMacroTable.rst  |  2 +-
 libcxx/docs/ReleaseNotes/19.rst|  1 +
 libcxx/docs/Status/Cxx20.rst   |  1 +
 libcxx/docs/Status/Cxx20Papers.csv |  2 +-
 libcxx/docs/Status/SpaceshipPapers.csv |  2 +-
 libcxx/include/version |  4 ++--
 .../compare.version.compile.pass.cpp   | 14 +++---
 .../version.version.compile.pass.cpp   | 14 +++---
 .../generate_feature_test_macro_components.py  |  3 +--
 9 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/libcxx/docs/FeatureTestMacroTable.rst 
b/libcxx/docs/FeatureTestMacroTable.rst
index 262da3f8937d2..a1506e115fe70 100644
--- a/libcxx/docs/FeatureTestMacroTable.rst
+++ b/libcxx/docs/FeatureTestMacroTable.rst
@@ -290,7 +290,7 @@ Status
 -- 
-
 ``__cpp_lib_syncbuf``  ``201803L``
 -- 
-
-``__cpp_lib_three_way_comparison`` ``201711L``
+``__cpp_lib_three_way_comparison`` ``201907L``
 -- 
-
 ``__cpp_lib_to_address``   ``201711L``
 -- 
-
diff --git a/libcxx/docs/ReleaseNotes/19.rst b/libcxx/docs/ReleaseNotes/19.rst
index 43767552960e4..aeef79ebfe765 100644
--- a/libcxx/docs/ReleaseNotes/19.rst
+++ b/libcxx/docs/ReleaseNotes/19.rst
@@ -39,6 +39,7 @@ Implemented Papers
 --
 
 - P1132R8 - ``out_ptr`` - a scalable output pointer abstraction
+- P1614R2 - The Mothership has Landed
 - P2637R3 - Member ``visit``
 - P2652R2 - Disallow User Specialization of ``allocator_traits``
 - P2819R2 - Add ``tuple`` protocol to ``complex``
diff --git a/libcxx/docs/Status/Cxx20.rst b/libcxx/docs/Status/Cxx20.rst
index c00d6fb237286..b76e30fbb3712 100644
--- a/libcxx/docs/Status/Cxx20.rst
+++ b/libcxx/docs/Status/Cxx20.rst
@@ -48,6 +48,7 @@ Paper Status
.. [#note-P0883.1] P0883: shared_ptr and floating-point changes weren't 
applied as they themselves aren't implemented yet.
.. [#note-P0883.2] P0883: ``ATOMIC_FLAG_INIT`` was marked deprecated in 
version 14.0, but was undeprecated with the implementation of LWG3659 in 
version 15.0.
.. [#note-P0660] P0660: The paper is implemented but the features are 
experimental and can be enabled via ``-fexperimental-library``.
+   .. [#note-P1614] P1614: ``std::strong_order(long double, long double)`` is 
partly implemented.
.. [#note-P0355] P0355: The implementation status is:
 
   * ``Calendars`` mostly done in Clang 7
diff --git a/libcxx/docs/Status/Cxx20Papers.csv 
b/libcxx/docs/Status/Cxx20Papers.csv
index 34fc5586f74d9..4015d7ad48b06 100644
--- a/libcxx/docs/Status/Cxx20Papers.csv
+++ b/libcxx/docs/Status/Cxx20Papers.csv
@@ -123,7 +123,7 @@
 "`P1522R1 `__","LWG","Iterator Difference Type and 
Integer Overflow","Cologne","|Complete|","15.0","|ranges|"
 "`P1523R1 `__","LWG","Views and Size 
Types","Cologne","|Complete|","15.0","|ranges|"
 "`P1612R1 `__","LWG","Relocate Endian's 
Specification","Cologne","|Complete|","10.0"
-"`P1614R2 `__","LWG","The Mothership has 
Landed","Cologne","|In Progress|",""
+"`P1614R2 `__","LWG","The Mothership has 
Landed","Cologne","|Complete| [#note-P1614]_","19.0"
 "`P1638R1 `__","LWG","basic_istream_view::iterator 
should not be copyable","Cologne","|Complete|","16.0","|ranges|"
 "`P1643R1 `__","LWG","Add wait/notify to 
atomic_ref","Cologne","|Complete|","19.0"
 "`P1644R0 `__","LWG","Add wait/notify to 
atomic","Cologne","",""
diff --git a/libcxx/docs/Status/SpaceshipPapers.csv 
b/libcxx/docs/Status/SpaceshipPapers.csv
index 39e1f968c1754..1ab64a9caf86a 100644
--- a/libcxx/docs/Status/SpaceshipPapers.csv
+++ b/libcxx/docs/Status/SpaceshipPapers.csv
@@ -1,5 +1,5 @@
 "Number","Name","Status","First released version"
-`P1614R2 `_,The Mothership has Landed,|In Progress|,
+`P1614R2 `_,The Mothership has 
Landed,|Complete|,19.0
 `P2404R3 `_,"Relaxing 
``equality_comparable_with``'s, ``totally_order

[llvm-branch-commits] [misexpect] Support diagnostics from frontend profile data (PR #96524)

2024-07-22 Thread Paul Kirth via llvm-branch-commits


@@ -3,7 +3,7 @@
 
 ; RUN: llvm-profdata merge %S/Inputs/misexpect-branch-correct.proftext -o 
%t.profdata
 
-; RUN: opt < %s -passes="function(lower-expect),pgo-instr-use" 
-pgo-test-profile-file=%t.profdata -pgo-missing-annotations 
-pass-remarks=missing-annotation -S 2>&1 | FileCheck %s 
--check-prefix=MISSING_ANNOTATION
+; RUN: opt < %s -passes="function(lower-expect),pgo-instr-use" 
-pgo-test-profile-file=%t.profdata -pgo-missing-annotations 
-pass-remarks=missing-annotations -S 2>&1 | FileCheck %s 
--check-prefix=MISSING_ANNOTATION

ilovepi wrote:

Oops, I thought I had removed those in the base revision. Let me rebase this 
stack and address that in #96523.

https://github.com/llvm/llvm-project/pull/96524
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for __builtin_amdgcn_global_atomic_fadd_{f32|f64} (PR #96872)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96872

>From e1da39eaef8785570f687dd2a00ae89b918252b1 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Tue, 11 Jun 2024 10:58:44 +0200
Subject: [PATCH 1/2] clang/AMDGPU: Emit atomicrmw for
 __builtin_amdgcn_global_atomic_fadd_{f32|f64}

Need to emit syncscope and new metadata to get the native instruction,
most of the time.
---
 clang/lib/CodeGen/CGBuiltin.cpp   | 39 +--
 .../CodeGenOpenCL/builtins-amdgcn-gfx11.cl|  2 +-
 .../builtins-fp-atomics-gfx12.cl  |  4 +-
 .../builtins-fp-atomics-gfx90a.cl |  4 +-
 .../builtins-fp-atomics-gfx940.cl |  4 +-
 5 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5639239359ab8..0fb45f0288d46 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -58,6 +58,7 @@
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/MatrixBuilder.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/ScopedPrinter.h"
@@ -18743,8 +18744,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
@@ -18756,18 +18755,11 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   ArgTy = llvm::FixedVectorType::get(
   llvm::Type::getHalfTy(getLLVMContext()), 2);
   IID = Intrinsic::amdgcn_global_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -19190,7 +19182,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
   case AMDGPU::BI__builtin_amdgcn_ds_faddf:
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
-  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf: {
+  case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19206,6 +19200,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19240,8 +19236,13 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ProcessOrderScopeAMDGCN(EmitScalarExpr(E->getArg(2)),
   EmitScalarExpr(E->getArg(3)), AO, SSID);
 } else {
-  // The ds_atomic_fadd_* builtins do not have syncscope/order arguments.
-  SSID = llvm::SyncScope::System;
+  // Most of the builtins do not have syncscope/order arguments. For DS
+  // atomics the scope doesn't really matter, as they implicitly operate at
+  // workgroup scope.
+  //
+  // The global/flat cases need to use agent scope to consistently produce
+  // the native instruction instead of a cmpxchg expansion.
+  SSID = getLLVMContext().getOrInsertSyncScopeID("agent");
   AO = AtomicOrdering::SequentiallyConsistent;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
@@ -19256,6 +19257,20 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Builder.CreateAtomicRMW(BinOp, Ptr, Val, AO, SSID);
 if (Volatile)
   RMW->setVolatile(true);
+
+unsigned AddrSpace = Ptr.getType()->getAddressSpace();
+if (AddrSpace != llvm::AMDGPUAS::LOCAL_ADDRESS) {
+  // Most targets require "amdgpu.no.fine.grained.memory" to emit the 
nativ

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from {global|flat}_atomic_fadd_v2f16 builtins (PR #96873)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96873

>From 48fd0a62922171fd563f9c5fdba640153075578e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:12:59 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from
 {global|flat}_atomic_fadd_v2f16 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 20 ++-
 .../builtins-fp-atomics-gfx12.cl  |  9 ++---
 .../builtins-fp-atomics-gfx90a.cl |  2 +-
 .../builtins-fp-atomics-gfx940.cl |  3 ++-
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 77dadeb1f22fa..baf68c7e81569 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18744,22 +18744,15 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_global_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmin;
   break;
@@ -18779,11 +18772,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   ArgTy = llvm::Type::getFloatTy(getLLVMContext());
   IID = Intrinsic::amdgcn_flat_atomic_fadd;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
-  ArgTy = llvm::FixedVectorType::get(
-  llvm::Type::getHalfTy(getLLVMContext()), 2);
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19184,7 +19172,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_ds_fminf:
   case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19202,6 +19192,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 6b8a6d14575db..07e63a8711c7f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -48,7 +48,8 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: call <2 x half> @llvm.amdgcn.flat.atomic.fadd.v2f16.p0.v2f16(ptr 
%{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX12-LABEL:  test_flat_add_2f16
 // GFX12: flat_atomic_pk_add_f16
 half2 test_flat_add_2f16(__generic half2 *addr, half2 x) {
@@ -64,7 +65,8 @@ short2 test_flat_add_2bf16(__generic short2 *addr, short2 x) {
 }
 
 // CHECK-LABEL: test_global_add_half2
-// CHECK: call <2 x half> @llvm.amdgcn.global.atomic.fadd.v2f16.p1.v2f16(ptr 
addrspace(1) %{{.*}}, <2 x half> %{{.*}})
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(1) %{{.+}}, <2 x half> 
%{{.+}} syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX12-LABEL:  test_global_add_half2
 // GFX12:  global_atomic_pk_add_f16 v2, v[0:1], v2, off

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64} builtins (PR #96874)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96874

>From 9b0e5ee3db46e3a39f2ec5f4d5b9d5c99bbe2feb Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:15:26 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw from flat_atomic_{f32|f64}
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp | 17 ++---
 .../CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl |  6 --
 .../CodeGenOpenCL/builtins-fp-atomics-gfx940.cl |  3 ++-
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index baf68c7e81569..4505eec5aa690 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18746,10 +18746,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   }
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 Intrinsic::ID IID;
 llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
 switch (BuiltinID) {
@@ -18759,19 +18757,12 @@ Value 
*CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_global_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmin;
   break;
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
   IID = Intrinsic::amdgcn_flat_atomic_fmax;
   break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  ArgTy = llvm::Type::getFloatTy(getLLVMContext());
-  IID = Intrinsic::amdgcn_flat_atomic_fadd;
-  break;
 }
 llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
 llvm::Value *Val = EmitScalarExpr(E->getArg(1));
@@ -19174,7 +19165,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19194,6 +19187,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_f64:
 case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index cd10777dbe079..02e289427238f 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -45,7 +45,8 @@ void test_global_max_f64(__global double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_add_local_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p3.f64(ptr 
addrspace(3) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(3) %{{.+}}, double %{{.+}} 
syncscope("agent") seq_cst, align 8{{$}}
+
 // GFX90A-LABEL:  test_flat_add_local_f64$local
 // GFX90A:  ds_add_rtn_f64
 void test_flat_add_local_f64(__local double *addr, double x){
@@ -54,7 +55,8 @@ void test_flat_add_local_f64(__local double *addr, double x){
 }
 
 // CHECK-LABEL: test_flat_global_add_f64
-// CHECK: call double @llvm.amdgcn.flat.atomic.fadd.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fadd ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") seq_cst, align 8, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_flat_global_add_f64$local
 // GFX90A:  global_atomic_add_f64
 void test_flat_global_add_f64(__global double *addr, double x){
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
index 589dcd406630d..bd9b8c7268e06 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx940.cl
@@ -10,7 +10,8 @@ typedef half  _

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16 builtins (PR #96875)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96875

>From 415dfe8521d5065f103727449adb7bc30d4193fe Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 19:34:43 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for global/flat fadd v2bf16
 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 26 ++-
 .../builtins-fp-atomics-gfx12.cl  | 24 -
 .../builtins-fp-atomics-gfx90a.cl |  6 ++---
 .../builtins-fp-atomics-gfx940.cl | 14 +++---
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 4505eec5aa690..f6364ffbe9316 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18770,22 +18770,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
 return Builder.CreateCall(F, {Addr, Val});
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
-Intrinsic::ID IID;
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_global_atomic_fadd_v2bf16;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
-  IID = Intrinsic::amdgcn_flat_atomic_fadd_v2bf16;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F = CGM.getIntrinsic(IID, {Addr->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19167,7 +19151,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19189,6 +19175,8 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2f16:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
 case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
@@ -19233,7 +19221,9 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   AO = AtomicOrdering::Monotonic;
 
   // The v2bf16 builtin uses i16 instead of a natural bfloat type.
-  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16) {
+  if (BuiltinID == AMDGPU::BI__builtin_amdgcn_ds_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16 ||
+  BuiltinID == AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16) {
 llvm::Type *V2BF16Ty = FixedVectorType::get(
 llvm::Type::getBFloatTy(Builder.getContext()), 2);
 Val = Builder.CreateBitCast(Val, V2BF16Ty);
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
index 07e63a8711c7f..e8b6eb57c38d7 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx12.cl
@@ -11,7 +11,7 @@ typedef short __attribute__((ext_vector_type(2))) short2;
 
 // CHECK-LABEL: test_local_add_2bf16
 // CHECK: [[BC0:%.+]] = bitcast <2 x i16> {{.+}} to <2 x bfloat>
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x bfloat> 
[[BC0]] syncscope("agent") monotonic, align 4
+// CHECK-NEXT: [[RMW:%.+]] = atomicrmw fadd ptr addrspace(3) %{{.+}}, <2 x 
bfloat> [[BC0]] syncscope("agent") monotonic, align 4
 // CHECK-NEXT: bitcast <2 x bfloat> [[RMW]] to <2 x i16>
 
 // GFX12-LABEL:  test_local_add_2bf16
@@ -48,7 +48,7 @@ void test_local_add_2f16_noret(__local half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_flat_add_2f16
-// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !{{[0-9]+$}}
+// CHECK: [[RMW:%.+]] = atomicrmw fadd ptr %{{.+}}, <2 x half> %{{.+}} 
syncscope("agent") monotonic, align 4, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
 
 // GFX12-LABEL:  test_flat_add_2f

[llvm-branch-commits] [clang] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max f64 builtins (PR #96876)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/96876

>From 06b6158450a2af1d4b9f8cb585a9bd8597c54ed8 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 23:18:32 +0200
Subject: [PATCH] clang/AMDGPU: Emit atomicrmw for flat/global atomic min/max
 f64 builtins

---
 clang/lib/CodeGen/CGBuiltin.cpp   | 36 +--
 .../builtins-fp-atomics-gfx90a.cl | 18 ++
 2 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index f6364ffbe9316..682de60731d7d 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -18744,32 +18744,6 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
 Function *F = CGM.getIntrinsic(Intrin, { Src0->getType() });
 return Builder.CreateCall(F, { Src0, Builder.getFalse() });
   }
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
-Intrinsic::ID IID;
-llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
-switch (BuiltinID) {
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_global_atomic_fmax;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmin;
-  break;
-case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
-  IID = Intrinsic::amdgcn_flat_atomic_fmax;
-  break;
-}
-llvm::Value *Addr = EmitScalarExpr(E->getArg(0));
-llvm::Value *Val = EmitScalarExpr(E->getArg(1));
-llvm::Function *F =
-CGM.getIntrinsic(IID, {ArgTy, Addr->getType(), Val->getType()});
-return Builder.CreateCall(F, {Addr, Val});
-  }
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b64_v2i32:
   case AMDGPU::BI__builtin_amdgcn_global_load_tr_b128_v4i16:
@@ -19153,7 +19127,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f32:
   case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_f64:
   case AMDGPU::BI__builtin_amdgcn_global_atomic_fadd_v2bf16:
-  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16: {
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fadd_v2bf16:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
 llvm::AtomicRMWInst::BinOp BinOp;
 switch (BuiltinID) {
 case AMDGPU::BI__builtin_amdgcn_atomic_inc32:
@@ -19180,8 +19158,12 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned 
BuiltinID,
   BinOp = llvm::AtomicRMWInst::FAdd;
   break;
 case AMDGPU::BI__builtin_amdgcn_ds_fminf:
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmin_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmin_f64:
   BinOp = llvm::AtomicRMWInst::FMin;
   break;
+case AMDGPU::BI__builtin_amdgcn_global_atomic_fmax_f64:
+case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64:
 case AMDGPU::BI__builtin_amdgcn_ds_fmaxf:
   BinOp = llvm::AtomicRMWInst::FMax;
   break;
diff --git a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl 
b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
index 9381ce951df3e..556e553903d1a 100644
--- a/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
+++ b/clang/test/CodeGenOpenCL/builtins-fp-atomics-gfx90a.cl
@@ -27,7 +27,8 @@ void test_global_add_half2(__global half2 *addr, half2 x) {
 }
 
 // CHECK-LABEL: test_global_global_min_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmin.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmin ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_global_min_f64$local
 // GFX90A:  global_atomic_min_f64
 void test_global_global_min_f64(__global double *addr, double x){
@@ -36,7 +37,8 @@ void test_global_global_min_f64(__global double *addr, double 
x){
 }
 
 // CHECK-LABEL: test_global_max_f64
-// CHECK: call double @llvm.amdgcn.global.atomic.fmax.f64.p1.f64(ptr 
addrspace(1) %{{.*}}, double %{{.*}})
+// CHECK: = atomicrmw fmax ptr addrspace(1) {{.+}}, double %{{.+}} 
syncscope("agent") monotonic, align 8, !amdgpu.no.fine.grained.memory 
!{{[0-9]+$}}
+
 // GFX90A-LABEL:  test_global_max_f64$local
 // GFX90A:  global_atomic_max_f64
 void test_global_max_f64(__global double *addr, double x){
@@ -65,7 +67,8 @@ void test_flat_global_add_f64(__global double *addr, doub

[llvm-branch-commits] [llvm] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics (PR #97050)

2024-07-22 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/97050

>From 2541faa229c0099ff05eb8e451772d9410005463 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 27 Jun 2024 16:32:48 +0200
Subject: [PATCH] AMDGPU: Remove flat/global atomic fadd v2bf16 intrinsics

These are now fully covered by atomicrmw.
---
 llvm/include/llvm/IR/IntrinsicsAMDGPU.td  |   4 -
 llvm/lib/IR/AutoUpgrade.cpp   |  14 +-
 llvm/lib/Target/AMDGPU/AMDGPUInstructions.td  |   2 -
 .../Target/AMDGPU/AMDGPURegisterBankInfo.cpp  |   2 -
 .../Target/AMDGPU/AMDGPUSearchableTables.td   |   2 -
 llvm/lib/Target/AMDGPU/FLATInstructions.td|   2 -
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   6 +-
 llvm/test/Bitcode/amdgcn-atomic.ll|  22 ++
 .../AMDGPU/GlobalISel/fp-atomics-gfx940.ll| 106 -
 .../test/CodeGen/AMDGPU/fp-atomics-gfx1200.ll | 218 --
 llvm/test/CodeGen/AMDGPU/fp-atomics-gfx940.ll | 193 
 11 files changed, 33 insertions(+), 538 deletions(-)

diff --git a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td 
b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
index ab2620fdcf6b3..119281ca6103a 100644
--- a/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
+++ b/llvm/include/llvm/IR/IntrinsicsAMDGPU.td
@@ -2955,10 +2955,6 @@ multiclass AMDGPUMFp8SmfmacIntrinsic {
 def NAME#"_"#kind : AMDGPUMFp8SmfmacIntrinsic;
 }
 
-// bf16 atomics use v2i16 argument since there is no bf16 data type in the 
llvm.
-def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUAtomicRtn;
-def int_amdgcn_flat_atomic_fadd_v2bf16   : AMDGPUAtomicRtn;
-
 defset list AMDGPUMFMAIntrinsics940 = {
 def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic;
 def int_amdgcn_mfma_i32_32x32x16_i8 : AMDGPUMfmaIntrinsic;
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 53de9eef516b3..f566a0e3c3043 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -1034,7 +1034,9 @@ static bool upgradeIntrinsicFunction1(Function *F, 
Function *&NewFn,
   }
 
   if (Name.starts_with("ds.fadd") || Name.starts_with("ds.fmin") ||
-  Name.starts_with("ds.fmax")) {
+  Name.starts_with("ds.fmax") ||
+  Name.starts_with("global.atomic.fadd.v2bf16") ||
+  Name.starts_with("flat.atomic.fadd.v2bf16")) {
 // Replaced with atomicrmw fadd/fmin/fmax, so there's no new
 // declaration.
 NewFn = nullptr;
@@ -4042,7 +4044,9 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, 
CallBase *CI,
   .StartsWith("ds.fmin", AtomicRMWInst::FMin)
   .StartsWith("ds.fmax", AtomicRMWInst::FMax)
   .StartsWith("atomic.inc.", AtomicRMWInst::UIncWrap)
-  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap);
+  .StartsWith("atomic.dec.", AtomicRMWInst::UDecWrap)
+  .StartsWith("global.atomic.fadd", AtomicRMWInst::FAdd)
+  .StartsWith("flat.atomic.fadd", AtomicRMWInst::FAdd);
 
   unsigned NumOperands = CI->getNumOperands();
   if (NumOperands < 3) // Malformed bitcode.
@@ -4097,8 +4101,10 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef Name, 
CallBase *CI,
   Builder.CreateAtomicRMW(RMWOp, Ptr, Val, std::nullopt, Order, SSID);
 
   if (PtrTy->getAddressSpace() != 3) {
-RMW->setMetadata("amdgpu.no.fine.grained.memory",
- MDNode::get(F->getContext(), {}));
+MDNode *EmptyMD = MDNode::get(F->getContext(), {});
+RMW->setMetadata("amdgpu.no.fine.grained.memory", EmptyMD);
+if (RMWOp == AtomicRMWInst::FAdd && RetTy->isFloatTy())
+  RMW->setMetadata("amdgpu.ignore.denormal.mode", EmptyMD);
   }
 
   if (IsVolatile)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
index c6dbc58395e48..db8b44149cf47 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
@@ -620,12 +620,10 @@ multiclass local_addr_space_atomic_op {
 
 defm int_amdgcn_flat_atomic_fadd : noret_op;
 defm int_amdgcn_flat_atomic_fadd : flat_addr_space_atomic_op;
-defm int_amdgcn_flat_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_flat_atomic_fmin : noret_op;
 defm int_amdgcn_flat_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_fadd : global_addr_space_atomic_op;
 defm int_amdgcn_flat_atomic_fadd : global_addr_space_atomic_op;
-defm int_amdgcn_global_atomic_fadd_v2bf16 : noret_op;
 defm int_amdgcn_global_atomic_fmin : noret_op;
 defm int_amdgcn_global_atomic_fmax : noret_op;
 defm int_amdgcn_global_atomic_csub : noret_op;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index aa329a58547f3..546c0a238e430 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -4898,8 +4898,6 @@ AMDGPURegisterBankInfo::getInstrMapping(const 
MachineInstr &MI) const {
 case Intrinsic::amdgcn_flat_atomic_fmax:
 case Intrinsic

[llvm-branch-commits] [clang-tidy] Add FixIts for libc namespace macros (PR #99681)

2024-07-22 Thread Paul Kirth via llvm-branch-commits


@@ -41,8 +50,26 @@ void ImplementationInNamespaceCheck::check(
 
   // Enforce that the namespace is the result of macro expansion
   if (Result.SourceManager->isMacroBodyExpansion(NS->getLocation()) == false) {
-diag(NS->getLocation(), "the outermost namespace should be the '%0' macro")
-<< RequiredNamespaceDeclMacroName;
+auto DB = diag(NS->getLocation(),
+   "the outermost namespace should be the '%0' macro")
+  << RequiredNamespaceDeclMacroName;
+
+// TODO: Determine how to split inline namespaces correctly in the 
FixItHint
+//
+// We can't easily replace LIBC_NAMEPACE::inner::namespace { with
+//
+// namespace LIBC_NAMEPACE_DECL {
+//   namespace inner::namespace {
+//
+// For now, just update the simple case w/ LIBC_NAMEPACE_DECL
+if (!NS->isInlineNamespace())
+  DB << FixItHint::CreateReplacement(NS->getLocation(),
+ RequiredNamespaceDeclMacroName);
+
+DB << IncludeInserter.createIncludeInsertion(
+Result.SourceManager->getFileID(NS->getBeginLoc()),
+NamespaceMacroHeader);

ilovepi wrote:

hmm, yeah. That's a bit awkward.

https://github.com/llvm/llvm-project/pull/99681
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tidy] Add FixIts for libc namespace macros (PR #99681)

2024-07-22 Thread Paul Kirth via llvm-branch-commits


@@ -1,4 +1,6 @@
-// RUN: %check_clang_tidy %s llvmlibc-implementation-in-namespace %t
+// RUN: %check_clang_tidy %s llvmlibc-implementation-in-namespace %t -fix

ilovepi wrote:

ack. I saw those running, but I saw `-fix` on a lot of other tests outside of 
llvmlibc. I'm guessing those are probalby older tests that predate that logic? 
either way, I'll address this shortly.

https://github.com/llvm/llvm-project/pull/99681
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Transform "coro_must_elide" calls to switch ABI coroutines to the `noalloc` variant (PR #99285)

2024-07-22 Thread Wei Wang via llvm-branch-commits


@@ -968,8 +969,8 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
   // it's been modified since.
   MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
   RequireAnalysisPass()));
-
   MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
+  MainCGPipeline.addPass(CoroAnnotationElidePass());

apolloww wrote:

Yes, the pipeline will need to be adjusted for the new pass.

https://github.com/llvm/llvm-project/pull/99285
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LLVM][Coroutines] Create `.noalloc` variant of switch ABI coroutine ramp functions during CoroSplit (PR #99283)

2024-07-22 Thread Wei Wang via llvm-branch-commits


@@ -1455,6 +1462,64 @@ struct SwitchCoroutineSplitter {
 setCoroInfo(F, Shape, Clones);
   }
 
+  static Function *createNoAllocVariant(Function &F, coro::Shape &Shape,
+SmallVectorImpl &Clones) {
+auto *OrigFnTy = F.getFunctionType();
+auto OldParams = OrigFnTy->params();
+
+SmallVector NewParams;
+NewParams.reserve(OldParams.size() + 1);
+for (Type *T : OldParams) {
+  NewParams.push_back(T);
+}
+NewParams.push_back(PointerType::getUnqual(Shape.FrameTy));
+
+auto *NewFnTy = FunctionType::get(OrigFnTy->getReturnType(), NewParams,
+  OrigFnTy->isVarArg());
+Function *NoAllocF =
+Function::Create(NewFnTy, F.getLinkage(), F.getName() + ".noalloc");

apolloww wrote:

ThinLTO can import global values (variables and functions) from another 
modules. There would be a follow-up patch updating the pipeline so that 
coroutine passes are moved from pre-link to post-link. A previous attempt 
#90310 was blocked due to some conflict with Asan. I'll have a new version 
later. 

https://github.com/llvm/llvm-project/pull/99283
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tidy] Add FixIts for libc namespace macros (PR #99681)

2024-07-22 Thread Piotr Zegar via llvm-branch-commits

https://github.com/PiotrZSL commented:

Except pointed out issues, looks fine for me.

https://github.com/llvm/llvm-project/pull/99681
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-22 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/99891

>From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 22 Jul 2024 11:56:23 -0700
Subject: [PATCH] Changed assignment of profiles with pseudo probe index

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++
 .../X86/match-blocks-with-pseudo-probes.test  | 25 ++
 2 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 4105f626fb5b6..c135ee5ff4837 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -195,11 +195,15 @@ class StaleMatcher {
   void init(const std::vector &Blocks,
 const std::vector &Hashes,
 const std::vector &CallHashes,
-std::optional YamlBFGUID) {
+const std::unordered_map>
+IndexToBinaryPseudoProbes,
+const std::unordered_map
+BinaryPseudoProbeToBlock,
+const uint64_t YamlBFGUID) {
 assert(Blocks.size() == Hashes.size() &&
Hashes.size() == CallHashes.size() &&
"incorrect matcher initialization");
-
 for (size_t I = 0; I < Blocks.size(); I++) {
   FlowBlock *Block = Blocks[I];
   uint16_t OpHash = Hashes[I].OpcodeHash;
@@ -209,6 +213,8 @@ class StaleMatcher {
 std::make_pair(Hashes[I], Block));
   this->Blocks.push_back(Block);
 }
+this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes;
+this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock;
 this->YamlBFGUID = YamlBFGUID;
   }
 
@@ -234,10 +240,14 @@ class StaleMatcher {
   using HashBlockPairType = std::pair;
   std::unordered_map> OpHashToBlocks;
   std::unordered_map> 
CallHashToBlocks;
-  std::vector Blocks;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  std::vector Blocks;
   // If the pseudo probe checksums of the profiled and binary functions are
   // equal, then the YamlBF's GUID is defined and used to match blocks.
-  std::optional YamlBFGUID;
+  uint64_t YamlBFGUID;
 
   // Uses OpcodeHash to find the most similar block for a given hash.
   const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const {
@@ -284,7 +294,7 @@ class StaleMatcher {
 // Searches for the pseudo probe attached to the matched function's block,
 // ignoring pseudo probes attached to function calls and inlined functions'
 // blocks.
-outs() << "match with pseudo probes\n";
+std::vector BlockPseudoProbes;
 for (const auto &PseudoProbe : PseudoProbes) {
   // Ensures that pseudo probe information belongs to the appropriate
   // function and not an inlined function.
@@ -293,11 +303,30 @@ class StaleMatcher {
   // Skips pseudo probes attached to function calls.
   if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
 continue;
-  assert(PseudoProbe.Index < Blocks.size() &&
- "pseudo probe index out of range");
-  return Blocks[PseudoProbe.Index];
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
 }
-return nullptr;
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index < Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+   "All blocks should have a pseudo probe");
+if (It->second.size() > 1)
+  return nullptr;
+
+const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0];
+auto BinaryPseudoProbeIt = 
BinaryPseudoProbeToBlock.find(BinaryPseudoProbe);
+assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() &&
+   "All binary pseudo probes should belong a binary basic block");
+
+return BinaryPseudoProbeIt->second;
   }
 };
 
@@ -491,6 +520,11 @@ size_t matchWeightsByHashes(
   std::vector CallHashes;
   std::vector Blocks;
   std::vector BlendedHashes;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder();
   for (uint64_t I = 0; I < BlockOrder.size(); I++) {
 const BinaryBasicBlock *BB = BlockOrder[I];
 assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock");
@@ -510,9 +544,27 @@ size_t matchWeightsByHashes(
 Blocks.push_back(&Func.Blocks[I + 1]);
 BlendedBlockHash BlendedHash(BB->getHash());
 BlendedHashes.push_back(BlendedHash);
+if (PseudoProbeDecoder) {
+  const AddressProbesMap &ProbeMap =
+  PseudoProbeDecoder->getAddres

[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-22 Thread Shaw Young via llvm-branch-commits

https://github.com/shawbyoung updated 
https://github.com/llvm/llvm-project/pull/99891

>From 0274f697376264c2d77816190f9a434f64e79089 Mon Sep 17 00:00:00 2001
From: shawbyoung 
Date: Mon, 22 Jul 2024 11:56:23 -0700
Subject: [PATCH 1/2] Changed assignment of profiles with pseudo probe index

Created using spr 1.3.4
---
 bolt/lib/Profile/StaleProfileMatching.cpp | 85 +++
 .../X86/match-blocks-with-pseudo-probes.test  | 25 ++
 2 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/bolt/lib/Profile/StaleProfileMatching.cpp 
b/bolt/lib/Profile/StaleProfileMatching.cpp
index 4105f626fb5b6..c135ee5ff4837 100644
--- a/bolt/lib/Profile/StaleProfileMatching.cpp
+++ b/bolt/lib/Profile/StaleProfileMatching.cpp
@@ -195,11 +195,15 @@ class StaleMatcher {
   void init(const std::vector &Blocks,
 const std::vector &Hashes,
 const std::vector &CallHashes,
-std::optional YamlBFGUID) {
+const std::unordered_map>
+IndexToBinaryPseudoProbes,
+const std::unordered_map
+BinaryPseudoProbeToBlock,
+const uint64_t YamlBFGUID) {
 assert(Blocks.size() == Hashes.size() &&
Hashes.size() == CallHashes.size() &&
"incorrect matcher initialization");
-
 for (size_t I = 0; I < Blocks.size(); I++) {
   FlowBlock *Block = Blocks[I];
   uint16_t OpHash = Hashes[I].OpcodeHash;
@@ -209,6 +213,8 @@ class StaleMatcher {
 std::make_pair(Hashes[I], Block));
   this->Blocks.push_back(Block);
 }
+this->IndexToBinaryPseudoProbes = IndexToBinaryPseudoProbes;
+this->BinaryPseudoProbeToBlock = BinaryPseudoProbeToBlock;
 this->YamlBFGUID = YamlBFGUID;
   }
 
@@ -234,10 +240,14 @@ class StaleMatcher {
   using HashBlockPairType = std::pair;
   std::unordered_map> OpHashToBlocks;
   std::unordered_map> 
CallHashToBlocks;
-  std::vector Blocks;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  std::vector Blocks;
   // If the pseudo probe checksums of the profiled and binary functions are
   // equal, then the YamlBF's GUID is defined and used to match blocks.
-  std::optional YamlBFGUID;
+  uint64_t YamlBFGUID;
 
   // Uses OpcodeHash to find the most similar block for a given hash.
   const FlowBlock *matchWithOpcodes(BlendedBlockHash BlendedHash) const {
@@ -284,7 +294,7 @@ class StaleMatcher {
 // Searches for the pseudo probe attached to the matched function's block,
 // ignoring pseudo probes attached to function calls and inlined functions'
 // blocks.
-outs() << "match with pseudo probes\n";
+std::vector BlockPseudoProbes;
 for (const auto &PseudoProbe : PseudoProbes) {
   // Ensures that pseudo probe information belongs to the appropriate
   // function and not an inlined function.
@@ -293,11 +303,30 @@ class StaleMatcher {
   // Skips pseudo probes attached to function calls.
   if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
 continue;
-  assert(PseudoProbe.Index < Blocks.size() &&
- "pseudo probe index out of range");
-  return Blocks[PseudoProbe.Index];
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
 }
-return nullptr;
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index < Blocks.size() && "Invalid pseudo probe index");
+
+auto It = IndexToBinaryPseudoProbes.find(Index);
+assert(It != IndexToBinaryPseudoProbes.end() &&
+   "All blocks should have a pseudo probe");
+if (It->second.size() > 1)
+  return nullptr;
+
+const MCDecodedPseudoProbe *BinaryPseudoProbe = It->second[0];
+auto BinaryPseudoProbeIt = 
BinaryPseudoProbeToBlock.find(BinaryPseudoProbe);
+assert(BinaryPseudoProbeIt != BinaryPseudoProbeToBlock.end() &&
+   "All binary pseudo probes should belong a binary basic block");
+
+return BinaryPseudoProbeIt->second;
   }
 };
 
@@ -491,6 +520,11 @@ size_t matchWeightsByHashes(
   std::vector CallHashes;
   std::vector Blocks;
   std::vector BlendedHashes;
+  std::unordered_map>
+  IndexToBinaryPseudoProbes;
+  std::unordered_map
+  BinaryPseudoProbeToBlock;
+  const MCPseudoProbeDecoder *PseudoProbeDecoder = BC.getPseudoProbeDecoder();
   for (uint64_t I = 0; I < BlockOrder.size(); I++) {
 const BinaryBasicBlock *BB = BlockOrder[I];
 assert(BB->getHash() != 0 && "empty hash of BinaryBasicBlock");
@@ -510,9 +544,27 @@ size_t matchWeightsByHashes(
 Blocks.push_back(&Func.Blocks[I + 1]);
 BlendedBlockHash BlendedHash(BB->getHash());
 BlendedHashes.push_back(BlendedHash);
+if (PseudoProbeDecoder) {
+  const AddressProbesMap &ProbeMap =
+  PseudoProbeDecoder->getAd

[llvm-branch-commits] [libc] 052f002 - Revert "[libc] New HeaderGen Switch Flip (#99929)"

2024-07-22 Thread via llvm-branch-commits

Author: RoseZhang03
Date: 2024-07-22T15:25:52-07:00
New Revision: 052f002d33e247b9e00994a100bf32531fefe615

URL: 
https://github.com/llvm/llvm-project/commit/052f002d33e247b9e00994a100bf32531fefe615
DIFF: 
https://github.com/llvm/llvm-project/commit/052f002d33e247b9e00994a100bf32531fefe615.diff

LOG: Revert "[libc] New HeaderGen Switch Flip (#99929)"

This reverts commit 93eb9ec368817e63e2e8a2ef567bc23211acb776.

Added: 


Modified: 
libc/CMakeLists.txt

Removed: 




diff  --git a/libc/CMakeLists.txt b/libc/CMakeLists.txt
index 45cca17562d26..6e0760724d963 100644
--- a/libc/CMakeLists.txt
+++ b/libc/CMakeLists.txt
@@ -73,7 +73,7 @@ if(LIBC_BUILD_GPU_LOADER OR (LLVM_LIBC_GPU_BUILD AND NOT 
LLVM_RUNTIMES_BUILD))
   add_subdirectory(utils/gpu)
 endif()
 
-option(LIBC_USE_NEW_HEADER_GEN "Generate header files using new headergen 
instead of the old one" ON)
+option(LIBC_USE_NEW_HEADER_GEN "Generate header files using new headergen 
instead of the old one" OFF)
 
 set(NEED_LIBC_HDRGEN FALSE)
 if(NOT LLVM_RUNTIMES_BUILD)



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT][NFC] Track fragment relationships using EquivalenceClasses (PR #99979)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/99979

Three-way splitting can create references between split fragments (warm
to cold or vice versa) that are not handled by
`isChildOf/isParentOf/isChildOrParentOf`. Generalize fragment
relationships to allow checking if two functions belong to one group,
potentially in presence of ICF which can join multiple groups.

Test Plan: NFC for existing tests



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Support more than two jump table parents (PR #99988)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/99988

Multi-way splitting can cause multiple fragments to access the same jump
table. Relax the assumption that a jump table can only have up to two
parents.

Test Plan: added bolt/test/X86/three-way-split-jt.s



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT][NFC] Track fragment relationships using EquivalenceClasses (PR #99979)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/99979

>From f6478e36a962843329c519ba35ad2a132ffd8c9e Mon Sep 17 00:00:00 2001
From: Amir Ayupov 
Date: Mon, 22 Jul 2024 16:34:02 -0700
Subject: [PATCH] fix getOrCreateJumpTable

Created using spr 1.3.4
---
 bolt/lib/Core/BinaryContext.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bolt/lib/Core/BinaryContext.cpp b/bolt/lib/Core/BinaryContext.cpp
index bdfd91417a696..874cdd26ce6ea 100644
--- a/bolt/lib/Core/BinaryContext.cpp
+++ b/bolt/lib/Core/BinaryContext.cpp
@@ -841,7 +841,7 @@ BinaryContext::getOrCreateJumpTable(BinaryFunction 
&Function, uint64_t Address,
 // Prevent associating a jump table to a specific fragment twice.
 // This simple check arises from the assumption: no more than 2 fragments.
 if (JT->Parents.size() == 1 && JT->Parents[0] != &Function) {
-  assert(JT->Parents[0]->isParentOrChildOf(Function) &&
+  assert(areRelatedFragments(JT->Parents[0], &Function) &&
  "cannot re-use jump table of a different function");
   // Duplicate the entry for the parent function for easy access
   JT->Parents.push_back(&Function);

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Support more than two jump table parents (PR #99988)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/99988


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [BOLT] Support more than two jump table parents (PR #99988)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/99988


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] [libc++][spaceship] Marks P1614 as complete. (PR #99375)

2024-07-22 Thread via llvm-branch-commits

h-vetinari wrote:

This now closes #100018

https://github.com/llvm/llvm-project/pull/99375
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT][NFC] Track fragment relationships using EquivalenceClasses (PR #99979)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits


@@ -241,6 +242,10 @@ class BinaryContext {
   /// Function fragments to skip.
   std::unordered_set FragmentsToSkip;
 
+  /// Fragment equivalence classes to query belonging to the same "family" in
+  /// presence of multiple fragments/multiple parents.
+  EquivalenceClasses FragmentClasses;

aaupov wrote:

It's a set that only keeps a copy of entry (a pointer in our case):

https://github.com/llvm/llvm-project/blob/73ffeeab12d54211fd838d6ff988d111369ea196/llvm/include/llvm/ADT/EquivalenceClasses.h#L26-L35

Moreover, we will only insert fragments into it, so it shouldn't be too 
expensive even for the largest binaries (we split up to 10s of thousands of 
functions).

https://github.com/llvm/llvm-project/pull/99979
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT] Match blocks with pseudo probes (PR #99891)

2024-07-22 Thread Amir Ayupov via llvm-branch-commits


@@ -266,6 +287,47 @@ class StaleMatcher {
 }
 return BestBlock;
   }
+  // Uses pseudo probe information to attach the profile to the appropriate
+  // block.
+  const FlowBlock *matchWithPseudoProbes(
+  const std::vector &PseudoProbes) const {
+// Searches for the pseudo probe attached to the matched function's block,
+// ignoring pseudo probes attached to function calls and inlined functions'
+// blocks.
+std::vector BlockPseudoProbes;
+for (const auto &PseudoProbe : PseudoProbes) {
+  // Ensures that pseudo probe information belongs to the appropriate
+  // function and not an inlined function.
+  if (PseudoProbe.GUID != YamlBFGUID)
+continue;
+  // Skips pseudo probes attached to function calls.
+  if (PseudoProbe.Type != static_cast(PseudoProbeType::Block))
+continue;
+
+  BlockPseudoProbes.push_back(&PseudoProbe);
+}
+
+// Returns nullptr if there is not a 1:1 mapping of the yaml block pseudo
+// probe and binary pseudo probe.
+if (BlockPseudoProbes.size() == 0 || BlockPseudoProbes.size() > 1)
+  return nullptr;
+
+uint64_t Index = BlockPseudoProbes[0]->Index;
+assert(Index <= Blocks.size() && "Invalid pseudo probe index");

aaupov wrote:

Invalid assertion

https://github.com/llvm/llvm-project/pull/99891
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang-tools-extra] a96af6c - Revert "[clang-tidy] fix misc-const-correctness to work with function-try-blo…"

2024-07-22 Thread via llvm-branch-commits

Author: Piotr Zegar
Date: 2024-07-23T08:46:14+02:00
New Revision: a96af6c18ea45269c4d2e1fd762a14d763be0358

URL: 
https://github.com/llvm/llvm-project/commit/a96af6c18ea45269c4d2e1fd762a14d763be0358
DIFF: 
https://github.com/llvm/llvm-project/commit/a96af6c18ea45269c4d2e1fd762a14d763be0358.diff

LOG: Revert "[clang-tidy] fix misc-const-correctness to work with 
function-try-blo…"

This reverts commit cd9e42cb0f2fdf1834e2e6ad2befdba0d7cc84b5.

Added: 


Modified: 
clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.cpp
clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.h
clang-tools-extra/docs/ReleaseNotes.rst
clang-tools-extra/test/clang-tidy/checkers/misc/const-correctness-values.cpp

Removed: 




diff  --git a/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.cpp 
b/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.cpp
index e20cf6fbcb55a..8b500de0c028c 100644
--- a/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.cpp
+++ b/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.cpp
@@ -93,12 +93,13 @@ void ConstCorrectnessCheck::registerMatchers(MatchFinder 
*Finder) {
   // shall be run.
   const auto FunctionScope =
   functionDecl(
-  hasBody(stmt(forEachDescendant(
-   declStmt(containsAnyDeclaration(
-LocalValDecl.bind("local-value")),
-unless(has(decompositionDecl(
-   .bind("decl-stmt")))
-  .bind("scope")))
+  hasBody(
+  compoundStmt(forEachDescendant(
+   declStmt(containsAnyDeclaration(
+LocalValDecl.bind("local-value")),
+unless(has(decompositionDecl(
+   .bind("decl-stmt")))
+  .bind("scope")))
   .bind("function-decl");
 
   Finder->addMatcher(FunctionScope, this);
@@ -108,7 +109,7 @@ void ConstCorrectnessCheck::registerMatchers(MatchFinder 
*Finder) {
 enum class VariableCategory { Value, Reference, Pointer };
 
 void ConstCorrectnessCheck::check(const MatchFinder::MatchResult &Result) {
-  const auto *LocalScope = Result.Nodes.getNodeAs("scope");
+  const auto *LocalScope = Result.Nodes.getNodeAs("scope");
   const auto *Variable = Result.Nodes.getNodeAs("local-value");
   const auto *Function = Result.Nodes.getNodeAs("function-decl");
 
@@ -197,7 +198,7 @@ void ConstCorrectnessCheck::check(const 
MatchFinder::MatchResult &Result) {
   }
 }
 
-void ConstCorrectnessCheck::registerScope(const Stmt *LocalScope,
+void ConstCorrectnessCheck::registerScope(const CompoundStmt *LocalScope,
   ASTContext *Context) {
   auto &Analyzer = ScopesCache[LocalScope];
   if (!Analyzer)

diff  --git a/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.h 
b/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.h
index bba060e555d00..08ffde524522a 100644
--- a/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.h
+++ b/clang-tools-extra/clang-tidy/misc/ConstCorrectnessCheck.h
@@ -32,10 +32,10 @@ class ConstCorrectnessCheck : public ClangTidyCheck {
   void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
 
 private:
-  void registerScope(const Stmt *LocalScope, ASTContext *Context);
+  void registerScope(const CompoundStmt *LocalScope, ASTContext *Context);
 
   using MutationAnalyzer = std::unique_ptr;
-  llvm::DenseMap ScopesCache;
+  llvm::DenseMap ScopesCache;
   llvm::DenseSet TemplateDiagnosticsCache;
 
   const bool AnalyzeValues;

diff  --git a/clang-tools-extra/docs/ReleaseNotes.rst 
b/clang-tools-extra/docs/ReleaseNotes.rst
index 083b098d05d4a..0b2c04c23761c 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -381,8 +381,7 @@ Changes in existing checks
 - Improved :doc:`misc-const-correctness
   ` check by avoiding infinite 
recursion
   for recursive functions with forwarding reference parameters and reference
-  variables which refer to themselves. Also adapted the check to work with
-  function-try-blocks.
+  variables which refer to themselves.
 
 - Improved :doc:`misc-definitions-in-headers
   ` check by replacing the local

diff  --git 
a/clang-tools-extra/test/clang-tidy/checkers/misc/const-correctness-values.cpp 
b/clang-tools-extra/test/clang-tidy/checkers/misc/const-correctness-values.cpp
index 2af4bfb0bd449..cb6bfcc1dccba 100644
--- 
a/clang-tools-extra/test/clang-tidy/checkers/misc/const-correctness-values.cpp
+++ 
b/clang-tools-extra/test/clang-tidy/checkers/misc/const-correctness-values.cpp
@@ -56,15 +56,6 @@ void some_function(double np_arg0, wchar_t np_arg1) {
   np_local6--;
 }
 
-int function_try_block() try {
-  int p_local0 = 0;
-  // CHECK-MESSAGES: [[@LINE-1]]:3: warning: var