[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,7 @@
+// RUN: llvm-mc -triple aarch64-elf -filetype=obj %s -o - | llvm-objdump -r - | FileCheck %s
+
+// Test that PATCHINST appears after JUMP26.
+// CHECK:  R_AARCH64_JUMP26
+// CHECK-NEXT: R_AARCH64_PATCHINST
+.reloc ., R_AARCH64_PATCHINST, ds
+b f1

MaskRay wrote:

Improve the test to check that with more than one fragment and more than one 
PATCHINST, the relocations are still ordered.
```
.reloc ., R_AARCH64_PATCHINST, ds
b f1
.balign 8
.reloc ., R_AARCH64_PATCHINST, ds
b f2
```
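
The ordering this test is meant to pin down can be sketched as follows. This is only an illustration, not the PR's implementation: `SketchReloc`, `isPatchInst`, `byOffset`, and `sortSketchRelocs` are hypothetical names, and the tie-break rule (PATCHINST last among relocations at the same offset) is an assumption read off the CHECK lines above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified relocation record; not LLVM's real data structure.
struct SketchReloc {
  std::uint64_t Offset;
  std::string Type;
};

inline bool isPatchInst(const SketchReloc &R) {
  return R.Type == "R_AARCH64_PATCHINST";
}

inline bool byOffset(const SketchReloc &A, const SketchReloc &B) {
  return A.Offset < B.Offset;
}

// Sort by offset; at equal offsets, place R_AARCH64_PATCHINST after any
// other relocation. The tie-break is an assumption based on the test.
inline void sortSketchRelocs(std::vector<SketchReloc> &Relocs) {
  std::stable_sort(Relocs.begin(), Relocs.end(),
                   [](const SketchReloc &A, const SketchReloc &B) {
                     if (A.Offset != B.Offset)
                       return A.Offset < B.Offset;
                     return !isPatchInst(A) && isPatchInst(B);
                   });
  // Postcondition: offsets are non-decreasing.
  assert(std::is_sorted(Relocs.begin(), Relocs.end(), byOffset));
}
```

For the two-fragment input suggested above (assuming the branches land at offsets 0 and 8 because of `.balign 8`), the sketch yields JUMP26, PATCHINST, JUMP26, PATCHINST.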

https://github.com/llvm/llvm-project/pull/133534
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits


@@ -0,0 +1,87 @@
+# RUN: rm -rf %t && split-file %s %t

MaskRay wrote:

Consider adding `&& cd %t` so that we can remove the `%t/` prefixes below, which 
clutter up the commands...

https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay edited 
https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits

https://github.com/MaskRay approved this pull request.

The assembler part (BinaryFormat / Target/AArch64 changes) looks good. The 
linker change should be made separately. But thanks for combining this in a 
single PR, making the full picture clear :)

https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits


@@ -61,6 +61,7 @@ ELF_RELOC(R_AARCH64_LD64_GOT_LO12_NC,0x138)
 ELF_RELOC(R_AARCH64_LD64_GOTPAGE_LO15,   0x139)
 ELF_RELOC(R_AARCH64_PLT32,   0x13a)
 ELF_RELOC(R_AARCH64_GOTPCREL32,  0x13b)
+ELF_RELOC(R_AARCH64_PATCHINST,   0x13c)

MaskRay wrote:

Also add a test to tools/llvm-readobj/ELF/reloc-types-aarch64.test
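
For readers unfamiliar with the `ELF_RELOC(name, value)` lines in the hunk above, the following is a minimal sketch of the X-macro pattern they rely on: one list of entries is expanded once into enum values and once into a name lookup. The list here is a local stand-in holding only the three entries visible in the hunk; `SKETCH_AARCH64_RELOCS`, `SketchRelocType`, and `sketchRelocName` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Local stand-in for the .def file: only entries visible in the hunk above.
#define SKETCH_AARCH64_RELOCS(X)                                               \
  X(R_AARCH64_PLT32, 0x13a)                                                    \
  X(R_AARCH64_GOTPCREL32, 0x13b)                                               \
  X(R_AARCH64_PATCHINST, 0x13c)

// First expansion: each entry becomes an enumerator with its numeric value.
enum SketchRelocType : std::uint32_t {
#define X(name, value) name = value,
  SKETCH_AARCH64_RELOCS(X)
#undef X
};

// Second expansion: each entry becomes a case that returns its own name,
// which is the kind of mapping a relocation-type printing test exercises.
inline std::string_view sketchRelocName(std::uint32_t Type) {
  switch (Type) {
#define X(name, value)                                                         \
  case value:                                                                  \
    return #name;
    SKETCH_AARCH64_RELOCS(X)
#undef X
  default:
    return "Unknown";
  }
}
```

Adding a new relocation type then means adding one entry to the list, which is why a name-printing test is a natural companion to the one-line change.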

https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits

MaskRay wrote:

> The R_AARCH64_PATCHINST relocation type is to support deactivation symbols. 
> For more information, see the RFC: 
> [discourse.llvm.org/t/rfc-deactivation-symbols/85556](https://discourse.llvm.org/t/rfc-deactivation-symbols/85556)
> 
> An AArch64 psABI extension proposal has been made: 
> [ARM-software/abi-aa#329](https://github.com/ARM-software/abi-aa/issues/329)

The RFC is quite long and doesn't include an assembly example. Adding some 
description of how the relocation works will help future readers.

https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] ELF: Introduce R_AARCH64_PATCHINST relocation type. (PR #133534)

2025-08-09 Thread Fangrui Song via llvm-branch-commits

MaskRay wrote:

Need an aarch64 maintainer's signoff on the llvm part. 


https://github.com/llvm/llvm-project/pull/133534


[llvm-branch-commits] [llvm] [ir] MD_prof is not UB-implying (PR #152420)

2025-08-09 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/152420

>From df2474e2319c466bebb18c48b0ba6c12a8429772 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Wed, 6 Aug 2025 17:43:35 -0700
Subject: [PATCH] [ir] MD_prof is not UB-implying

---
 llvm/lib/IR/Metadata.cpp  |  4 ++
 .../Transforms/LICM/hoist-phi-metadata.ll | 44 +++
 2 files changed, 48 insertions(+)

diff --git a/llvm/lib/IR/Metadata.cpp b/llvm/lib/IR/Metadata.cpp
index 1157cbe6bbc1b..ba838cd2793ce 100644
--- a/llvm/lib/IR/Metadata.cpp
+++ b/llvm/lib/IR/Metadata.cpp
@@ -57,6 +57,8 @@
 
 using namespace llvm;
 
+extern cl::opt<bool> ProfcheckDisableMetadataFixes;
+
 MetadataAsValue::MetadataAsValue(Type *Ty, Metadata *MD)
 : Value(Ty, MetadataAsValueVal), MD(MD) {
   track();
@@ -1678,6 +1680,8 @@ void Instruction::dropUnknownNonDebugMetadata(ArrayRef<unsigned> KnownIDs) {
 
   // A DIAssignID attachment is debug metadata, don't drop it.
   KnownSet.insert(LLVMContext::MD_DIAssignID);
+  if (!ProfcheckDisableMetadataFixes)
+    KnownSet.insert(LLVMContext::MD_prof);
 
   Value::eraseMetadataIf([&KnownSet](unsigned MDKind, MDNode *Node) {
 return !KnownSet.count(MDKind);
diff --git a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll 
b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
index 6f64bf7d7c875..255302c966034 100644
--- a/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
+++ b/llvm/test/Transforms/LICM/hoist-phi-metadata.ll
@@ -45,6 +45,46 @@ end:
   ret void
 }
 
+declare i32 @getv()
+
+; indirect.goto.dest2 should get hoisted, and that should not result
+; in a loss of profiling info
+define i32 @test19(i1 %cond, i1 %cond2, ptr %address, i32 %v1) nounwind {
+; CHECK-LABEL: define i32 @test19
+; CHECK-SAME: (i1 [[COND:%.*]], i1 [[COND2:%.*]], ptr [[ADDRESS:%.*]], i32 [[V1:%.*]]) #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[INDIRECT_GOTO_DEST:%.*]] = select i1 [[COND]], ptr blockaddress(@test19, [[EXIT:%.*]]), ptr [[ADDRESS]], !prof [[PROF9:![0-9]+]]
+; CHECK-NEXT:    [[INDIRECT_GOTO_DEST2:%.*]] = select i1 [[COND2]], ptr blockaddress(@test19, [[EXIT]]), ptr [[ADDRESS]], !prof [[PROF10:![0-9]+]]
+; CHECK-NEXT:    br label [[L0:%.*]]
+; CHECK:       L0:
+; CHECK-NEXT:    [[V2:%.*]] = call i32 @getv()
+; CHECK-NEXT:    [[SINKABLE:%.*]] = mul i32 [[V1]], [[V2]]
+; CHECK-NEXT:    [[SINKABLE2:%.*]] = add i32 [[V1]], [[V2]]
+; CHECK-NEXT:    indirectbr ptr [[INDIRECT_GOTO_DEST]], [label [[L1:%.*]], label %exit]
+; CHECK:       L1:
+; CHECK-NEXT:    indirectbr ptr [[INDIRECT_GOTO_DEST2]], [label [[L0]], label %exit]
+; CHECK:       exit:
+; CHECK-NEXT:    [[R:%.*]] = phi i32 [ [[SINKABLE]], [[L0]] ], [ [[SINKABLE2]], [[L1]] ]
+; CHECK-NEXT:    ret i32 [[R]]
+;
+entry:
+  br label %L0
+L0:
+  %indirect.goto.dest = select i1 %cond, ptr blockaddress(@test19, %exit), ptr %address, !prof !10
+  %v2 = call i32 @getv()
+  %sinkable = mul i32 %v1, %v2
+  %sinkable2 = add i32 %v1, %v2
+  indirectbr ptr %indirect.goto.dest, [label %L1, label %exit]
+
+L1:
+  %indirect.goto.dest2 = select i1 %cond2, ptr blockaddress(@test19, %exit), ptr %address, !prof !11
+  indirectbr ptr %indirect.goto.dest2, [label %L0, label %exit]
+
+exit:
+  %r = phi i32 [%sinkable, %L0], [%sinkable2, %L1]
+  ret i32 %r
+}
+
 !llvm.module.flags = !{!2, !3}
 
 !0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !1)
@@ -57,6 +97,10 @@ end:
 !7 = !DILocation(line: 3, column: 22, scope: !4)
 !8 = !{!"branch_weights", i32 5, i32 7}
 !9 = !{!"branch_weights", i32 13, i32 11}
+!10 = !{!"branch_weights", i32 101, i32 189}
+!11 = !{!"branch_weights", i32 67, i32 1}
+;.
+; CHECK: attributes #[[ATTR0]] = { nounwind }
 ;.
 ; CHECK: [[META0:![0-9]+]] = !{i32 7, !"Dwarf Version", i32 5}
 ; CHECK: [[META1:![0-9]+]] = !{i32 2, !"Debug Info Version", i32 3}
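
The effect of the `Metadata.cpp` hunk can be sketched outside LLVM as a filter over metadata kinds: anything not in the known set is dropped, and the patch adds the profile kind to that set unless the fix is disabled. This is a simplified model under stated assumptions; `SketchKind`, `Attachments`, and `dropUnknownSketch` are hypothetical names, and the real function operates on LLVM's own metadata storage rather than a `std::map`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for metadata kind IDs.
enum SketchKind : unsigned { KindProf, KindDIAssignID, KindNonnull };

// Hypothetical attachment table: kind ID -> printable payload.
using Attachments = std::map<unsigned, std::string>;

// Erase every attachment whose kind is not known. Mirrors the patch's
// logic: DIAssignID is always kept, and the profile kind is kept unless
// the fix is disabled via the flag.
inline void dropUnknownSketch(Attachments &MD, std::set<unsigned> Known,
                              bool DisableProfFixes) {
  Known.insert(KindDIAssignID);
  if (!DisableProfFixes)
    Known.insert(KindProf);
  for (auto It = MD.begin(); It != MD.end();) {
    if (Known.count(It->first))
      ++It;
    else
      It = MD.erase(It);
  }
}
```

With the fix enabled, a hoisted instruction in this model keeps its branch weights while other unknown kinds (here the hypothetical `KindNonnull`) are still dropped, which matches what the new LICM test checks for `!prof`.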



[llvm-branch-commits] [llvm] [AMDGPULowerBufferFatPointers] Handle ptrtoaddr by extending the offset (PR #139413)

2025-08-09 Thread Alexander Richardson via llvm-branch-commits

https://github.com/arichardson updated 
https://github.com/llvm/llvm-project/pull/139413

>From a2dec95d11a68c7911eef777ad78b07aa747bef5 Mon Sep 17 00:00:00 2001
From: Alex Richardson 
Date: Sat, 10 May 2025 15:35:50 -0700
Subject: [PATCH 1/2] remove fixme

Created using spr 1.3.6-beta.1
---
 .../test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll | 1 -
 1 file changed, 1 deletion(-)

diff --git a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll 
b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
index 074c3cf7f3bbf..538145a11c733 100644
--- a/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
+++ b/llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-pointer-ops.ll
@@ -278,7 +278,6 @@ define <2 x i32> @ptrtoaddr_vec(<2 x ptr addrspace(7)> %ptr) {
 }
 
 ;; Check that we extend the offset to i160 instead of reinterpreting all bits.
-;; FIXME: this is not currently correct.
 define i160 @ptrtoaddr_ext(ptr addrspace(7) %ptr) {
 ; CHECK-LABEL: define i160 @ptrtoaddr_ext
 ; CHECK-SAME: ({ ptr addrspace(8), i32 } [[PTR:%.*]]) #[[ATTR0]] {

>From d48e4abb04e112a195f6673e092f05ab964af70b Mon Sep 17 00:00:00 2001
From: Alex Richardson 
Date: Wed, 11 Jun 2025 11:09:15 -0700
Subject: [PATCH 2/2] address review comment

Created using spr 1.3.6-beta.1
---
 llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
index 9c7dd7540db7d..26f1703a2f9b1 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp
@@ -1959,8 +1959,7 @@ PtrParts SplitPtrStructs::visitPtrToAddrInst(PtrToAddrInst &PA) {
   IRB.SetInsertPoint(&PA);
 
   auto [Rsrc, Off] = getPtrParts(Ptr);
-  Value *Res = IRB.CreateIntCast(Off, PA.getType(), /*isSigned=*/false,
- PA.getName() + ".off");
+  Value *Res = IRB.CreateIntCast(Off, PA.getType(), /*isSigned=*/false);
   copyMetadata(Res, &PA);
   Res->takeName(&PA);
   SplitUsers.insert(&PA);
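
The behaviour described by the test comment ("extend the offset ... instead of reinterpreting all bits") can be modelled in plain C++. This is a sketch under assumptions: `SketchFatPtr` and `ptrToAddrSketch` are hypothetical names, the 128-bit resource is modelled as two 64-bit halves, and `std::uint64_t` stands in for the wider result type because standard C++ has no 160-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a split buffer fat pointer: a 128-bit resource
// descriptor (two 64-bit halves) plus a 32-bit offset.
struct SketchFatPtr {
  std::uint64_t RsrcLo;
  std::uint64_t RsrcHi;
  std::uint32_t Offset;
};

// Address of the pointer in this model: only the offset, zero-extended
// (isSigned=false in the patch) to the wider result type. The resource
// bits do not contribute to the result.
inline std::uint64_t ptrToAddrSketch(const SketchFatPtr &P) {
  return static_cast<std::uint64_t>(P.Offset);
}
```

An offset with its top bit set stays a small positive value after widening, and the resource halves never leak into the address.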



[llvm-branch-commits] [llvm] release/21.x: [TailDup] Delay aggressive computed-goto taildup to after RegAlloc. (#150911) (PR #151680)

2025-08-09 Thread via llvm-branch-commits

mikulas-patocka wrote:

> > Seems like we are still waiting for confirmation on this one?
> 
> My understanding from @mikulas-patocka in #106846 is that there was no 
> regression on current main after merging over a week ago, so I think we 
> should be good to go.
> 
> Not sure what the remaining time-line is, but it would be good to make sure 
> the regression is fixed in the release.

There was a regression - the pointless move of %r14 to %r15 and back and 
loading %rsi from the stack that I mentioned in 
https://github.com/llvm/llvm-project/issues/106846 is actually a regression 
that was recently added to git. With older clang-22 from Debian Sid (version 
1:22~++20250731080150+be449d6b6587-1~exp1), these pointless instructions are 
not generated.

https://github.com/llvm/llvm-project/pull/151680


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Kevin Sala Penades via llvm-branch-commits

https://github.com/kevinsala updated 
https://github.com/llvm/llvm-project/pull/152831

>From fa3c7425ae9e5ffea83841f2be61b0f494b99038 Mon Sep 17 00:00:00 2001
From: Kevin Sala 
Date: Fri, 8 Aug 2025 11:25:14 -0700
Subject: [PATCH 1/2] [OpenMP][Offload] Add offload runtime support for
 dyn_groupprivate clause

---
 offload/DeviceRTL/include/DeviceTypes.h   |   4 +
 offload/DeviceRTL/include/Interface.h |   2 +-
 offload/DeviceRTL/include/State.h |   2 +-
 offload/DeviceRTL/src/Kernel.cpp  |  14 +-
 offload/DeviceRTL/src/State.cpp   |  48 +-
 offload/include/Shared/APITypes.h |   6 +-
 offload/include/Shared/Environment.h  |   4 +-
 offload/include/device.h  |   3 +
 offload/include/omptarget.h   |   7 +-
 offload/libomptarget/OpenMP/API.cpp   |  14 ++
 offload/libomptarget/device.cpp   |   6 +
 offload/libomptarget/exports  |   1 +
 .../amdgpu/dynamic_hsa/hsa_ext_amd.h  |   1 +
 offload/plugins-nextgen/amdgpu/src/rtl.cpp|  34 +++--
 .../common/include/PluginInterface.h  |  33 +++-
 .../common/src/PluginInterface.cpp|  86 ---
 .../plugins-nextgen/cuda/dynamic_cuda/cuda.h  |   1 +
 offload/plugins-nextgen/cuda/src/rtl.cpp  |  37 +++--
 offload/plugins-nextgen/host/src/rtl.cpp  |   4 +-
 .../offloading/dyn_groupprivate_strict.cpp| 141 ++
 openmp/runtime/src/include/omp.h.var  |  10 ++
 openmp/runtime/src/kmp_csupport.cpp   |   9 ++
 openmp/runtime/src/kmp_stub.cpp   |  16 ++
 23 files changed, 418 insertions(+), 65 deletions(-)
 create mode 100644 offload/test/offloading/dyn_groupprivate_strict.cpp

diff --git a/offload/DeviceRTL/include/DeviceTypes.h 
b/offload/DeviceRTL/include/DeviceTypes.h
index 2e5d92380f040..a43b506d6879e 100644
--- a/offload/DeviceRTL/include/DeviceTypes.h
+++ b/offload/DeviceRTL/include/DeviceTypes.h
@@ -163,4 +163,8 @@ typedef enum omp_allocator_handle_t {
 
 ///}
 
+enum omp_access_t {
+  omp_access_cgroup = 0,
+};
+
 #endif
diff --git a/offload/DeviceRTL/include/Interface.h 
b/offload/DeviceRTL/include/Interface.h
index c4bfaaa2404b4..672afea206785 100644
--- a/offload/DeviceRTL/include/Interface.h
+++ b/offload/DeviceRTL/include/Interface.h
@@ -222,7 +222,7 @@ struct KernelEnvironmentTy;
 int8_t __kmpc_is_spmd_exec_mode();
 
 int32_t __kmpc_target_init(KernelEnvironmentTy &KernelEnvironment,
-   KernelLaunchEnvironmentTy &KernelLaunchEnvironment);
+   KernelLaunchEnvironmentTy *KernelLaunchEnvironment);
 
 void __kmpc_target_deinit();
 
diff --git a/offload/DeviceRTL/include/State.h 
b/offload/DeviceRTL/include/State.h
index db396dae6e445..17c3c6f2d3e42 100644
--- a/offload/DeviceRTL/include/State.h
+++ b/offload/DeviceRTL/include/State.h
@@ -116,7 +116,7 @@ extern Local ThreadStates;
 
 /// Initialize the state machinery. Must be called by all threads.
 void init(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
-  KernelLaunchEnvironmentTy &KernelLaunchEnvironment);
+  KernelLaunchEnvironmentTy *KernelLaunchEnvironment);
 
 /// Return the kernel and kernel launch environment associated with the current
 /// kernel. The former is static and contains compile time information that
diff --git a/offload/DeviceRTL/src/Kernel.cpp b/offload/DeviceRTL/src/Kernel.cpp
index 467e44a65276c..58e9a09105a76 100644
--- a/offload/DeviceRTL/src/Kernel.cpp
+++ b/offload/DeviceRTL/src/Kernel.cpp
@@ -34,8 +34,8 @@ enum OMPTgtExecModeFlags : unsigned char {
 };
 
 static void
-inititializeRuntime(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
-KernelLaunchEnvironmentTy &KernelLaunchEnvironment) {
+initializeRuntime(bool IsSPMD, KernelEnvironmentTy &KernelEnvironment,
+  KernelLaunchEnvironmentTy *KernelLaunchEnvironment) {
   // Order is important here.
   synchronize::init(IsSPMD);
   mapping::init(IsSPMD);
@@ -80,17 +80,17 @@ extern "C" {
 /// \param Ident   Source location identification, can be NULL.
 ///
 int32_t __kmpc_target_init(KernelEnvironmentTy &KernelEnvironment,
-   KernelLaunchEnvironmentTy &KernelLaunchEnvironment) {
+   KernelLaunchEnvironmentTy *KernelLaunchEnvironment) {
   ConfigurationEnvironmentTy &Configuration = KernelEnvironment.Configuration;
   bool IsSPMD = Configuration.ExecMode & OMP_TGT_EXEC_MODE_SPMD;
   bool UseGenericStateMachine = Configuration.UseGenericStateMachine;
   if (IsSPMD) {
-inititializeRuntime(/*IsSPMD=*/true, KernelEnvironment,
-KernelLaunchEnvironment);
+initializeRuntime(/*IsSPMD=*/true, KernelEnvironment,
+  KernelLaunchEnvironment);
 synchronize::threadsAligned(atomic::relaxed);
   } else {
-inititializeRuntime(/*IsSPMD=*/false, KernelEnvironment,
-KernelLaunchEnv

[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. :warning:

You can test this locally with the following command:

```bash
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- \
  offload/test/offloading/dyn_groupprivate_strict.cpp \
  offload/DeviceRTL/include/DeviceTypes.h \
  offload/DeviceRTL/include/Interface.h \
  offload/DeviceRTL/include/State.h \
  offload/DeviceRTL/src/Kernel.cpp \
  offload/DeviceRTL/src/State.cpp \
  offload/include/Shared/APITypes.h \
  offload/include/Shared/Environment.h \
  offload/include/device.h \
  offload/include/omptarget.h \
  offload/libomptarget/OpenMP/API.cpp \
  offload/libomptarget/device.cpp \
  offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h \
  offload/plugins-nextgen/amdgpu/src/rtl.cpp \
  offload/plugins-nextgen/common/include/PluginInterface.h \
  offload/plugins-nextgen/common/src/PluginInterface.cpp \
  offload/plugins-nextgen/cuda/dynamic_cuda/cuda.h \
  offload/plugins-nextgen/cuda/src/rtl.cpp \
  offload/plugins-nextgen/host/src/rtl.cpp \
  openmp/runtime/src/kmp_csupport.cpp \
  openmp/runtime/src/kmp_stub.cpp
```





View the diff from clang-format here.


```diff
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 48e677d06..9751169b0 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -3442,7 +3442,8 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
   }
 
   // Increase to the requested dynamic memory size for the device if needed.
-  DynBlockMemSize = std::max(DynBlockMemSize, GenericDevice.getDynamicMemorySize());
+  DynBlockMemSize =
+      std::max(DynBlockMemSize, GenericDevice.getDynamicMemorySize());
 
   // Push the kernel launch into the stream.
   return Stream->pushKernelLaunch(*this, AllArgs, NumThreads, NumBlocks,
diff --git a/offload/plugins-nextgen/cuda/src/rtl.cpp b/offload/plugins-nextgen/cuda/src/rtl.cpp
index bd1cedf56..b052197e2 100644
--- a/offload/plugins-nextgen/cuda/src/rtl.cpp
+++ b/offload/plugins-nextgen/cuda/src/rtl.cpp
@@ -1323,7 +1323,8 @@ Error CUDAKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
 GenericDevice.Plugin.getRPCServer().Thread->notify();
 
   // Increase to the requested dynamic memory size for the device if needed.
-  DynBlockMemSize = std::max(DynBlockMemSize, GenericDevice.getDynamicMemorySize());
+  DynBlockMemSize =
+      std::max(DynBlockMemSize, GenericDevice.getDynamicMemorySize());
 
   // In case we require more memory than the current limit.
   if (DynBlockMemSize >= MaxDynBlockMemSize) {
```




https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Kevin Sala Penades via llvm-branch-commits

https://github.com/kevinsala edited 
https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Kevin Sala Penades via llvm-branch-commits


@@ -163,4 +163,8 @@ typedef enum omp_allocator_handle_t {
 
 ///}
 
+enum omp_access_t {

kevinsala wrote:

Actually, there is another value in the standard. Added and documented.

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Kevin Sala Penades via llvm-branch-commits


@@ -4515,6 +4515,15 @@ void omp_free(void *ptr, omp_allocator_handle_t allocator) {
 }
 /* end of OpenMP 5.1 Memory Management routines */
 
+void *omp_get_dyn_groupprivate_ptr(size_t offset, int *is_fallback,
+   omp_access_t access_group) {
+  if (is_fallback != NULL)

kevinsala wrote:

This interface was discussed in the OpenMP language committee, and it's the 
version that was accepted. Because the out argument is optional, we can pass 
nullptr, or omit the parameter in the C++ and Fortran versions.
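
The optional out-argument pattern described here can be sketched as below. This is not the runtime's implementation: `dynPtrSketch`, its `Base`/`Size` parameters, and the "never a fallback" behaviour are hypothetical and chosen only to show how a caller may pass a pointer, pass `nullptr`, or (in C++) omit the argument.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns Base + Offset when the offset is in range,
// otherwise nullptr. IsFallback is an optional out argument; it is written
// only when the caller supplied a non-null pointer.
inline void *dynPtrSketch(char *Base, std::size_t Size, std::size_t Offset,
                          int *IsFallback = nullptr) {
  assert(Base != nullptr && "sketch expects a valid buffer");
  if (IsFallback != nullptr)
    *IsFallback = 0; // this sketch never uses a fallback allocation
  return Offset < Size ? Base + Offset : nullptr;
}
```

A caller that does not care about the fallback status simply writes `dynPtrSketch(Buf, Size, Offset)`; one that does passes the address of an `int`.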

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Kevin Sala Penades via llvm-branch-commits

https://github.com/kevinsala edited 
https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [clang] [clang-tools-extra] [lldb] [PATCH 5/7] [clang] NNS improvement: getOriginalDecl changes (PR #149747)

2025-08-09 Thread Matheus Izvekov via llvm-branch-commits

https://github.com/mizvekov closed 
https://github.com/llvm/llvm-project/pull/149747
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [LoongArch] Use xvperm.w for cross-lane access within a single vector (PR #151634)

2025-08-09 Thread via llvm-branch-commits

https://github.com/zhaoqi5 updated 
https://github.com/llvm/llvm-project/pull/151634

>From f759464ee797830c998d66d1076d9896c5a1 Mon Sep 17 00:00:00 2001
From: Qi Zhao 
Date: Fri, 1 Aug 2025 11:30:19 +0800
Subject: [PATCH 1/2] [LoongArch] Use xvperm.w for cross-lane access within a
 single vector

---
 .../LoongArch/LoongArchISelLowering.cpp   | 44 +++
 .../lasx/shuffle-as-permute-and-shuffle.ll| 18 ++--
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
index 597650c8229a7..6aa848ca7bd07 100644
--- a/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
+++ b/llvm/lib/Target/LoongArch/LoongArchISelLowering.cpp
@@ -1832,6 +1832,48 @@ static SDValue lowerVECTOR_SHUFFLE_XVSHUF4I(const SDLoc &DL, ArrayRef<int> Mask,
   return lowerVECTOR_SHUFFLE_VSHUF4I(DL, Mask, VT, V1, V2, DAG);
 }
 
+/// Lower VECTOR_SHUFFLE into XVPERM (if possible).
+static SDValue lowerVECTOR_SHUFFLE_XVPERM(const SDLoc &DL, ArrayRef<int> Mask,
+                                          MVT VT, SDValue V1, SDValue V2,
+                                          SelectionDAG &DAG) {
+  // LoongArch LASX only have XVPERM_W.
+  if (Mask.size() != 8 || (VT != MVT::v8i32 && VT != MVT::v8f32))
+return SDValue();
+
+  unsigned NumElts = VT.getVectorNumElements();
+  unsigned HalfSize = NumElts / 2;
+  bool FrontLo = true, FrontHi = true;
+  bool BackLo = true, BackHi = true;
+
+  auto inRange = [](int val, int low, int high) {
+return (val == -1) || (val >= low && val < high);
+  };
+
+  for (unsigned i = 0; i < HalfSize; ++i) {
+int Fronti = Mask[i];
+int Backi = Mask[i + HalfSize];
+
+FrontLo &= inRange(Fronti, 0, HalfSize);
+FrontHi &= inRange(Fronti, HalfSize, NumElts);
+BackLo &= inRange(Backi, 0, HalfSize);
+BackHi &= inRange(Backi, HalfSize, NumElts);
+  }
+
+  // If both the lower and upper 128-bit parts access only one half of the
+  // vector (either lower or upper), avoid using xvperm.w. The latency of
+  // xvperm.w(3) is higher than using xvshuf(1) and xvori(1).
+  if ((FrontLo && (BackLo || BackHi)) || (FrontHi && (BackLo || BackHi)))
+return SDValue();
+
+  SmallVector<SDValue, 8> Masks;
+  for (unsigned i = 0; i < NumElts; ++i)
+Masks.push_back(Mask[i] == -1 ? DAG.getUNDEF(MVT::i64)
+  : DAG.getConstant(Mask[i], DL, MVT::i64));
+  SDValue MaskVec = DAG.getBuildVector(MVT::v8i32, DL, Masks);
+
+  return DAG.getNode(LoongArchISD::XVPERM, DL, VT, V1, MaskVec);
+}
+
 /// Lower VECTOR_SHUFFLE into XVPACKEV (if possible).
 static SDValue lowerVECTOR_SHUFFLE_XVPACKEV(const SDLoc &DL, ArrayRef<int> Mask,
                                             MVT VT, SDValue V1, SDValue V2,
@@ -2235,6 +2277,8 @@ static SDValue lower256BitShuffle(const SDLoc &DL, ArrayRef<int> Mask, MVT VT,
   return Result;
 if ((Result = lowerVECTOR_SHUFFLE_XVSHUF4I(DL, NewMask, VT, V1, V2, DAG)))
   return Result;
+if ((Result = lowerVECTOR_SHUFFLE_XVPERM(DL, NewMask, VT, V1, V2, DAG)))
+  return Result;
 if ((Result = lowerVECTOR_SHUFFLEAsLanePermuteAndShuffle(DL, NewMask, VT,
  V1, V2, DAG)))
   return Result;
diff --git a/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll b/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
index fed085843485a..5f76d9951df9c 100644
--- a/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
+++ b/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
@@ -61,13 +61,8 @@ define <8 x i32> @shuffle_v8i32(<8 x i32> %a) {
 ; CHECK-LABEL: shuffle_v8i32:
 ; CHECK:   # %bb.0:
 ; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI4_0)
-; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI4_0)
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI4_1)
-; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI4_1)
-; CHECK-NEXT:xvpermi.d $xr3, $xr0, 78
-; CHECK-NEXT:xvshuf.d $xr2, $xr0, $xr3
-; CHECK-NEXT:xvshuf.d $xr1, $xr2, $xr0
-; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI4_0)
+; CHECK-NEXT:xvperm.w $xr0, $xr0, $xr1
 ; CHECK-NEXT:ret
   %shuffle = shufflevector <8 x i32> %a, <8 x i32> poison, <8 x i32> 
   ret <8 x i32> %shuffle
@@ -117,13 +112,8 @@ define <8 x float> @shuffle_v8f32(<8 x float> %a) {
 ; CHECK-LABEL: shuffle_v8f32:
 ; CHECK:   # %bb.0:
 ; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI8_0)
-; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI8_0)
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI8_1)
-; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI8_1)
-; CHECK-NEXT:xvpermi.d $xr3, $xr0, 78
-; CHECK-NEXT:xvshuf.d $xr2, $xr0, $xr3
-; CHECK-NEXT:xvshuf.d $xr1, $xr2, $xr0
-; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI8_0)
+; CHECK-NEXT:xvperm.w $xr0, $xr0, $xr1
 ; CHECK-NEXT:ret
   %shuffle = shuf

[llvm-branch-commits] [llvm] [LoongArch] Pre-commit tests for shuffle visiting same lane. NFC (PR #151633)

2025-08-09 Thread via llvm-branch-commits

https://github.com/zhaoqi5 closed 
https://github.com/llvm/llvm-project/pull/151633


[llvm-branch-commits] [llvm] [LoongArch] Use xvperm.w for cross-lane access within a single vector (PR #151634)

2025-08-09 Thread via llvm-branch-commits

https://github.com/zhaoqi5 edited 
https://github.com/llvm/llvm-project/pull/151634


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix updated 
https://github.com/llvm/llvm-project/pull/152871

>From 99fcd032df293e5f3404e7ce275ce43891f7c9f3 Mon Sep 17 00:00:00 2001
From: bassiounix 
Date: Sat, 9 Aug 2025 20:15:12 +0300
Subject: [PATCH] [libc][math] Refactor cosf16 implementation to header-only in
 src/__support/math folder.

---
 libc/shared/math.h|   1 +
 libc/shared/math/cosf16.h |  28 +
 libc/src/__support/math/CMakeLists.txt|  27 +
 libc/src/__support/math/cosf16.h  | 106 ++
 .../math}/sincosf16_utils.h   |   6 +-
 libc/src/math/generic/CMakeLists.txt  |  31 +
 libc/src/math/generic/cosf16.cpp  |  81 +
 libc/src/math/generic/cospif16.cpp|   3 +-
 libc/src/math/generic/sinf16.cpp  |   3 +-
 libc/src/math/generic/sinpif16.cpp|   3 +-
 libc/src/math/generic/tanf16.cpp  |   3 +-
 libc/src/math/generic/tanpif16.cpp|   3 +-
 libc/test/shared/CMakeLists.txt   |   1 +
 libc/test/shared/shared_math_test.cpp |   2 +-
 .../llvm-project-overlay/libc/BUILD.bazel |  48 +---
 15 files changed, 217 insertions(+), 129 deletions(-)
 create mode 100644 libc/shared/math/cosf16.h
 create mode 100644 libc/src/__support/math/cosf16.h
 rename libc/src/{math/generic => __support/math}/sincosf16_utils.h (97%)

diff --git a/libc/shared/math.h b/libc/shared/math.h
index 0c11640101563..a7edb0811a380 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -34,6 +34,7 @@
 #include "math/cbrtf.h"
 #include "math/cos.h"
 #include "math/cosf.h"
+#include "math/cosf16.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cosf16.h b/libc/shared/math/cosf16.h
new file mode 100644
index 0..8a19285c5755b
--- /dev/null
+++ b/libc/shared/math/cosf16.h
@@ -0,0 +1,28 @@
+//===-- Shared cosf16 function --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_COSF16_H
+#define LLVM_LIBC_SHARED_MATH_COSF16_H
+
+#include "shared/libc_common.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "src/__support/math/cosf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cosf16;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SHARED_MATH_COSF16_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index 2cd064591e976..f4a8ee0fbb41c 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -390,6 +390,23 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cosf16
+  HDRS
+cosf16.h
+  DEPENDS
+.sincosf16_utils
+libc.hdr.errno_macros
+libc.hdr.fenv_macros
+libc.src.__support.FPUtil.cast
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.except_value_utils
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.macros.optimization
+libc.src.__support.macros.properties.types
+)
+
 add_header_library(
   erff
   HDRS
@@ -699,3 +716,13 @@ add_header_library(
 libc.src.__support.FPUtil.polyeval
 libc.src.__support.common
 )
+
+add_header_library(
+  sincosf16_utils
+  HDRS
+sincosf16_utils.h
+  DEPENDS
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.FPUtil.nearest_integer
+libc.src.__support.common
+)
diff --git a/libc/src/__support/math/cosf16.h b/libc/src/__support/math/cosf16.h
new file mode 100644
index 0..50c9a8f765c2a
--- /dev/null
+++ b/libc/src/__support/math/cosf16.h
@@ -0,0 +1,106 @@
+//===-- Implementation header for cosf16 *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+#define LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "sincosf16_utils.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix created 
https://github.com/llvm/llvm-project/pull/152871

None

>From 7031956b4e52cac9e294673d5a96ea4e289325ed Mon Sep 17 00:00:00 2001
From: bassiounix 
Date: Sat, 9 Aug 2025 20:15:12 +0300
Subject: [PATCH] [libc][math] Refactor cosf16 implementation to header-only in
 src/__support/math folder.

---
 libc/shared/math.h|   1 +
 libc/shared/math/cosf16.h |  28 +
 libc/src/__support/math/CMakeLists.txt|  27 +
 libc/src/__support/math/cosf16.h  | 107 ++
 .../math}/sincosf16_utils.h   |   6 +-
 libc/src/math/generic/CMakeLists.txt  |  31 +
 libc/src/math/generic/cosf16.cpp  |  81 +
 libc/src/math/generic/cospif16.cpp|   3 +-
 libc/src/math/generic/sinf16.cpp  |   3 +-
 libc/src/math/generic/sinpif16.cpp|   3 +-
 libc/src/math/generic/tanf16.cpp  |   3 +-
 libc/src/math/generic/tanpif16.cpp|   3 +-
 libc/test/shared/CMakeLists.txt   |   1 +
 libc/test/shared/shared_math_test.cpp |   2 +-
 .../llvm-project-overlay/libc/BUILD.bazel |  48 +---
 15 files changed, 218 insertions(+), 129 deletions(-)
 create mode 100644 libc/shared/math/cosf16.h
 create mode 100644 libc/src/__support/math/cosf16.h
 rename libc/src/{math/generic => __support/math}/sincosf16_utils.h (97%)

diff --git a/libc/shared/math.h b/libc/shared/math.h
index 0c11640101563..a7edb0811a380 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -34,6 +34,7 @@
 #include "math/cbrtf.h"
 #include "math/cos.h"
 #include "math/cosf.h"
+#include "math/cosf16.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cosf16.h b/libc/shared/math/cosf16.h
new file mode 100644
index 0..8a19285c5755b
--- /dev/null
+++ b/libc/shared/math/cosf16.h
@@ -0,0 +1,28 @@
+//===-- Shared cosf16 function --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_COSF16_H
+#define LLVM_LIBC_SHARED_MATH_COSF16_H
+
+#include "shared/libc_common.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "src/__support/math/cosf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cosf16;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SHARED_MATH_COSF16_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index 2cd064591e976..f4a8ee0fbb41c 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -390,6 +390,23 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cosf16
+  HDRS
+cosf16.h
+  DEPENDS
+.sincosf16_utils
+libc.hdr.errno_macros
+libc.hdr.fenv_macros
+libc.src.__support.FPUtil.cast
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.except_value_utils
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.macros.optimization
+libc.src.__support.macros.properties.types
+)
+
 add_header_library(
   erff
   HDRS
@@ -699,3 +716,13 @@ add_header_library(
 libc.src.__support.FPUtil.polyeval
 libc.src.__support.common
 )
+
+add_header_library(
+  sincosf16_utils
+  HDRS
+sincosf16_utils.h
+  DEPENDS
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.FPUtil.nearest_integer
+libc.src.__support.common
+)
diff --git a/libc/src/__support/math/cosf16.h b/libc/src/__support/math/cosf16.h
new file mode 100644
index 0..e013a8751d0dd
--- /dev/null
+++ b/libc/src/__support/math/cosf16.h
@@ -0,0 +1,107 @@
+//===-- Implementation header for cosf16 *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+#define LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "sincosf16_utils.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace m

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libc

Author: Muhammad Bassiouni (bassiounix)


Changes



---

Patch is 22.19 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/152871.diff


15 Files Affected:

- (modified) libc/shared/math.h (+1) 
- (added) libc/shared/math/cosf16.h (+28) 
- (modified) libc/src/__support/math/CMakeLists.txt (+27) 
- (added) libc/src/__support/math/cosf16.h (+106) 
- (renamed) libc/src/__support/math/sincosf16_utils.h (+5-1) 
- (modified) libc/src/math/generic/CMakeLists.txt (+6-25) 
- (modified) libc/src/math/generic/cosf16.cpp (+2-79) 
- (modified) libc/src/math/generic/cospif16.cpp (+2-1) 
- (modified) libc/src/math/generic/sinf16.cpp (+2-1) 
- (modified) libc/src/math/generic/sinpif16.cpp (+2-1) 
- (modified) libc/src/math/generic/tanf16.cpp (+2-1) 
- (modified) libc/src/math/generic/tanpif16.cpp (+2-1) 
- (modified) libc/test/shared/CMakeLists.txt (+1) 
- (modified) libc/test/shared/shared_math_test.cpp (+1-1) 
- (modified) utils/bazel/llvm-project-overlay/libc/BUILD.bazel (+30-18) 


``diff
diff --git a/libc/shared/math.h b/libc/shared/math.h
index 0c11640101563..a7edb0811a380 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -34,6 +34,7 @@
 #include "math/cbrtf.h"
 #include "math/cos.h"
 #include "math/cosf.h"
+#include "math/cosf16.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cosf16.h b/libc/shared/math/cosf16.h
new file mode 100644
index 0..8a19285c5755b
--- /dev/null
+++ b/libc/shared/math/cosf16.h
@@ -0,0 +1,28 @@
+//===-- Shared cosf16 function --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_COSF16_H
+#define LLVM_LIBC_SHARED_MATH_COSF16_H
+
+#include "shared/libc_common.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "src/__support/math/cosf16.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cosf16;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LIBC_TYPES_HAS_FLOAT16
+
+#endif // LLVM_LIBC_SHARED_MATH_COSF16_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index 2cd064591e976..f4a8ee0fbb41c 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -390,6 +390,23 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cosf16
+  HDRS
+cosf16.h
+  DEPENDS
+.sincosf16_utils
+libc.hdr.errno_macros
+libc.hdr.fenv_macros
+libc.src.__support.FPUtil.cast
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.except_value_utils
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.macros.optimization
+libc.src.__support.macros.properties.types
+)
+
 add_header_library(
   erff
   HDRS
@@ -699,3 +716,13 @@ add_header_library(
 libc.src.__support.FPUtil.polyeval
 libc.src.__support.common
 )
+
+add_header_library(
+  sincosf16_utils
+  HDRS
+sincosf16_utils.h
+  DEPENDS
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.FPUtil.nearest_integer
+libc.src.__support.common
+)
diff --git a/libc/src/__support/math/cosf16.h b/libc/src/__support/math/cosf16.h
new file mode 100644
index 0..50c9a8f765c2a
--- /dev/null
+++ b/libc/src/__support/math/cosf16.h
@@ -0,0 +1,106 @@
+//===-- Implementation header for cosf16 *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+#define LLVM_LIBC_SRC___SUPPORT_MATH_COSF16_H
+
+#include "include/llvm-libc-macros/float16-macros.h"
+
+#ifdef LIBC_TYPES_HAS_FLOAT16
+
+#include "sincosf16_utils.h"
+#include "src/__support/FPUtil/FEnvImpl.h"
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/cast.h"
+#include "src/__support/FPUtil/except_value_utils.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/macros/optimization.h"
+
+namespace LIBC_NAMESPACE_DECL {
+
+namespace math {
+
+LIBC_INLINE static constexpr float16 cosf16(float16 x) {
+#ifndef LIBC_MATH_HAS_SKIP_ACCURATE_PASS
+  constexpr size_t N_EXCEPTS = 4;
+
+  constexpr fputil::ExceptValues<float16, N_EXCEPTS> COSF16_EXCEPTS{{
+  // (input, RZ output, RU offset, RD offset, RN offset)
+  {0x2b7c, 0x3bfc, 1, 0, 1},
+  {0x4ac1, 0x38b5, 1, 0, 0},
+  {0x5c49, 0xb8c6, 0, 1,

[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

bassiounix wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite:
> https://app.graphite.dev/github/pr/llvm/llvm-project/152871?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#152871** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/152871?utm_source=stack-comment-view-in-graphite)
* **#152069**
* **#151883**
* **#151846**
* **#151837**
* **#151779**
* **#151399**
* **#151012**
* **#150993**
* **#150968**
* **#150868**
* **#150854**
* **#150852**
* **#150849**
* **#150843**
* `main`

This stack of pull requests is managed by Graphite
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking:
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/152871


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix ready_for_review 
https://github.com/llvm/llvm-project/pull/152871


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf16 implementation to header-only in src/__support/math folder. (PR #152871)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix edited 
https://github.com/llvm/llvm-project/pull/152871


[llvm-branch-commits] [llvm] [LoopPeel] Fix branch weights' effect on block frequencies (PR #128785)

2025-08-09 Thread Joel E. Denny via llvm-branch-commits

https://github.com/jdenny-ornl updated 
https://github.com/llvm/llvm-project/pull/128785

>From f4135207e955f6c2e358cad54a7ef6f2f18087f8 Mon Sep 17 00:00:00 2001
From: "Joel E. Denny" 
Date: Wed, 19 Mar 2025 16:19:40 -0400
Subject: [PATCH 1/9] [LoopPeel] Fix branch weights' effect on block
 frequencies

For example:

```
declare void @f(i32)

define void @test(i32 %n) {
entry:
  br label %do.body

do.body:
  %i = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %inc = add i32 %i, 1
  call void @f(i32 %i)
  %c = icmp sge i32 %inc, %n
  br i1 %c, label %do.end, label %do.body, !prof !0

do.end:
  ret void
}

!0 = !{!"branch_weights", i32 1, i32 9}
```

Given those branch weights, once any loop iteration is actually
reached, the probability of the loop exiting at the iteration's end is
1/(1+9).  That is, the loop is likely to exit every 10 iterations and
thus has an estimated trip count of 10.  `opt
-passes='print<block-freq>'` shows that 10 is indeed the frequency of
the loop body:

```
Printing analysis results of BFI for function 'test':
block-frequency-info: test
 - entry: float = 1.0, int = 1801439852625920
 - do.body: float = 10.0, int = 18014398509481984
 - do.end: float = 1.0, int = 1801439852625920
```

Key Observation: The frequency of reaching any particular iteration is
less than for the previous iteration because the previous iteration
has a non-zero probability of exiting the loop.  This observation
holds even though every loop iteration, once actually reached, has
exactly the same probability of exiting and thus exactly the same
branch weights.
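
As a rough illustration of the arithmetic behind this observation (a
sketch only, not part of the patch; the helper names are invented for
this example), the per-iteration frequencies form a geometric series
whose sum is the loop body frequency:

```python
from fractions import Fraction


def exit_probability(exit_weight, backedge_weight):
    """Probability of leaving the loop at the end of a reached iteration."""
    return Fraction(exit_weight, exit_weight + backedge_weight)


def iteration_frequency(p_exit, k):
    """Frequency of reaching iteration k (0-based), relative to loop entry."""
    return (1 - p_exit) ** k


def loop_body_frequency(p_exit):
    """Closed form of the sum over k >= 0 of (1 - p_exit) ** k."""
    return 1 / p_exit


# branch_weights 1 (exit) and 9 (backedge), as in the example above.
p = exit_probability(1, 9)
first_three = [iteration_frequency(p, k) for k in range(3)]
total = loop_body_frequency(p)
```

Here `first_three` holds the decreasing frequencies 1, 9/10, and 81/100,
even though every iteration uses the same exit probability, and `total`
is the loop body frequency of 10.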

Now we use `opt -unroll-force-peel-count=2 -passes=loop-unroll` to
peel 2 iterations and insert them before the remaining loop.  We
expect the key observation above not to change, but it does under the
implementation without this patch.  The block frequency becomes 1.0
for the first iteration, 0.9 for the second, and 6.4 for the main loop
body.  Again, a decreasing frequency is expected, but it decreases too
much: the total frequency of the original loop body becomes 8.3.  The
new branch weights reveal the problem:

```
!0 = !{!"branch_weights", i32 1, i32 9}
!1 = !{!"branch_weights", i32 1, i32 8}
!2 = !{!"branch_weights", i32 1, i32 7}
```

The exit probability is now 1/10 for the first peeled iteration, 1/9
for the second, and 1/8 for the remaining loop iterations.  It seems
this behavior is trying to ensure a decreasing block frequency.
However, as in the key observation above for the original loop, that
happens correctly without decreasing the branch weights across
iterations.

This patch changes the peeling implementation not to decrease the
branch weights across loop iterations so that the frequency for every
iteration is the same as it was in the original loop.  The total
frequency of the loop body, summed across all its occurrences, thus
remains 10 after peeling.
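
The frequencies quoted above can be reproduced with a small
calculation.  This is only a sketch under the simplifying assumption
that each peeled iteration and the remaining loop are characterized by
a single exit probability; `peeled_frequencies` is a hypothetical
helper, not an LLVM API:

```python
from fractions import Fraction


def peeled_frequencies(exit_probs):
    """Return block frequencies for peeled iterations and the remaining loop.

    exit_probs lists the exit probability of each peeled iteration, followed
    by the exit probability of the remaining loop.  The result lists the
    frequency of each peeled iteration, then the total frequency of the
    remaining loop body.
    """
    reach = Fraction(1)  # frequency of reaching the next block
    freqs = []
    for p in exit_probs[:-1]:
        freqs.append(reach)
        reach *= 1 - p  # only non-exiting executions continue
    # The remaining loop body runs 1/p times per entry on average.
    freqs.append(reach / exit_probs[-1])
    return freqs


# Without the patch: weights 1:9, 1:8, 1:7, i.e. exit probabilities
# 1/10, 1/9, 1/8.
old = peeled_frequencies([Fraction(1, 10), Fraction(1, 9), Fraction(1, 8)])

# With the patch: every iteration keeps the original 1/10 exit probability.
new = peeled_frequencies([Fraction(1, 10)] * 3)
```

Under these assumptions `old` is 1, 9/10, and 32/5 (6.4), which sums to
8.3, while `new` is 1, 9/10, and 81/10, which sums to the original 10.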

Unfortunately, that change means a later analysis cannot accurately
estimate the trip count of the remaining loop while examining the
remaining loop in isolation without considering the probability of
actually reaching it.  For that purpose, this patch stores the new
trip count as separate metadata named `llvm.loop.estimated_trip_count`
and extends `llvm::getLoopEstimatedTripCount` to prefer it, if
present, over branch weights.

An alternative fix is for `llvm::getLoopEstimatedTripCount` to
subtract the `llvm.loop.peeled.count` metadata from the trip count
estimated by a loop's branch weights.  However, there might be other
loop transformations that still corrupt block frequencies in a similar
manner and require a similar fix.  `llvm.loop.estimated_trip_count` is
intended to provide a general way to store estimated trip counts when
branch weights cannot directly store them.

This patch introduces several fixme comments that need to be addressed
before it can land.
---
 .../include/llvm/Transforms/Utils/LoopUtils.h |  25 ++-
 llvm/lib/Transforms/Utils/LoopPeel.cpp| 145 +++---
 llvm/lib/Transforms/Utils/LoopUtils.cpp   |  20 ++-
 .../LoopUnroll/peel-branch-weights-freq.ll|  75 +
 .../LoopUnroll/peel-branch-weights.ll |  64 
 .../LoopUnroll/peel-loop-pgo-deopt.ll |  11 +-
 .../Transforms/LoopUnroll/peel-loop-pgo.ll|  13 +-
 .../Transforms/LoopVectorize/X86/pr81872.ll   |  18 ++-
 8 files changed, 217 insertions(+), 154 deletions(-)
 create mode 100644 llvm/test/Transforms/LoopUnroll/peel-branch-weights-freq.ll

diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 8f4c0c88336ac..82d23a4b68ea1 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -315,7 +315,8 @@ TransformationMode hasLICMVersioningTransformation(const Loop *L);
 void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
  unsigned V = 0);
 
-/// Returns a loop's estimated trip count based on branch weight metadata.
+

[llvm-branch-commits] [llvm] release/21.x: [TailDup] Delay aggressive computed-goto taildup to after RegAlloc. (#150911) (PR #151680)

2025-08-09 Thread Florian Hahn via llvm-branch-commits

fhahn wrote:

> > > Seems like we are still waiting for confirmation on this one?
> > 
> > 
> > My understanding from @mikulas-patocka in #106846 is that there was no 
> > regression on current main after merging over a week ago, so I think we 
> > should be good to go.
> > Not sure what the remaining time-line is, but it would be good to make sure 
> > the regression is fixed in the release.
> 
> There was a regression - the pointless move of %r14 to %r15 and back and 
> loading %rsi from the stack that I mentioned in #106846 is actually a 
> regression that was recently added to git. With older clang-22 from Debian 
> Sid (version 1:22~++20250731080150+be449d6b6587-1~exp1), these pointless 
> instructions are not generated.

Right, but was there a runtime regression? After all, without the patch,
Python on ARM64 macOS regresses by 2-3% for end-to-end workloads.

https://github.com/llvm/llvm-project/pull/151680
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libc] [llvm] [libc][math] Refactor cosf implementation to header-only in src/__support/math folder. (PR #152069)

2025-08-09 Thread Muhammad Bassiouni via llvm-branch-commits

https://github.com/bassiounix updated 
https://github.com/llvm/llvm-project/pull/152069

>From d6211d7695c4a25e823ff0e87693906af06a20fb Mon Sep 17 00:00:00 2001
From: bassiounix 
Date: Tue, 5 Aug 2025 06:30:16 +0300
Subject: [PATCH] [libc][math] Refactor cosf implementation to header-only in
 src/__support/math folder.

---
 libc/shared/math.h|   1 +
 libc/shared/math/cosf.h   |  23 +++
 libc/src/__support/math/CMakeLists.txt|  39 +
 libc/src/__support/math/cosf.h| 152 ++
 .../math}/range_reduction.h   |   6 +-
 .../math}/range_reduction_fma.h   |   6 +-
 .../math}/sincosf_utils.h |   6 +-
 libc/src/math/generic/CMakeLists.txt  |  52 ++
 libc/src/math/generic/cosf.cpp| 133 +--
 libc/src/math/generic/cospif.cpp  |   2 +-
 libc/src/math/generic/sincosf.cpp |   2 +-
 libc/src/math/generic/sinf.cpp|   6 +-
 libc/src/math/generic/sinpif.cpp  |   2 +-
 libc/src/math/generic/tanf.cpp|   2 +-
 libc/src/math/generic/tanpif.cpp  |   2 +-
 .../llvm-project-overlay/libc/BUILD.bazel |  84 +-
 16 files changed, 290 insertions(+), 228 deletions(-)
 create mode 100644 libc/shared/math/cosf.h
 create mode 100644 libc/src/__support/math/cosf.h
 rename libc/src/{math/generic => __support/math}/range_reduction.h (95%)
 rename libc/src/{math/generic => __support/math}/range_reduction_fma.h (95%)
 rename libc/src/{math/generic => __support/math}/sincosf_utils.h (97%)

diff --git a/libc/shared/math.h b/libc/shared/math.h
index a5581ed4272a3..0c11640101563 100644
--- a/libc/shared/math.h
+++ b/libc/shared/math.h
@@ -33,6 +33,7 @@
 #include "math/cbrt.h"
 #include "math/cbrtf.h"
 #include "math/cos.h"
+#include "math/cosf.h"
 #include "math/erff.h"
 #include "math/exp.h"
 #include "math/exp10.h"
diff --git a/libc/shared/math/cosf.h b/libc/shared/math/cosf.h
new file mode 100644
index 0..06182207a82f2
--- /dev/null
+++ b/libc/shared/math/cosf.h
@@ -0,0 +1,23 @@
+//===-- Shared cosf function ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SHARED_MATH_COSF_H
+#define LLVM_LIBC_SHARED_MATH_COSF_H
+
+#include "shared/libc_common.h"
+#include "src/__support/math/cosf.h"
+
+namespace LIBC_NAMESPACE_DECL {
+namespace shared {
+
+using math::cosf;
+
+} // namespace shared
+} // namespace LIBC_NAMESPACE_DECL
+
+#endif // LLVM_LIBC_SHARED_MATH_COSF_H
diff --git a/libc/src/__support/math/CMakeLists.txt b/libc/src/__support/math/CMakeLists.txt
index bf4db4e09fd0c..2cd064591e976 100644
--- a/libc/src/__support/math/CMakeLists.txt
+++ b/libc/src/__support/math/CMakeLists.txt
@@ -374,6 +374,21 @@ add_header_library(
 libc.src.__support.macros.optimization
 )
 
+add_header_library(
+  cosf
+  HDRS
+cosf.h
+  DEPENDS
+.sincosf_utils
+libc.src.errno.errno
+libc.src.__support.FPUtil.basic_operations
+libc.src.__support.FPUtil.fenv_impl
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.except_value_utils
+libc.src.__support.FPUtil.fma
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.macros.optimization
+)
 
 add_header_library(
   erff
@@ -649,6 +664,19 @@ add_header_library(
 libc.src.__support.integer_literals
 )
 
+add_header_library(
+  range_reduction
+  HDRS
+range_reduction.h
+range_reduction_fma.h
+  DEPENDS
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.fma
+libc.src.__support.FPUtil.multiply_add
+libc.src.__support.FPUtil.nearest_integer
+libc.src.__support.common
+)
+
 add_header_library(
   sincos_eval
   HDRS
@@ -660,3 +688,14 @@ add_header_library(
 libc.src.__support.FPUtil.polyeval
 libc.src.__support.integer_literals
 )
+
+add_header_library(
+  sincosf_utils
+  HDRS
+sincosf_utils.h
+  DEPENDS
+.range_reduction
+libc.src.__support.FPUtil.fp_bits
+libc.src.__support.FPUtil.polyeval
+libc.src.__support.common
+)
diff --git a/libc/src/__support/math/cosf.h b/libc/src/__support/math/cosf.h
new file mode 100644
index 0..074be0b314637
--- /dev/null
+++ b/libc/src/__support/math/cosf.h
@@ -0,0 +1,152 @@
+//===-- Implementation header for cosf --------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LIBC_SRC___SU

[llvm-branch-commits] [llvm] release/21.x: [TailDup] Delay aggressive computed-goto taildup to after RegAlloc. (#150911) (PR #151680)

2025-08-09 Thread via llvm-branch-commits

mikulas-patocka wrote:

> > > > Seems like we are still waiting for confirmation on this one?
> > > 
> > > 
> > > My understanding from @mikulas-patocka in #106846 is that there was no 
> > > regression on current main after merging over a week ago, so I think we 
> > > should be good to go.
> > > Not sure what the remaining time-line is, but it would be good to make 
> > > sure the regression is fixed in the release.
> > 
> > 
> > There was a regression - the pointless move of %r14 to %r15 and back and 
> > loading %rsi from the stack that I mentioned in #106846 is actually a 
> > regression that was recently added to git. With older clang-22 from Debian 
> > Sid (version 1:22~++20250731080150+be449d6b6587-1~exp1), these pointless 
> > instructions are not generated.
> 
> Right, but was there a runtime regression? After all, without the patch, 
> Python on ARM64 macOS regresses by 2-3% for end-to-end workloads.

There was a 9% runtime regression on a loop that just adds numbers (so it 
stresses this code path significantly). There was no measurable regression on 
Ajla self-compilation (the difference is just 0.8%, which may be jitter).
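
For readers unfamiliar with the pattern, a loop of this kind can be sketched roughly as follows. This is a toy computed-goto dispatcher, not the Ajla code and not the benchmark measured above: the opcode names, `runAddLoop`, and the bytecode layout are made up for illustration, and `&&label` / `goto *` are GNU extensions accepted by GCC and Clang.

```cpp
#include <cassert>
#include <cstdint>

// Toy opcodes (hypothetical, for illustration only).
enum Op : uint8_t { OP_ADD, OP_HALT };

// Runs a byte-coded program where every OP_ADD adds Step to an accumulator.
int64_t runAddLoop(const uint8_t *Code, int64_t Step) {
  assert(Code != nullptr && "program must not be null");
  // Jump table indexed by opcode; labels-as-values is a GNU extension.
  void *Table[] = {&&do_add, &&do_halt};
  int64_t Acc = 0;
  const uint8_t *PC = Code;
  goto *Table[*PC++];
do_add:
  Acc += Step;
  // Each handler ends in its own indirect jump instead of branching back
  // to one shared dispatch block.
  goto *Table[*PC++];
do_halt:
  return Acc;
}
```

Because the handler body is a single add, nearly all of the time goes into the dispatch sequence itself, which is presumably why extra moves or stack reloads around the indirect jump show up so clearly in such a loop and much less in a mixed workload.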

https://github.com/llvm/llvm-project/pull/151680


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Matt Arsenault via llvm-branch-commits


@@ -4515,6 +4515,15 @@ void omp_free(void *ptr, omp_allocator_handle_t allocator) {
 }
 /* end of OpenMP 5.1 Memory Management routines */
 
+void *omp_get_dyn_groupprivate_ptr(size_t offset, int *is_fallback,
+   omp_access_t access_group) {
+  if (is_fallback != NULL)

arsenm wrote:

```suggestion
  if (is_fallback != nullptr)
```

Is there a reason to have an optional out argument? Why not just make it 
unconditional or return a pair?
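
To make the two shapes concrete, here is a minimal sketch comparing an optional out argument with a pair return. Everything in it is hypothetical: `GroupMemSketch`, `getPtrOutArg`, and `getPtrPair` are illustration-only names, not the runtime's code, and whether the public OpenMP entry point could change its signature is a separate question this sketch does not address.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for the runtime state; not the code from the patch.
struct GroupMemSketch {
  char *Base = nullptr;
  std::size_t Size = 0;
  bool IsFallback = false;
};

// Variant A: optional out argument, the shape used in the quoted hunk.
void *getPtrOutArg(const GroupMemSketch &S, std::size_t Offset,
                   int *IsFallback) {
  assert(Offset <= S.Size && "offset must stay inside the buffer");
  if (IsFallback != nullptr)
    *IsFallback = S.IsFallback ? 1 : 0;
  return S.Base + Offset;
}

// Variant B: always return both values; callers ignore what they don't need.
std::pair<void *, bool> getPtrPair(const GroupMemSketch &S,
                                   std::size_t Offset) {
  assert(Offset <= S.Size && "offset must stay inside the buffer");
  return {S.Base + Offset, S.IsFallback};
}
```

Variant B removes the null check and the branch from the callee, at the cost of a return type that a C-linkage API cannot express directly.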

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Matt Arsenault via llvm-branch-commits


@@ -158,6 +158,34 @@ void SharedMemorySmartStackTy::pop(void *Ptr, uint64_t Bytes) {
   memory::freeGlobal(Ptr, "Slow path shared memory deallocation");
 }
 
+struct DynCGroupMemTy {
+  void init(KernelLaunchEnvironmentTy *KLE, void *NativeDynCGroup) {
+Size = 0;
+Ptr = nullptr;
+IsFallback = false;

arsenm wrote:

Move to field initializers? 
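
A minimal sketch of what that could look like, assuming the three fields are plain data members. The field names come from the quoted hunk; the types, the `DynCGroupMemSketch` name, and the simplified `init` signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reduced version of the struct under review.
struct DynCGroupMemSketch {
  // Default member initializers replace the three assignments at the top
  // of init() in the quoted hunk.
  uint64_t Size = 0;
  void *Ptr = nullptr;
  bool IsFallback = false;

  // Simplified init(): only sets what differs from the defaults.
  void init(void *NativeDynCGroup, uint64_t Bytes) {
    assert((NativeDynCGroup != nullptr || Bytes == 0) &&
           "a non-empty region needs a backing pointer");
    Ptr = NativeDynCGroup;
    Size = Bytes;
  }
};
```

One caveat: default member initializers only take effect when an object is actually constructed. If the real object lives in device memory that is deliberately left uninitialized at load time, the explicit assignments in `init()` may still be required.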

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Matt Arsenault via llvm-branch-commits


@@ -430,6 +463,17 @@ int omp_get_team_num() { return mapping::getBlockIdInKernel(); }
 int omp_get_initial_device(void) { return -1; }
 
 int omp_is_initial_device(void) { return 0; }
+
+void *omp_get_dyn_groupprivate_ptr(size_t Offset, int *IsFallback,
+   omp_access_t) {
+  if (IsFallback != NULL)

arsenm wrote:

```suggestion
  if (IsFallback != nullptr)
```

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [openmp] [OpenMP][Offload] Add offload runtime support for dyn_groupprivate clause (PR #152831)

2025-08-09 Thread Matt Arsenault via llvm-branch-commits


@@ -163,4 +163,8 @@ typedef enum omp_allocator_handle_t {
 
 ///}
 
+enum omp_access_t {

arsenm wrote:

Document this; it's particularly weird to see an enum with only one entry.

https://github.com/llvm/llvm-project/pull/152831


[llvm-branch-commits] [llvm] [ir] MD_prof is not UB-implying (PR #152420)

2025-08-09 Thread Nikita Popov via llvm-branch-commits


@@ -1678,6 +1680,8 @@ void Instruction::dropUnknownNonDebugMetadata(ArrayRef<unsigned> KnownIDs) {
 
   // A DIAssignID attachment is debug metadata, don't drop it.
   KnownSet.insert(LLVMContext::MD_DIAssignID);
+  if (!ProfcheckDisableMetadataFixes)
+KnownSet.insert(LLVMContext::MD_prof);

nikic wrote:

I don't think that inserting it here is correct. This should be handled in the 
caller, like dropUBImplyingAttrsAndMetadata.

https://github.com/llvm/llvm-project/pull/152420


[llvm-branch-commits] [llvm] [ir] MD_prof is not UB-implying (PR #152420)

2025-08-09 Thread Nikita Popov via llvm-branch-commits




nikic wrote:

All this test needs is a hoistable select with prof metadata. You do not need 
any of the blockaddress / indirectbr stuff.

https://github.com/llvm/llvm-project/pull/152420


[llvm-branch-commits] [llvm] [LoongArch] Pre-commit tests for shuffle visiting same lane. NFC (PR #151633)

2025-08-09 Thread via llvm-branch-commits

https://github.com/zhaoqi5 updated 
https://github.com/llvm/llvm-project/pull/151633

>From 8873a1bfc37c643a0e66d7f245b36fbb1f64ad25 Mon Sep 17 00:00:00 2001
From: Qi Zhao 
Date: Fri, 1 Aug 2025 09:47:53 +0800
Subject: [PATCH] [LoongArch] Pre-commit tests for shuffle visiting same lane

---
 .../lasx/shuffle-as-permute-and-shuffle.ll| 113 ++
 1 file changed, 93 insertions(+), 20 deletions(-)

diff --git a/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll b/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
index 0e172950340e8..fed085843485a 100644
--- a/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
+++ b/llvm/test/CodeGen/LoongArch/lasx/shuffle-as-permute-and-shuffle.ll
@@ -17,14 +17,25 @@ define <32 x i8> @shuffle_v32i8(<32 x i8> %a) {
   ret <32 x i8> %shuffle
 }
 
+define <32 x i8> @shuffle_v32i8_same_lane(<32 x i8> %a) {
+; CHECK-LABEL: shuffle_v32i8_same_lane:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI1_0)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI1_0)
+; CHECK-NEXT:xvshuf.h $xr1, $xr0, $xr0
+; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:ret
+  %shuffle = shufflevector <32 x i8> %a, <32 x i8> poison, <32 x i32> 
+  ret <32 x i8> %shuffle
+}
 
 define <16 x i16> @shuffle_v16i16(<16 x i16> %a) {
 ; CHECK-LABEL: shuffle_v16i16:
 ; CHECK:   # %bb.0:
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI1_0)
-; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI1_0)
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI1_1)
-; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI1_1)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI2_0)
+; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI2_0)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI2_1)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI2_1)
 ; CHECK-NEXT:xvpermi.d $xr3, $xr0, 78
 ; CHECK-NEXT:xvshuf.d $xr2, $xr0, $xr3
 ; CHECK-NEXT:xvshuf.w $xr1, $xr2, $xr0
@@ -34,13 +45,25 @@ define <16 x i16> @shuffle_v16i16(<16 x i16> %a) {
   ret <16 x i16> %shuffle
 }
 
+define <16 x i16> @shuffle_v16i16_same_lane(<16 x i16> %a) {
+; CHECK-LABEL: shuffle_v16i16_same_lane:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI3_0)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI3_0)
+; CHECK-NEXT:xvshuf.h $xr1, $xr0, $xr0
+; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:ret
+  %shuffle = shufflevector <16 x i16> %a, <16 x i16> poison, <16 x i32> 
+  ret <16 x i16> %shuffle
+}
+
 define <8 x i32> @shuffle_v8i32(<8 x i32> %a) {
 ; CHECK-LABEL: shuffle_v8i32:
 ; CHECK:   # %bb.0:
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI2_0)
-; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI2_0)
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI2_1)
-; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI2_1)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI4_0)
+; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI4_0)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI4_1)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI4_1)
 ; CHECK-NEXT:xvpermi.d $xr3, $xr0, 78
 ; CHECK-NEXT:xvshuf.d $xr2, $xr0, $xr3
 ; CHECK-NEXT:xvshuf.d $xr1, $xr2, $xr0
@@ -50,13 +73,25 @@ define <8 x i32> @shuffle_v8i32(<8 x i32> %a) {
   ret <8 x i32> %shuffle
 }
 
+define <8 x i32> @shuffle_v8i32_same_lane(<8 x i32> %a) {
+; CHECK-LABEL: shuffle_v8i32_same_lane:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI5_0)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI5_0)
+; CHECK-NEXT:xvshuf.d $xr1, $xr0, $xr0
+; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:ret
+  %shuffle = shufflevector <8 x i32> %a, <8 x i32> poison, <8 x i32> 
+  ret <8 x i32> %shuffle
+}
+
 define <4 x i64> @shuffle_v4i64(<4 x i64> %a) {
 ; CHECK-LABEL: shuffle_v4i64:
 ; CHECK:   # %bb.0:
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI3_0)
-; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI3_0)
-; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI3_1)
-; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI3_1)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI6_0)
+; CHECK-NEXT:xvld $xr2, $a0, %pc_lo12(.LCPI6_0)
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI6_1)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI6_1)
 ; CHECK-NEXT:xvpermi.d $xr3, $xr0, 78
 ; CHECK-NEXT:xvshuf.d $xr2, $xr0, $xr3
 ; CHECK-NEXT:xvshuf.d $xr1, $xr2, $xr0
@@ -66,13 +101,25 @@ define <4 x i64> @shuffle_v4i64(<4 x i64> %a) {
   ret <4 x i64> %shuffle
 }
 
+define <4 x i64> @shuffle_v4i64_same_lane(<4 x i64> %a) {
+; CHECK-LABEL: shuffle_v4i64_same_lane:
+; CHECK:   # %bb.0:
+; CHECK-NEXT:pcalau12i $a0, %pc_hi20(.LCPI7_0)
+; CHECK-NEXT:xvld $xr1, $a0, %pc_lo12(.LCPI7_0)
+; CHECK-NEXT:xvshuf.d $xr1, $xr0, $xr0
+; CHECK-NEXT:xvori.b $xr0, $xr1, 0
+; CHECK-NEXT:ret
+  %shuffle = shufflevector <4 x i64> %a, <4 x i64> poison, <4 x i32> 
+  ret <4 x i64> %shuffle
+}
+
 define <8 x float> @shuffle_v8f32(<8 x float> %a) {
 ; CHECK-LABEL: shuffle_v8f32:
 ; CHECK:   # %bb.0:
-