[llvm-branch-commits] [llvm] [NFC] Leave a comment in `Local.cpp` about debug info & sample profiling (PR #155296)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/155296
>From 421deb7334bf6030686eb809132e1d13b730cbc6 Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Mon, 25 Aug 2025 21:04:05 +
Subject: [PATCH] [NFC] Leave a comment in `Local.cpp` about debug info & sample profiling
---
llvm/lib/Transforms/Utils/Local.cpp | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp
index 2cfd70a1746c8..57dc1b38b8ec3 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3342,8 +3342,11 @@ void llvm::hoistAllInstructionsInto(BasicBlock *DomBlock, Instruction *InsertPt,
// retain their original debug locations (DILocations) and debug intrinsic
// instructions.
//
- // Doing so would degrade the debugging experience and adversely affect the
- // accuracy of profiling information.
+ // Doing so would degrade the debugging experience.
+ //
+ // FIXME: Issue #152767: debug info should also be the same as the
+ // original branch, **if** the user explicitly indicated that (for sampling
+ // PGO)
//
// Currently, when hoisting the instructions, we take the following actions:
// - Remove their debug intrinsic instructions.
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [Clang][Cygwin] Use correct mangling rule (#158404) (PR #158442)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: None (llvmbot)
Changes
Backport 4abcbb053f8adaf48dbfff677e8ccda1f6d52b33
Requested by: @mstorsjo
---
Full diff: https://github.com/llvm/llvm-project/pull/158442.diff
3 Files Affected:
- (modified) clang/lib/Basic/Targets/X86.h (+2)
- (modified) clang/test/CodeGen/mangle-windows.c (+4-2)
- (modified) clang/test/CodeGenCXX/mangle-windows.cpp (+3)
```diff
diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h
index ebc59c92f4c24..a7be080695ed3 100644
--- a/clang/lib/Basic/Targets/X86.h
+++ b/clang/lib/Basic/Targets/X86.h
@@ -649,6 +649,7 @@ class LLVM_LIBRARY_VISIBILITY CygwinX86_32TargetInfo : public X86_32TargetInfo {
: X86_32TargetInfo(Triple, Opts) {
this->WCharType = TargetInfo::UnsignedShort;
this->WIntType = TargetInfo::UnsignedInt;
+this->UseMicrosoftManglingForC = true;
DoubleAlign = LongLongAlign = 64;
resetDataLayout("e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-"
"i128:128-f80:32-n8:16:32-a:0:32-S32",
@@ -986,6 +987,7 @@ class LLVM_LIBRARY_VISIBILITY CygwinX86_64TargetInfo : public X86_64TargetInfo {
: X86_64TargetInfo(Triple, Opts) {
this->WCharType = TargetInfo::UnsignedShort;
this->WIntType = TargetInfo::UnsignedInt;
+this->UseMicrosoftManglingForC = true;
}
void getTargetDefines(const LangOptions &Opts,
diff --git a/clang/test/CodeGen/mangle-windows.c b/clang/test/CodeGen/mangle-windows.c
index 046b1e8815a8a..e1b06e72a9635 100644
--- a/clang/test/CodeGen/mangle-windows.c
+++ b/clang/test/CodeGen/mangle-windows.c
@@ -1,8 +1,10 @@
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-pc-win32 | FileCheck %s
-// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-cygwin | FileCheck %s
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-pc-windows-msvc-elf | FileCheck %s --check-prefix=ELF32
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-pc-win32 | FileCheck %s --check-prefix=X64
-// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-mingw32 | FileCheck %s --check-prefix=X64
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-mingw32 | FileCheck %s --check-prefix=X64
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-cygwin | FileCheck %s --check-prefix=X64
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-pc-windows-msvc-elf | FileCheck %s --check-prefix=ELF64
// CHECK: target datalayout = "e-m:x-{{.*}}"
diff --git a/clang/test/CodeGenCXX/mangle-windows.cpp b/clang/test/CodeGenCXX/mangle-windows.cpp
index 3d5a1e9a868ef..737abcf6e3498 100644
--- a/clang/test/CodeGenCXX/mangle-windows.cpp
+++ b/clang/test/CodeGenCXX/mangle-windows.cpp
@@ -4,6 +4,9 @@
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | \
// RUN: FileCheck --check-prefix=ITANIUM %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-cygwin | \
+// RUN: FileCheck --check-prefix=ITANIUM %s
+
void __stdcall f1(void) {}
// WIN: define dso_local x86_stdcallcc void @"?f1@@YGXXZ"
// ITANIUM: define dso_local x86_stdcallcc void @"\01__Z2f1v@0"
```
https://github.com/llvm/llvm-project/pull/158442
[llvm-branch-commits] [clang] release/21.x: [Clang][Cygwin] Use correct mangling rule (#158404) (PR #158442)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/158442
Backport 4abcbb053f8adaf48dbfff677e8ccda1f6d52b33
Requested by: @mstorsjo
>From f4907049285ca0875cc91770e3ceb3f162ec7c48 Mon Sep 17 00:00:00 2001
From: Tomohiro Kashiwada
Date: Sun, 14 Sep 2025 06:40:12 +0900
Subject: [PATCH] [Clang][Cygwin] Use correct mangling rule (#158404)
In
https://github.com/llvm/llvm-project/commit/45ca613c135ea7b5fbc63bff003f20bf20f62081,
whether to mangle names based on calling conventions according to
Microsoft conventions was refactored to a bool in the TargetInfo. Cygwin
targets also require this mangling, but were missed, presumably due to
lack of test coverage of these targets. This commit enables the name
mangling for Cygwin, and also enables test coverage of this mangling on
Cygwin targets.
(cherry picked from commit 4abcbb053f8adaf48dbfff677e8ccda1f6d52b33)
---
clang/lib/Basic/Targets/X86.h| 2 ++
clang/test/CodeGen/mangle-windows.c | 6 --
clang/test/CodeGenCXX/mangle-windows.cpp | 3 +++
3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/clang/lib/Basic/Targets/X86.h b/clang/lib/Basic/Targets/X86.h
index ebc59c92f4c24..a7be080695ed3 100644
--- a/clang/lib/Basic/Targets/X86.h
+++ b/clang/lib/Basic/Targets/X86.h
@@ -649,6 +649,7 @@ class LLVM_LIBRARY_VISIBILITY CygwinX86_32TargetInfo :
public X86_32TargetInfo {
: X86_32TargetInfo(Triple, Opts) {
this->WCharType = TargetInfo::UnsignedShort;
this->WIntType = TargetInfo::UnsignedInt;
+this->UseMicrosoftManglingForC = true;
DoubleAlign = LongLongAlign = 64;
resetDataLayout("e-m:x-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64-"
"i128:128-f80:32-n8:16:32-a:0:32-S32",
@@ -986,6 +987,7 @@ class LLVM_LIBRARY_VISIBILITY CygwinX86_64TargetInfo :
public X86_64TargetInfo {
: X86_64TargetInfo(Triple, Opts) {
this->WCharType = TargetInfo::UnsignedShort;
this->WIntType = TargetInfo::UnsignedInt;
+this->UseMicrosoftManglingForC = true;
}
void getTargetDefines(const LangOptions &Opts,
diff --git a/clang/test/CodeGen/mangle-windows.c
b/clang/test/CodeGen/mangle-windows.c
index 046b1e8815a8a..e1b06e72a9635 100644
--- a/clang/test/CodeGen/mangle-windows.c
+++ b/clang/test/CodeGen/mangle-windows.c
@@ -1,8 +1,10 @@
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-pc-win32 | FileCheck %s
-// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-cygwin | FileCheck %s
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-pc-windows-msvc-elf |
FileCheck %s --check-prefix=ELF32
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-pc-win32 | FileCheck %s
--check-prefix=X64
-// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-mingw32 | FileCheck %s
--check-prefix=X64
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-mingw32 | FileCheck %s
--check-prefix=X64
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-cygwin | FileCheck %s
--check-prefix=X64
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=x86_64-pc-windows-msvc-elf |
FileCheck %s --check-prefix=ELF64
// CHECK: target datalayout = "e-m:x-{{.*}}"
diff --git a/clang/test/CodeGenCXX/mangle-windows.cpp
b/clang/test/CodeGenCXX/mangle-windows.cpp
index 3d5a1e9a868ef..737abcf6e3498 100644
--- a/clang/test/CodeGenCXX/mangle-windows.cpp
+++ b/clang/test/CodeGenCXX/mangle-windows.cpp
@@ -4,6 +4,9 @@
// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-mingw32 | \
// RUN: FileCheck --check-prefix=ITANIUM %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -triple=i386-cygwin | \
+// RUN: FileCheck --check-prefix=ITANIUM %s
+
void __stdcall f1(void) {}
// WIN: define dso_local x86_stdcallcc void @"?f1@@YGXXZ"
// ITANIUM: define dso_local x86_stdcallcc void @"\01__Z2f1v@0"
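For readers unfamiliar with the mangling this backport enables, here is a minimal sketch of the calling-convention decoration that 32-bit x86 Windows toolchains conventionally apply to symbol names. The `CallConv` enum and `decorateX86` helper are hypothetical names made up for illustration; they are not Clang's implementation, which also gets the leading underscore from the data layout rather than from this kind of helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for illustration only; not a Clang type.
enum class CallConv { Cdecl, Stdcall, Fastcall };

// Decorate `name` following the conventional 32-bit x86 Windows scheme.
// `argBytes` is the total size in bytes of the arguments.
std::string decorateX86(const std::string &name, CallConv cc,
                        unsigned argBytes) {
  assert(!name.empty() && "symbol name must not be empty");
  switch (cc) {
  case CallConv::Stdcall: // _name@bytes
    return "_" + name + "@" + std::to_string(argBytes);
  case CallConv::Fastcall: // @name@bytes
    return "@" + name + "@" + std::to_string(argBytes);
  case CallConv::Cdecl: // _name
    return "_" + name;
  }
  return name; // not reached for valid enum values
}
```

Applied to the Itanium-mangled name `_Z2f1v` with no arguments, the stdcall case gives `__Z2f1v@0`, which is the string the ITANIUM check line in `mangle-windows.cpp` expects after the `\01` marker.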
[llvm-branch-commits] [clang] release/21.x: [Clang][Cygwin] Use correct mangling rule (#158404) (PR #158442)
llvmbot wrote: @jeremyd2019 What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/158442
[llvm-branch-commits] [llvm] CodeGen: Remove TRI argument from reMaterialize (PR #158229)
https://github.com/RKSimon approved this pull request. https://github.com/llvm/llvm-project/pull/158229
[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)
https://github.com/kasuga-fj updated
https://github.com/llvm/llvm-project/pull/157086
>From 94b18495719b35a89ee6a18e474e8e92a4429d99 Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga
Date: Fri, 5 Sep 2025 11:41:29 +
Subject: [PATCH] [DA] Add overflow check in ExactSIV
---
llvm/lib/Analysis/DependenceAnalysis.cpp | 14 +-
llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll | 2 +-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 0f77a1410e83b..6e576e866b310 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -1170,6 +1170,15 @@ const SCEVConstant *DependenceInfo::collectConstantUpperBound(const Loop *L,
return nullptr;
}
+/// Returns \p A - \p B if it is guaranteed not to signed wrap. Otherwise returns
+/// nullptr. \p A and \p B must have the same integer type.
+static const SCEV *minusSCEVNoSignedOverflow(const SCEV *A, const SCEV *B,
+ ScalarEvolution &SE) {
+ if (SE.willNotOverflow(Instruction::Sub, /*Signed=*/true, A, B))
+return SE.getMinusSCEV(A, B);
+ return nullptr;
+}
+
// testZIV -
// When we have a pair of subscripts of the form [c1] and [c2],
// where c1 and c2 are both loop invariant, we attack it using
@@ -1626,7 +1635,9 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, const SCEV *DstCoeff,
assert(0 < Level && Level <= CommonLevels && "Level out of range");
Level--;
Result.Consistent = false;
- const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
+ const SCEV *Delta = minusSCEVNoSignedOverflow(DstConst, SrcConst, *SE);
+ if (!Delta)
+return false;
LLVM_DEBUG(dbgs() << "\tDelta = " << *Delta << "\n");
NewConstraint.setLine(SrcCoeff, SE->getNegativeSCEV(DstCoeff), Delta,
CurLoop);
@@ -1716,6 +1727,7 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff, const SCEV *DstCoeff,
// explore directions
unsigned NewDirection = Dependence::DVEntry::NONE;
APInt LowerDistance, UpperDistance;
+ // TODO: Overflow check may be needed.
if (TA.sgt(TB)) {
LowerDistance = (TY - TX) + (TA - TB) * TL;
UpperDistance = (TY - TX) + (TA - TB) * TU;
diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index 54bb8b73da02a..fd58568d02c43 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -841,7 +841,7 @@ define void @exact14(ptr %A) {
; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 0, ptr %idx.0, align 1
; CHECK-SIV-ONLY-NEXT: da analyze - none!
; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
-; CHECK-SIV-ONLY-NEXT: da analyze - none!
+; CHECK-SIV-ONLY-NEXT: da analyze - output [*|<]!
; CHECK-SIV-ONLY-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8 1, ptr %idx.1, align 1
; CHECK-SIV-ONLY-NEXT: da analyze - none!
;
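The idea behind `minusSCEVNoSignedOverflow` in the ExactSIV patch above can be illustrated on plain integers. The following is only a scalar sketch, assuming 64-bit operands; `minusNoSignedOverflow` and `checkExamples` are hypothetical names, and the real patch asks `ScalarEvolution::willNotOverflow` about SCEV expressions rather than comparing concrete values.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical scalar analogue of the helper in the patch: returns a - b, or
// std::nullopt when the signed subtraction would wrap.
std::optional<int64_t> minusNoSignedOverflow(int64_t a, int64_t b) {
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  constexpr int64_t Max = std::numeric_limits<int64_t>::max();
  if (b > 0 && a < Min + b) // a - b would fall below Min
    return std::nullopt;
  if (b < 0 && a > Max + b) // a - b would exceed Max
    return std::nullopt;
  return a - b;
}

// Usage: the caller treats "no value" as "cannot reason about this delta".
void checkExamples() {
  constexpr int64_t Min = std::numeric_limits<int64_t>::min();
  assert(minusNoSignedOverflow(5, 3) == 2);
  assert(!minusNoSignedOverflow(Min, 1)); // would wrap below Min
  assert(!minusNoSignedOverflow(0, Min)); // would wrap above Max
}
```

In the patch the caller returns `false` when no delta is available, which is the conservative answer: the test no longer claims independence, and the updated CHECK line reports `output [*|<]!` instead of `none!`.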
[llvm-branch-commits] [Clang] Rewrite tests using subshells to set env variables (PR #158446)
llvmbot wrote:
@llvm/pr-subscribers-clang-modules
Author: Aiden Grossman (boomanaiden154)
Changes
Now that we have the %readfile substitution, we can rewrite these tests
that were using env variable subshells to write the output of the
command into a file and then load it where it is needed using readfile.
This does involve one invocation of bash so that we are using the system
env binary, which does support redirection into a tool like grep. We
already do this in one LLVM test. I'm not happy about needing that, but
the more principled way to solve it involves reworking how in-process
builtins work within lit. That is something we want to do eventually,
but not something that I think should block this patch.
---
Full diff: https://github.com/llvm/llvm-project/pull/158446.diff
5 Files Affected:
- (modified) clang/test/ClangScanDeps/pr61006.cppm (+2-1)
- (modified) clang/test/ClangScanDeps/resource_directory.c (+4-5)
- (modified) clang/test/Driver/env.c (+3-2)
- (modified) clang/test/Driver/program-path-priority.c (+8-8)
- (modified) clang/test/Modules/relative-resource-dir.m (+3-3)
``diff
diff --git a/clang/test/ClangScanDeps/pr61006.cppm
b/clang/test/ClangScanDeps/pr61006.cppm
index f75edd38c81ba..f10bc1e673987 100644
--- a/clang/test/ClangScanDeps/pr61006.cppm
+++ b/clang/test/ClangScanDeps/pr61006.cppm
@@ -6,7 +6,8 @@
// RUN: mkdir -p %t
// RUN: split-file %s %t
//
-// RUN: EXPECTED_RESOURCE_DIR=`%clang -print-resource-dir` && \
+// RUN: %clang -print-resource-dir | tr -d '\n' > %t/resource-dir
+// RUN: env EXPECTED_RESOURCE_DIR=%{readfile:%t/resource-dir} && \
// RUN: ln -s %clang++ %t/clang++ && \
// RUN: sed "s|EXPECTED_RESOURCE_DIR|$EXPECTED_RESOURCE_DIR|g; s|DIR|%/t|g"
%t/P1689.json.in > %t/P1689.json && \
// RUN: clang-scan-deps -compilation-database %t/P1689.json -format=p1689 |
FileCheck %t/a.cpp -DPREFIX=%/t && \
diff --git a/clang/test/ClangScanDeps/resource_directory.c
b/clang/test/ClangScanDeps/resource_directory.c
index 55d5d90bbcdea..6183e8aefacfa 100644
--- a/clang/test/ClangScanDeps/resource_directory.c
+++ b/clang/test/ClangScanDeps/resource_directory.c
@@ -12,14 +12,14 @@
// then verify `%clang-scan-deps` arrives at the same path by calling the
// `Driver::GetResourcesPath` function.
//
-// RUN: EXPECTED_RESOURCE_DIR=`%clang -print-resource-dir`
+// RUN: %clang -print-resource-dir | tr -d '\n' > %t/resource-dir
// RUN: sed -e "s|CLANG|%clang|g" -e "s|DIR|%/t|g" \
// RUN: %S/Inputs/resource_directory/cdb.json.template > %t/cdb_path.json
//
// RUN: clang-scan-deps -compilation-database %t/cdb_path.json --format
experimental-full \
// RUN: --resource-dir-recipe modify-compiler-path > %t/result_path.json
// RUN: cat %t/result_path.json | sed 's:\?:/:g' \
-// RUN: | FileCheck %s --check-prefix=CHECK-PATH
-DEXPECTED_RESOURCE_DIR="$EXPECTED_RESOURCE_DIR"
+// RUN: | FileCheck %s --check-prefix=CHECK-PATH
-DEXPECTED_RESOURCE_DIR="%{readfile:%t/resource-dir}"
// CHECK-PATH: "-resource-dir"
// CHECK-PATH-NEXT: "[[EXPECTED_RESOURCE_DIR]]"
@@ -31,9 +31,8 @@
// Here we hard-code the expected path into `%t/compiler` and then verify
// `%clang-scan-deps` arrives at the path by actually running the executable.
//
-// RUN: EXPECTED_RESOURCE_DIR="/custom/compiler/resources"
// RUN: echo "#!/bin/sh" > %t/compiler
-// RUN: echo "echo '$EXPECTED_RESOURCE_DIR'" >> %t/compiler
+// RUN: echo "echo '/custom/compiler/resources'" >> %t/compiler
// RUN: chmod +x %t/compiler
// RUN: sed -e "s|CLANG|%/t/compiler|g" -e "s|DIR|%/t|g" \
// RUN: %S/Inputs/resource_directory/cdb.json.template >
%t/cdb_invocation.json
@@ -41,6 +40,6 @@
// RUN: clang-scan-deps -compilation-database %t/cdb_invocation.json --format
experimental-full \
// RUN: --resource-dir-recipe invoke-compiler > %t/result_invocation.json
// RUN: cat %t/result_invocation.json | sed 's:\?:/:g' \
-// RUN: | FileCheck %s --check-prefix=CHECK-PATH
-DEXPECTED_RESOURCE_DIR="$EXPECTED_RESOURCE_DIR"
+// RUN: | FileCheck %s --check-prefix=CHECK-PATH
-DEXPECTED_RESOURCE_DIR="/custom/compiler/resources"
// CHECK-INVOCATION: "-resource-dir"
// CHECK-INVOCATION-NEXT: "[[EXPECTED_RESOURCE_DIR]]"
diff --git a/clang/test/Driver/env.c b/clang/test/Driver/env.c
index b3345ae09ffef..68ded45385702 100644
--- a/clang/test/Driver/env.c
+++ b/clang/test/Driver/env.c
@@ -1,13 +1,14 @@
// Some assertions in this test use Linux style (/) file paths.
// UNSUPPORTED: system-windows
+// RUN: bash -c env | grep LD_LIBRARY_PATH | tr -d '\n' > /tmp/ld_library_path
// The PATH variable is heavily used when trying to find a linker.
-// RUN: env -i LC_ALL=C LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
CLANG_NO_DEFAULT_CONFIG=1 \
+// RUN: env -i LC_ALL=C LD_LIBRARY_PATH="%{readfile:/tmp/ld_library_path}"
CLANG_NO_DEFAULT_CONFIG=1 \
// RUN: %clang %s -### -o %t.o --target=i386-unknown-linux \
// RUN: --sysroot=%S/Inputs/basic_linux_tree \
// RUN: --rtlib=platfo
[llvm-branch-commits] [Clang] Rewrite tests using subshells to set env variables (PR #158446)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/158446 Now that we have the %readfile substitution, we can rewrite these tests that were using env variable subshells to write the output of the command into a file and then load it where it is needed using readfile. This does involve one invocation of bash so that we are using the system env binary, which does support redirection into a tool like grep. We already do this in one LLVM test. I'm not happy about needing that, but the more principled way to solve it involves reworking how in-process builtins work within lit. That is something we want to do eventually, but not something that I think should block this patch.
[llvm-branch-commits] [clang] [llvm] [lit] Make builtin cat work with stdin (PR #158447)
https://github.com/arichardson approved this pull request. https://github.com/llvm/llvm-project/pull/158447
[llvm-branch-commits] [llvm] release/21.x: [RISCV] Support PreserveMost calling convention (#148214) (PR #158403)
llvmbot wrote:
@llvm/pr-subscribers-backend-risc-v
Author: None (llvmbot)
Changes
Backport 058d96f2d68d3496ae52774c06177d4a9039a134
Requested by: @llvmbot
---
Patch is 20.40 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/158403.diff
5 Files Affected:
- (modified) llvm/docs/LangRef.rst (+2)
- (modified) llvm/lib/Target/RISCV/RISCVCallingConv.td (+4)
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+1)
- (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp (+10-1)
- (added) llvm/test/CodeGen/RISCV/calling-conv-preserve-most.ll (+449)
``diff
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 8ea850af7a69b..2a9b67b671e11 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -413,6 +413,8 @@ added in the future:
- On AArch64 the callee preserve all general purpose registers, except
X0-X8 and X16-X18. Not allowed with ``nest``.
+- On RISC-V the callee preserve x5-x31 except x6, x7 and x28 registers.
+
The idea behind this convention is to support calls to runtime functions
that have a hot path and a cold path. The hot path is usually a small piece
of code that doesn't use many registers. The cold path might need to call
out to
diff --git a/llvm/lib/Target/RISCV/RISCVCallingConv.td
b/llvm/lib/Target/RISCV/RISCVCallingConv.td
index cbf039edec273..d8c52cbde04c7 100644
--- a/llvm/lib/Target/RISCV/RISCVCallingConv.td
+++ b/llvm/lib/Target/RISCV/RISCVCallingConv.td
@@ -93,3 +93,7 @@ def CSR_XLEN_F32_V_Interrupt_RVE: CalleeSavedRegs<(sub
CSR_XLEN_F32_V_Interrupt,
// Same as CSR_XLEN_F64_V_Interrupt, but excluding X16-X31.
def CSR_XLEN_F64_V_Interrupt_RVE: CalleeSavedRegs<(sub
CSR_XLEN_F64_V_Interrupt,
(sequence "X%u", 16, 31))>;
+
+def CSR_RT_MostRegs : CalleeSavedRegs<(sub CSR_Interrupt, X6, X7, X28)>;
+def CSR_RT_MostRegs_RVE : CalleeSavedRegs<(sub CSR_RT_MostRegs,
+ (sequence "X%u", 16, 31))>;
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 5fb16f5ac6b9e..07a03792c2b23 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -4,6 +4,7 @@ SDValue RISCVTargetLowering::LowerFormalArguments(
case CallingConv::C:
case CallingConv::Fast:
case CallingConv::SPIR_KERNEL:
+ case CallingConv::PreserveMost:
case CallingConv::GRAAL:
case CallingConv::RISCV_VectorCall:
#define CC_VLS_CASE(ABI_VLEN) case CallingConv::RISCV_VLSCall_##ABI_VLEN:
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
index 540412366026b..214536d7f3a74 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
@@ -68,6 +68,9 @@ RISCVRegisterInfo::getCalleeSavedRegs(const MachineFunction
*MF) const {
auto &Subtarget = MF->getSubtarget();
if (MF->getFunction().getCallingConv() == CallingConv::GHC)
return CSR_NoRegs_SaveList;
+ if (MF->getFunction().getCallingConv() == CallingConv::PreserveMost)
+return Subtarget.hasStdExtE() ? CSR_RT_MostRegs_RVE_SaveList
+ : CSR_RT_MostRegs_SaveList;
if (MF->getFunction().hasFnAttribute("interrupt")) {
if (Subtarget.hasVInstructions()) {
if (Subtarget.hasStdExtD())
@@ -811,7 +814,13 @@ RISCVRegisterInfo::getCallPreservedMask(const
MachineFunction & MF,
if (CC == CallingConv::GHC)
return CSR_NoRegs_RegMask;
- switch (Subtarget.getTargetABI()) {
+ RISCVABI::ABI ABI = Subtarget.getTargetABI();
+ if (CC == CallingConv::PreserveMost) {
+if (ABI == RISCVABI::ABI_ILP32E || ABI == RISCVABI::ABI_LP64E)
+ return CSR_RT_MostRegs_RVE_RegMask;
+return CSR_RT_MostRegs_RegMask;
+ }
+ switch (ABI) {
default:
llvm_unreachable("Unrecognized ABI");
case RISCVABI::ABI_ILP32E:
diff --git a/llvm/test/CodeGen/RISCV/calling-conv-preserve-most.ll
b/llvm/test/CodeGen/RISCV/calling-conv-preserve-most.ll
new file mode 100644
index 0..08340bbe0013a
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/calling-conv-preserve-most.ll
@@ -0,0 +1,449 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=riscv32 < %s | FileCheck %s -check-prefix=RV32I
+; RUN: llc -mtriple=riscv64 < %s | FileCheck %s -check-prefix=RV64I
+; RUN: llc -mtriple=riscv32 -mattr=+e -target-abi ilp32e < %s | FileCheck %s
-check-prefix=RV32E
+; RUN: llc -mtriple=riscv64 -mattr=+e -target-abi lp64e < %s | FileCheck %s
-check-prefix=RV64E
+
+; Check the PreserveMost calling convention works.
+
+declare void @standard_cc_func()
+declare preserve_mostcc void @preserve_mostcc_func()
+
+define preserve_mostcc void @preserve_mostcc1() nounwind {
+; RV32I-LABEL: preserve_mostcc1:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT:addi sp, sp, -64
+; RV32I
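
(The patch is truncated at this point.) As a rough illustration of the register set described by the LangRef sentence in this backport, the sketch below enumerates the preserved registers by number. `preserveMostRegs` is a hypothetical helper that follows only the LangRef wording (x5-x31 except x6, x7 and x28) and the RVE variant's exclusion of x16-x31; the actual `CSR_RT_MostRegs` list is derived from `CSR_Interrupt` in the `.td` file and may contain registers this sketch does not model.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: register numbers preserved by the callee under
// preserve_most on RISC-V, per the LangRef sentence in the patch. With the
// E extension the RVE variant additionally drops x16-x31.
std::set<unsigned> preserveMostRegs(bool isRVE) {
  std::set<unsigned> regs;
  const unsigned last = isRVE ? 15 : 31;
  for (unsigned r = 5; r <= last; ++r)
    if (r != 6 && r != 7 && r != 28)
      regs.insert(r);
  assert(regs.count(6) == 0 && regs.count(7) == 0);
  return regs;
}
```

Under these assumptions the full set has 24 registers and the RVE set has 9.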
[llvm-branch-commits] [clang] [llvm] [lit] Make builtin cat work with stdin (PR #158447)
https://github.com/boomanaiden154 edited https://github.com/llvm/llvm-project/pull/158447
[llvm-branch-commits] [llvm] [flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/155993
>From 5d9fe15da36fe4af54c34f09ebb11fca7f8a1ac3 Mon Sep 17 00:00:00 2001
From: ergawy
Date: Fri, 29 Aug 2025 04:04:07 -0500
Subject: [PATCH] [flang][do concurent] Add saxpy offload tests for OpenMP
mapping
Adds end-to-end tests for `do concurrent` offloading to the device.
---
.../fortran/do-concurrent-to-omp-saxpy-2d.f90 | 53 +++
.../fortran/do-concurrent-to-omp-saxpy.f90| 53 +++
2 files changed, 106 insertions(+)
create mode 100644
offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
create mode 100644
offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 |
%fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n, m
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:,:),intent(in) :: x
+ real(kind=real32), dimension(:,:),intent(inout) :: y
+ integer :: i, j
+
+ do concurrent(i=1:n, j=1:m)
+ y(i,j) = a * x(i,j) + y(i,j)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1,1) ",f8.6)') y(1,1)
+ write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000, m=1
+ real(kind=real32), allocatable, dimension(:,:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n,1:m), y(1:n,1:m))
+ a = 2.0_real32
+ x(:,:) = 1.0_real32
+ y(:,:) = 2.0_real32
+
+ call saxpy(a, x, y, n, m)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1,1) 4.0
+! CHECK: y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 |
%fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:),intent(in) :: x
+ real(kind=real32), dimension(:),intent(inout) :: y
+ integer :: i
+
+ do concurrent(i=1:n)
+ y(i) = a * x(i) + y(i)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1) ",f8.6)') y(1)
+ write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000
+ real(kind=real32), allocatable, dimension(:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n), y(1:n))
+ a = 2.0_real32
+ x(:) = 1.0_real32
+ y(:) = 2.0_real32
+
+ call saxpy(a, x, y, n)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1) 4.0
+! CHECK: y(n) 4.0
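For reference, the values the two saxpy tests above check for (`4.0` at both ends of `y`) can be reproduced with a sequential host-side version. This is only a sketch in C++ for comparison; the tests themselves are Fortran and run the loop on the device, and this `saxpy` function is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference: y = a * x + y, element by element. Every iteration is
// independent of the others, which is what makes the loop a candidate for
// `do concurrent`.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size() && "x and y must have the same length");
  for (std::size_t i = 0; i < y.size(); ++i)
    y[i] = a * x[i] + y[i];
}
```

With `a = 2`, `x` filled with 1 and `y` filled with 2, as in the tests, every element of `y` becomes 2 * 1 + 2 = 4.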
[llvm-branch-commits] [clang] release/21.x: [Clang][Cygwin] Use correct mangling rule (#158404) (PR #158442)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/158442
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/150170
>From be85e6c0222fe757ac59959bad5c56a85a32b869 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics
---
clang/include/clang/Basic/BuiltinsAMDGPU.def | 25 ++
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 58 +++
clang/test/CodeGenOpenCL/builtins-amdgcn.cl | 378 +++
3 files changed, 461 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e5a1422fe8778..56b1a8dc09b15 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
//===--===//
// R600-NI only builtins.
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 87a46287c4022..07cf08c54985a 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+ switch (BuiltinID) {
+ default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+ }
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
llvm::SyncScope::ID SSID;
switch (BuiltinID) {
+ case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+ case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u
[llvm-branch-commits] [llvm] [Offload][Conformance] Update olMemFree calls in conformance tests (PR #157773)
https://github.com/jhuber6 approved this pull request. https://github.com/llvm/llvm-project/pull/157773 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
@@ -65,6 +65,12 @@ if (MLIR_INCLUDE_INTEGRATION_TESTS)
endif()
+option(MLIR_RUN_STANDALONE_INSTALL_TESTS "Run Standalone example install
tests." ON)
+if(MLIR_RUN_STANDALONE_INSTALL_TESTS AND "${CMAKE_INSTALL_PREFIX}" STREQUAL "")
+ message(WARNING "Standalone example install tests will install into root!\
rengolin wrote:
What do you mean by `root`? The Git root? `/`? The build directory root?
Usually, when I do automatic installs for CI, I set it to `%build/install`, so
that it's guaranteed to be writable by the build process and on the same
filesystem (e.g. the source directory may be on NFS and slow).
https://github.com/llvm/llvm-project/pull/157944
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (PR #158272)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/158272 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] release/21.x: [Clang][Cygwin] Use correct mangling rule (#158404) (PR #158442)
https://github.com/jeremyd2019 approved this pull request. LGTM, and this seems to be a regression from 20.x https://github.com/llvm/llvm-project/pull/158442 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)
https://github.com/jplehr edited https://github.com/llvm/llvm-project/pull/157484 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lldb] release/21.x: [lldb][DataFormatter] Allow std::string formatters to match against custom allocators (#156050) (PR #157048)
github-actions[bot] wrote: @Michael137 (or anyone else): if you would like to add a note about this fix to the release notes (completely optional), please reply to this comment with a one- or two-sentence description of the fix. When you are done, please add the release:note label to this PR. https://github.com/llvm/llvm-project/pull/157048 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU: Use RegClassByHwMode to manage operand VGPR operand constraints (PR #158272)
arsenm wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. * **#158272** 👈 * **#158269**: 1 other dependent PR ([#158271](https://github.com/llvm/llvm-project/pull/158271)) * `main` This stack of pull requests is managed by Graphite. https://github.com/llvm/llvm-project/pull/158272 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] 4779770 - Revert "Revert "[clang][dataflow] Transfer more cast expressions." (#157148)"
Author: Samira Bakon
Date: 2025-09-08T15:08:52-04:00
New Revision: 4779770aab880315d359005a5cacd4c4976d8649
URL:
https://github.com/llvm/llvm-project/commit/4779770aab880315d359005a5cacd4c4976d8649
DIFF:
https://github.com/llvm/llvm-project/commit/4779770aab880315d359005a5cacd4c4976d8649.diff
LOG: Revert "Revert "[clang][dataflow] Transfer more cast expressions."
(#157148)"
This reverts commit 4c29a600fa34d0c1cabf4ffcf081f2a00b09fddd.
Added:
Modified:
clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
clang/lib/Analysis/FlowSensitive/Transfer.cpp
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
Removed:
diff --git a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
index 8fcc6a44027a0..534b9a017d8f0 100644
--- a/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
+++ b/clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
@@ -17,6 +17,7 @@
#include "clang/AST/Decl.h"
#include "clang/AST/Type.h"
#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Debug.h"
#include
@@ -152,6 +153,11 @@ class RecordStorageLocation final : public StorageLocation
{
return {SyntheticFields.begin(), SyntheticFields.end()};
}
+ /// Add a synthetic field, if none by that name is already present.
+ void addSyntheticField(llvm::StringRef Name, StorageLocation &Loc) {
+SyntheticFields.insert({Name, &Loc});
+ }
+
/// Changes the child storage location for a field `D` of reference type.
/// All other fields cannot change their storage location and always retain
/// the storage location passed to the `RecordStorageLocation` constructor.
@@ -164,6 +170,11 @@ class RecordStorageLocation final : public StorageLocation
{
Children[&D] = Loc;
}
+ /// Add a child storage location for a field `D`, if not already present.
+ void addChild(const ValueDecl &D, StorageLocation *Loc) {
+Children.insert({&D, Loc});
+ }
+
llvm::iterator_range children() const {
return {Children.begin(), Children.end()};
}
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index 86a816e2e406c..23a6de45e18b1 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -20,14 +20,17 @@
#include "clang/AST/OperationKinds.h"
#include "clang/AST/Stmt.h"
#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
#include "clang/Analysis/FlowSensitive/ASTOps.h"
#include "clang/Analysis/FlowSensitive/AdornedCFG.h"
#include "clang/Analysis/FlowSensitive/DataflowAnalysisContext.h"
#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"
#include "clang/Analysis/FlowSensitive/NoopAnalysis.h"
#include "clang/Analysis/FlowSensitive/RecordOps.h"
+#include "clang/Analysis/FlowSensitive/StorageLocation.h"
#include "clang/Analysis/FlowSensitive/Value.h"
#include "clang/Basic/Builtins.h"
+#include "clang/Basic/LLVM.h"
#include "clang/Basic/OperatorKinds.h"
#include "llvm/Support/Casting.h"
#include
@@ -287,7 +290,7 @@ class TransferVisitor : public
ConstStmtVisitor {
}
}
- void VisitImplicitCastExpr(const ImplicitCastExpr *S) {
+ void VisitCastExpr(const CastExpr *S) {
const Expr *SubExpr = S->getSubExpr();
assert(SubExpr != nullptr);
@@ -317,6 +320,60 @@ class TransferVisitor : public
ConstStmtVisitor {
break;
}
+case CK_BaseToDerived: {
+ // This is a cast of (single-layer) pointer or reference to a record
type.
+ // We should now model the fields for the derived type.
+
+ // Get the RecordStorageLocation for the record object underneath.
+ RecordStorageLocation *Loc = nullptr;
+ if (S->getType()->isPointerType()) {
+auto *PV = Env.get<PointerValue>(*SubExpr);
+assert(PV != nullptr);
+if (PV == nullptr)
+ break;
+Loc = cast<RecordStorageLocation>(&PV->getPointeeLoc());
+ } else {
+assert(S->getType()->isRecordType());
+if (SubExpr->isGLValue()) {
+ Loc = Env.get<RecordStorageLocation>(*SubExpr);
+} else {
+ Loc = &Env.getResultObjectLocation(*SubExpr);
+}
+ }
+ if (!Loc) {
+// Nowhere to add children or propagate from, so we're done.
+break;
+ }
+
+ // Get the derived record type underneath the reference or pointer.
+ QualType Derived = S->getType().getNonReferenceType();
+ if (Derived->isPointerType()) {
+Derived = Derived->getPointeeType();
+ }
+
+ // Add children to the storage location for fields (including synthetic
+ // fields) of the derived type and initialize their values.
+ for (const FieldDecl *Field :
+ Env.getDataflowAnalysisContext().getModeledFields(Derived)) {
+assert(Field != nullptr);
+QualType FieldType = Field-
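A note on the `addSyntheticField` and `addChild` helpers added in the diff above: both call `insert`, which leaves an existing entry untouched, whereas the assignment form `Children[&D] = Loc` used by the neighbouring setter overwrites it. The following is a minimal sketch of that difference only. It assumes `std::map` as a stand-in for the LLVM container, and `FieldTable`, its `int` locations, and the string keys are hypothetical names that do not appear in clang.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a record's field-to-location table.
// std::map is used instead of an LLVM container so the sketch has no
// LLVM dependency; keys and values are simplified to string and int.
struct FieldTable {
  std::map<std::string, int> Children;

  // Mirrors the `Children[&D] = Loc` form: always overwrites.
  void setChild(const std::string &Name, int Loc) { Children[Name] = Loc; }

  // Mirrors the `Children.insert({&D, Loc})` form: an existing entry wins.
  // Returns true only if a new entry was created.
  bool addChild(const std::string &Name, int Loc) {
    return Children.insert({Name, Loc}).second;
  }
};
```

With this shape, calling `addChild` twice for the same field keeps the first location, which matches the "if not already present" wording in the doc comments of the patch.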
[llvm-branch-commits] [llvm] [NFC] Leave a comment in `Local.cpp` about debug info & sample profiling (PR #155296)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/155296 >From 2362af9d23a45db4bb85381539630be98703a2c3 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Mon, 25 Aug 2025 21:04:05 + Subject: [PATCH] [NFC] Leave a comment in `Local.cpp` about debug info & sample profiling --- llvm/lib/Transforms/Utils/Local.cpp | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/llvm/lib/Transforms/Utils/Local.cpp b/llvm/lib/Transforms/Utils/Local.cpp index 2cfd70a1746c8..57dc1b38b8ec3 100644 --- a/llvm/lib/Transforms/Utils/Local.cpp +++ b/llvm/lib/Transforms/Utils/Local.cpp @@ -3342,8 +3342,11 @@ void llvm::hoistAllInstructionsInto(BasicBlock *DomBlock, Instruction *InsertPt, // retain their original debug locations (DILocations) and debug intrinsic // instructions. // - // Doing so would degrade the debugging experience and adversely affect the - // accuracy of profiling information. + // Doing so would degrade the debugging experience. + // + // FIXME: Issue #152767: debug info should also be the same as the + // original branch, **if** the user explicitly indicated that (for sampling + // PGO) // // Currently, when hoisting the instructions, we take the following actions: // - Remove their debug intrinsic instructions. ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Remarks] BitstreamRemarkParser: Refactor error handling (PR #156511)
@@ -13,81 +13,171 @@
#ifndef LLVM_LIB_REMARKS_BITSTREAM_REMARK_PARSER_H
#define LLVM_LIB_REMARKS_BITSTREAM_REMARK_PARSER_H
-#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Bitstream/BitstreamReader.h"
#include "llvm/Remarks/BitstreamRemarkContainer.h"
+#include "llvm/Remarks/Remark.h"
#include "llvm/Remarks/RemarkFormat.h"
#include "llvm/Remarks/RemarkParser.h"
+#include "llvm/Remarks/RemarkStringTable.h"
#include "llvm/Support/Error.h"
-#include
+#include "llvm/Support/FormatVariadic.h"
#include
#include
#include
namespace llvm {
namespace remarks {
-struct Remark;
+class BitstreamBlockParserHelperBase {
+protected:
+ BitstreamCursor &Stream;
+
+ unsigned BlockID;
+ StringRef BlockName;
+
+public:
+ BitstreamBlockParserHelperBase(BitstreamCursor &Stream, unsigned BlockID,
+ StringRef BlockName)
+ : Stream(Stream), BlockID(BlockID), BlockName(BlockName) {}
jroelofs wrote:
to go with the other re-ordering:
```suggestion
: Stream(Stream), BlockName(BlockName), BlockID(BlockID) {}
```
https://github.com/llvm/llvm-project/pull/156511
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
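Background for the initializer-list suggestion above, as far as it can be inferred from the comment: C++ initializes non-static data members in declaration order, regardless of the order written in the constructor's initializer list, and compilers can warn (`-Wreorder`) when the two disagree. A minimal sketch of that rule follows; `Helper`, `logInit`, and `InitLog` are hypothetical names for illustration and are not part of the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical log that records the order in which members are initialized.
static std::vector<std::string> InitLog;

static int logInit(const char *Name, int Value) {
  InitLog.push_back(Name);
  return Value;
}

// The mismatch below is deliberate, so silence the reorder warning here.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreorder"
struct Helper {
  // Declaration order: First, then Second.
  int First;
  int Second;

  // The initializer list names Second before First, but First is still
  // initialized first because it is declared first.
  Helper() : Second(logInit("Second", 2)), First(logInit("First", 1)) {}
};
#pragma GCC diagnostic pop
```

Keeping the initializer list in declaration order, as the review suggestion does, avoids the warning and makes the actual initialization order visible at a glance.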
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/156610
>From 31bf2c1e204b8ad977f1416c5e91aacfc0faaf80 Mon Sep 17 00:00:00 2001
From: ergawy
Date: Tue, 2 Sep 2025 08:36:34 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device
Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wloop` part.
---
.../OpenMP/DoConcurrentConversion.cpp | 117 ++
.../DoConcurrent/reduce_device.mlir | 53
2 files changed, 121 insertions(+), 49 deletions(-)
create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d00a4fdd2cf2e..6e308499100fa 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
for (mlir::Value local : loop.getLocalVars())
liveIns.push_back(local);
+
+ for (mlir::Value reduce : loop.getReduceVars())
+liveIns.push_back(reduce);
}
/// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -319,7 +322,7 @@ class DoConcurrentConversion
targetOp =
genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns,
targetClauseOps, loopNestClauseOps, liveInShapeInfoMap);
- genTeamsOp(doLoop.getLoc(), rewriter);
+ genTeamsOp(rewriter, loop, mapper);
}
mlir::omp::ParallelOp parallelOp =
@@ -492,46 +495,7 @@ class DoConcurrentConversion
if (!mapToDevice)
genPrivatizers(rewriter, mapper, loop, wsloopClauseOps);
-if (!loop.getReduceVars().empty()) {
- for (auto [op, byRef, sym, arg] : llvm::zip_equal(
- loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
- loop.getReduceSymsAttr().getAsRange(),
- loop.getRegionReduceArgs())) {
-auto firReducer = moduleSymbolTable.lookup(
-sym.getLeafReference());
-
-mlir::OpBuilder::InsertionGuard guard(rewriter);
-rewriter.setInsertionPointAfter(firReducer);
-std::string ompReducerName = sym.getLeafReference().str() + ".omp";
-
-auto ompReducer =
-moduleSymbolTable.lookup(
-rewriter.getStringAttr(ompReducerName));
-
-if (!ompReducer) {
- ompReducer = mlir::omp::DeclareReductionOp::create(
- rewriter, firReducer.getLoc(), ompReducerName,
- firReducer.getTypeAttr().getValue());
-
- cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(),
- ompReducer.getAllocRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(),
- ompReducer.getInitializerRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(),
- ompReducer.getReductionRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(),
- ompReducer.getAtomicReductionRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(),
- ompReducer.getCleanupRegion());
- moduleSymbolTable.insert(ompReducer);
-}
-
-wsloopClauseOps.reductionVars.push_back(op);
-wsloopClauseOps.reductionByref.push_back(byRef);
-wsloopClauseOps.reductionSyms.push_back(
-mlir::SymbolRefAttr::get(ompReducer));
- }
-}
+genReductions(rewriter, mapper, loop, wsloopClauseOps);
auto wsloopOp =
mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps);
@@ -553,8 +517,6 @@ class DoConcurrentConversion
rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back());
mlir::omp::YieldOp::create(rewriter, loop->getLoc());
-loop->getParentOfType().print(
-llvm::errs(), mlir::OpPrintingFlags().assumeVerified());
return {loopNestOp, wsloopOp};
}
@@ -778,15 +740,26 @@ class DoConcurrentConversion
liveInName, shape);
}
- mlir::omp::TeamsOp
- genTeamsOp(mlir::Location loc,
- mlir::ConversionPatternRewriter &rewriter) const {
-auto teamsOp = rewriter.create(
-loc, /*clauses=*/mlir::omp::TeamsOperands{});
+ mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter,
+fir::DoConcurrentLoopOp loop,
+mlir::IRMapping &mapper) const {
+mlir::omp::TeamsOperands teamsOps;
+genReductions(rewriter, mapper, loop, teamsOps);
+
+mlir::Location loc = loop.getLoc();
+aut
[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
https://github.com/svkeerthy edited https://github.com/llvm/llvm-project/pull/158376 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
makslevental wrote: Note, this PR depends on https://github.com/llvm/llvm-project/pull/158090 because currently (HEAD) the `monolithic` CI scripts do not install `FileCheck`, `count`, and `not` targets. https://github.com/llvm/llvm-project/pull/157944 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
ojhunt wrote: > > I have checked in with @ahmedbougacha and his feeling is that this is fine > > as it requires a bunch of work to opt in, and for places where the security > > is important enough that we don't want people using this it's easy enough > > to block. > > Thanks for checking. As above, I misunderstood what Ahmed was saying, and the wording was terrible: the opinion on disabling and similar was mine. The concerns there were mine, and I was trying to say I felt they had been addressed. https://github.com/llvm/llvm-project/pull/133537 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] CodeGen: Emit .prefalign directives based on the prefalign attribute. (PR #155529)
pcc wrote: Ping https://github.com/llvm/llvm-project/pull/155529 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
https://github.com/makslevental updated
https://github.com/llvm/llvm-project/pull/157944
>From 888cca06e4c81b1b12c85ec0ac48408e53ad57bd Mon Sep 17 00:00:00 2001
From: makslevental
Date: Wed, 10 Sep 2025 12:57:54 -0700
Subject: [PATCH 01/10] [MLIR][Standalone] test Standalone against install
distributions
---
mlir/test/Examples/standalone/lit.local.cfg | 2 ++
.../Examples/standalone/test.toy.install-dir | 16
mlir/test/lit.cfg.py | 3 +++
mlir/test/lit.site.cfg.py.in | 1 +
4 files changed, 22 insertions(+)
create mode 100644 mlir/test/Examples/standalone/test.toy.install-dir
diff --git a/mlir/test/Examples/standalone/lit.local.cfg
b/mlir/test/Examples/standalone/lit.local.cfg
index fe8397c6b9a10..bc9928decf527 100644
--- a/mlir/test/Examples/standalone/lit.local.cfg
+++ b/mlir/test/Examples/standalone/lit.local.cfg
@@ -10,3 +10,5 @@ config.substitutions.append(("%host_cc", config.host_cc))
config.substitutions.append(("%enable_libcxx", config.enable_libcxx))
config.substitutions.append(("%mlir_cmake_dir", config.mlir_cmake_dir))
config.substitutions.append(("%llvm_use_linker", config.llvm_use_linker))
+config.substitutions.append(("%llvm_obj_root", config.llvm_obj_root))
+config.substitutions.append(("%host_cmake_install_prefix",
config.host_cmake_install_prefix))
diff --git a/mlir/test/Examples/standalone/test.toy.install-dir
b/mlir/test/Examples/standalone/test.toy.install-dir
new file mode 100644
index 0..5c33a70491ae1
--- /dev/null
+++ b/mlir/test/Examples/standalone/test.toy.install-dir
@@ -0,0 +1,16 @@
+# REQUIRES: github-actions
+# RUN: "%cmake_exe" --build %llvm_obj_root --target install
+# RUN: "%cmake_exe" "%mlir_src_root/examples/standalone" -G "%cmake_generator"
\
+# RUN: -DCMAKE_CXX_COMPILER=%host_cxx -DCMAKE_C_COMPILER=%host_cc \
+# RUN: -DLLVM_ENABLE_LIBCXX=%enable_libcxx
-DMLIR_DIR=%host_cmake_install_prefix \
+# RUN: -DLLVM_USE_LINKER=%llvm_use_linker \
+# RUN: -DPython3_EXECUTABLE=%python \
+# RUN: -DPython_EXECUTABLE=%python
+# RUN: "%cmake_exe" --build . --target check-standalone | tee %t
+# RUN: FileCheck --input-file=%t %s
+
+# Note: The number of checked tests is not important. The command will fail
+# if any fail.
+# CHECK: Passed
+# CHECK-NOT: Failed
+# UNSUPPORTED: target={{.*(windows|android).*}}
diff --git a/mlir/test/lit.cfg.py b/mlir/test/lit.cfg.py
index f99c24d6e299a..08c7947c1e9a6 100644
--- a/mlir/test/lit.cfg.py
+++ b/mlir/test/lit.cfg.py
@@ -383,3 +383,6 @@ def have_host_jit_feature_support(feature_name):
if sys.version_info >= (3, 11):
config.available_features.add("python-ge-311")
+
+if "GITHUB_ACTIONS" in os.environ:
+config.available_features.add("github-actions")
diff --git a/mlir/test/lit.site.cfg.py.in b/mlir/test/lit.site.cfg.py.in
index 8a742a227847b..7e22ebf23c773 100644
--- a/mlir/test/lit.site.cfg.py.in
+++ b/mlir/test/lit.site.cfg.py.in
@@ -18,6 +18,7 @@ config.host_cxx = "@HOST_CXX@"
config.enable_libcxx = "@LLVM_ENABLE_LIBCXX@"
config.host_cmake = "@CMAKE_COMMAND@"
config.host_cmake_generator = "@CMAKE_GENERATOR@"
+config.host_cmake_install_prefix = "@CMAKE_INSTALL_PREFIX@"
config.llvm_use_linker = "@LLVM_USE_LINKER@"
config.llvm_use_sanitizer = "@LLVM_USE_SANITIZER@"
config.host_arch = "@HOST_ARCH@"
>From f26de0615a7e62b55bfa4dd0eee2ea423a1175f1 Mon Sep 17 00:00:00 2001
From: Maksim Levental
Date: Wed, 10 Sep 2025 13:23:07 -0700
Subject: [PATCH 02/10] Update lit.site.cfg.py.in
---
.../standalone/{test.toy.install-dir => test.install-dir.toy}| 0
mlir/test/lit.site.cfg.py.in | 1 +
2 files changed, 1 insertion(+)
rename mlir/test/Examples/standalone/{test.toy.install-dir =>
test.install-dir.toy} (100%)
diff --git a/mlir/test/Examples/standalone/test.toy.install-dir
b/mlir/test/Examples/standalone/test.install-dir.toy
similarity index 100%
rename from mlir/test/Examples/standalone/test.toy.install-dir
rename to mlir/test/Examples/standalone/test.install-dir.toy
diff --git a/mlir/test/lit.site.cfg.py.in b/mlir/test/lit.site.cfg.py.in
index 7e22ebf23c773..eadfd047d15f7 100644
--- a/mlir/test/lit.site.cfg.py.in
+++ b/mlir/test/lit.site.cfg.py.in
@@ -3,6 +3,7 @@
import sys
config.target_triple = "@LLVM_TARGET_TRIPLE@"
+config.llvm_obj_root = "@LLVM_BINARY_DIR@"
config.llvm_src_root = "@LLVM_SOURCE_DIR@"
config.llvm_tools_dir = lit_config.substitute("@LLVM_TOOLS_DIR@")
config.lit_tools_dir = "@LLVM_LIT_TOOLS_DIR@"
>From b682c08d8682f4775b050d92a9082b943f42988b Mon Sep 17 00:00:00 2001
From: makslevental
Date: Wed, 10 Sep 2025 15:54:54 -0700
Subject: [PATCH 03/10] add test.install-distribution-dir.toy
---
mlir/test/Examples/standalone/lit.local.cfg | 1 +
.../Examples/standalone/test.install-dir.toy| 4 ++--
.../test.install-distribution-dir.toy | 17 +
3 files changed, 20 insertions(+), 2 deletions(-)
create mode 100644
mlir/test/Examples/stand
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
https://github.com/makslevental edited https://github.com/llvm/llvm-project/pull/157944 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
@@ -1046,7 +1046,8 @@ class ConstantPtrAuth final : public Constant {
public:
/// Return a pointer signed with the specified parameters.
LLVM_ABI static ConstantPtrAuth *get(Constant *Ptr, ConstantInt *Key,
- ConstantInt *Disc, Constant *AddrDisc);
+ ConstantInt *Disc, Constant *AddrDisc,
+ Constant *DeactivationSymbol);
ahmedbougacha wrote:
You don't have to do this here, but we probably should make the optional
operands (in textual IR) optional here as well, and implicitly make them null?
Now that I think about it, I'm not sure how idiomatic that would be
https://github.com/llvm/llvm-project/pull/133537
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
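One way to read the idea floated in the comment above (optional operands that implicitly become null) is as defaulted trailing parameters on the factory function. The sketch below only illustrates that C++ mechanism under stated assumptions: `Constant`, `PtrAuthSketch`, and `makePtrAuth` are hypothetical stand-ins, not the LLVM classes, and whether LLVM would represent an absent operand as `nullptr` or as a null constant is not settled by the thread.

```cpp
// Hypothetical stand-ins; these are not the LLVM classes.
struct Constant {
  int Id;
};

struct PtrAuthSketch {
  Constant *Ptr;
  Constant *AddrDisc;
  Constant *DeactivationSymbol;
};

// Trailing operands default to nullptr, so call sites that do not care
// about them compile unchanged when a new operand is appended.
inline PtrAuthSketch makePtrAuth(Constant *Ptr, Constant *AddrDisc = nullptr,
                                 Constant *DeactivationSymbol = nullptr) {
  return PtrAuthSketch{Ptr, AddrDisc, DeactivationSymbol};
}
```

The trade-off hinted at in the comment still applies: defaulted arguments are convenient, but a caller who wants only the last operand must still spell out the ones before it.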
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
ahmedbougacha wrote: Yep, this does seem reasonable to me as well (with a question in-line). Thanks for the summons, sorry I haven't had the chance to take a look before! https://github.com/llvm/llvm-project/pull/133537 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
makslevental wrote: Thinking about the `install-cxx-test-suite-prefix` thing some more, I feel like this is basically the same thing that I'm doing here: it runs CMake in a subprocess and executes the install script against the same build directory. The only difference is that it resets `-DCMAKE_INSTALL_PREFIX=...` without overwriting the user's prefix, and because it's an `add_custom_target` it plays nicely with dependencies. Other than that it's the same thing. https://github.com/llvm/llvm-project/pull/157944
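The idea in the comment above (re-running the install step against an existing build tree, but with a different prefix) can be sketched roughly as follows. This is only an illustration, not code from the PR: `install_command`, `install_to`, and the paths are hypothetical names, and the sketch swaps in `cmake --install <dir> --prefix <dir>` (available in CMake 3.15 and later) rather than reproducing whatever the referenced custom target actually runs.

```python
import subprocess
from pathlib import Path


def install_command(build_dir, prefix):
    """Build a `cmake --install` command line that overrides the prefix.

    Passing --prefix at install time leaves the CMAKE_INSTALL_PREFIX that
    is stored in the build tree's cache untouched.
    """
    return ["cmake", "--install", str(build_dir), "--prefix", str(prefix)]


def install_to(build_dir, prefix):
    """Run the install step in a subprocess; raises if CMake exits non-zero."""
    subprocess.run(install_command(build_dir, prefix), check=True)


# Hypothetical paths, for illustration only; nothing is executed here.
cmd = install_command(Path("build"), Path("test-install"))
print(" ".join(cmd))
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or log the exact command line without touching the build tree.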
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
https://github.com/makslevental updated
https://github.com/llvm/llvm-project/pull/157944
>From 888cca06e4c81b1b12c85ec0ac48408e53ad57bd Mon Sep 17 00:00:00 2001
From: makslevental
Date: Wed, 10 Sep 2025 12:57:54 -0700
Subject: [PATCH 01/10] [MLIR][Standalone] test Standalone against install
distributions
---
mlir/test/Examples/standalone/lit.local.cfg | 2 ++
.../Examples/standalone/test.toy.install-dir | 16
mlir/test/lit.cfg.py | 3 +++
mlir/test/lit.site.cfg.py.in | 1 +
4 files changed, 22 insertions(+)
create mode 100644 mlir/test/Examples/standalone/test.toy.install-dir
diff --git a/mlir/test/Examples/standalone/lit.local.cfg
b/mlir/test/Examples/standalone/lit.local.cfg
index fe8397c6b9a10..bc9928decf527 100644
--- a/mlir/test/Examples/standalone/lit.local.cfg
+++ b/mlir/test/Examples/standalone/lit.local.cfg
@@ -10,3 +10,5 @@ config.substitutions.append(("%host_cc", config.host_cc))
config.substitutions.append(("%enable_libcxx", config.enable_libcxx))
config.substitutions.append(("%mlir_cmake_dir", config.mlir_cmake_dir))
config.substitutions.append(("%llvm_use_linker", config.llvm_use_linker))
+config.substitutions.append(("%llvm_obj_root", config.llvm_obj_root))
+config.substitutions.append(("%host_cmake_install_prefix", config.host_cmake_install_prefix))
diff --git a/mlir/test/Examples/standalone/test.toy.install-dir
b/mlir/test/Examples/standalone/test.toy.install-dir
new file mode 100644
index 0..5c33a70491ae1
--- /dev/null
+++ b/mlir/test/Examples/standalone/test.toy.install-dir
@@ -0,0 +1,16 @@
+# REQUIRES: github-actions
+# RUN: "%cmake_exe" --build %llvm_obj_root --target install
+# RUN: "%cmake_exe" "%mlir_src_root/examples/standalone" -G "%cmake_generator" \
+# RUN: -DCMAKE_CXX_COMPILER=%host_cxx -DCMAKE_C_COMPILER=%host_cc \
+# RUN: -DLLVM_ENABLE_LIBCXX=%enable_libcxx -DMLIR_DIR=%host_cmake_install_prefix \
+# RUN: -DLLVM_USE_LINKER=%llvm_use_linker \
+# RUN: -DPython3_EXECUTABLE=%python \
+# RUN: -DPython_EXECUTABLE=%python
+# RUN: "%cmake_exe" --build . --target check-standalone | tee %t
+# RUN: FileCheck --input-file=%t %s
+
+# Note: The number of checked tests is not important. The command will fail
+# if any fail.
+# CHECK: Passed
+# CHECK-NOT: Failed
+# UNSUPPORTED: target={{.*(windows|android).*}}
diff --git a/mlir/test/lit.cfg.py b/mlir/test/lit.cfg.py
index f99c24d6e299a..08c7947c1e9a6 100644
--- a/mlir/test/lit.cfg.py
+++ b/mlir/test/lit.cfg.py
@@ -383,3 +383,6 @@ def have_host_jit_feature_support(feature_name):
 if sys.version_info >= (3, 11):
     config.available_features.add("python-ge-311")
+
+if "GITHUB_ACTIONS" in os.environ:
+    config.available_features.add("github-actions")
diff --git a/mlir/test/lit.site.cfg.py.in b/mlir/test/lit.site.cfg.py.in
index 8a742a227847b..7e22ebf23c773 100644
--- a/mlir/test/lit.site.cfg.py.in
+++ b/mlir/test/lit.site.cfg.py.in
@@ -18,6 +18,7 @@ config.host_cxx = "@HOST_CXX@"
config.enable_libcxx = "@LLVM_ENABLE_LIBCXX@"
config.host_cmake = "@CMAKE_COMMAND@"
config.host_cmake_generator = "@CMAKE_GENERATOR@"
+config.host_cmake_install_prefix = "@CMAKE_INSTALL_PREFIX@"
config.llvm_use_linker = "@LLVM_USE_LINKER@"
config.llvm_use_sanitizer = "@LLVM_USE_SANITIZER@"
config.host_arch = "@HOST_ARCH@"
>From f26de0615a7e62b55bfa4dd0eee2ea423a1175f1 Mon Sep 17 00:00:00 2001
From: Maksim Levental
Date: Wed, 10 Sep 2025 13:23:07 -0700
Subject: [PATCH 02/10] Update lit.site.cfg.py.in
---
.../standalone/{test.toy.install-dir => test.install-dir.toy}| 0
mlir/test/lit.site.cfg.py.in | 1 +
2 files changed, 1 insertion(+)
rename mlir/test/Examples/standalone/{test.toy.install-dir =>
test.install-dir.toy} (100%)
diff --git a/mlir/test/Examples/standalone/test.toy.install-dir
b/mlir/test/Examples/standalone/test.install-dir.toy
similarity index 100%
rename from mlir/test/Examples/standalone/test.toy.install-dir
rename to mlir/test/Examples/standalone/test.install-dir.toy
diff --git a/mlir/test/lit.site.cfg.py.in b/mlir/test/lit.site.cfg.py.in
index 7e22ebf23c773..eadfd047d15f7 100644
--- a/mlir/test/lit.site.cfg.py.in
+++ b/mlir/test/lit.site.cfg.py.in
@@ -3,6 +3,7 @@
import sys
config.target_triple = "@LLVM_TARGET_TRIPLE@"
+config.llvm_obj_root = "@LLVM_BINARY_DIR@"
config.llvm_src_root = "@LLVM_SOURCE_DIR@"
config.llvm_tools_dir = lit_config.substitute("@LLVM_TOOLS_DIR@")
config.lit_tools_dir = "@LLVM_LIT_TOOLS_DIR@"
>From b682c08d8682f4775b050d92a9082b943f42988b Mon Sep 17 00:00:00 2001
From: makslevental
Date: Wed, 10 Sep 2025 15:54:54 -0700
Subject: [PATCH 03/10] add test.install-distribution-dir.toy
---
mlir/test/Examples/standalone/lit.local.cfg | 1 +
.../Examples/standalone/test.install-dir.toy| 4 ++--
.../test.install-distribution-dir.toy | 17 +
3 files changed, 20 insertions(+), 2 deletions(-)
create mode 100644
mlir/test/Examples/stand
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
https://github.com/rengolin edited https://github.com/llvm/llvm-project/pull/157944
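For readers following this PR: the first patch gates the new test on a `github-actions` lit feature, which `lit.cfg.py` adds only when the `GITHUB_ACTIONS` environment variable is present. A test marked `# REQUIRES: github-actions` is then run only where that feature is available. The check can be sketched in isolation as below; `github_actions_features` is a hypothetical helper written for this illustration and is not part of lit or of the patch.

```python
import os


def github_actions_features(environ):
    """Return the extra lit features implied by the given environment.

    Mirrors the check in the patch: the feature is present whenever the
    GITHUB_ACTIONS variable is set at all, whatever its value.
    """
    features = set()
    if "GITHUB_ACTIONS" in environ:
        features.add("github-actions")
    return features


print(sorted(github_actions_features({"GITHUB_ACTIONS": "true"})))
print(sorted(github_actions_features({})))
# Result for the current process environment (depends on where this runs).
print(sorted(github_actions_features(os.environ)))
```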
[llvm-branch-commits] [NFC][CodeGe][CFI] Pre-commit transparent_union tests (PR #158192)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: Vitaly Buka (vitalybuka)
Changes
---
Full diff: https://github.com/llvm/llvm-project/pull/158192.diff
4 Files Affected:
- (modified) clang/test/CodeGen/cfi-icall-generalize.c (+15)
- (modified) clang/test/CodeGen/cfi-icall-normalize2.c (+13)
- (modified) clang/test/CodeGen/kcfi-generalize.c (+15)
- (modified) clang/test/CodeGen/kcfi-normalize.c (+13)
``diff
diff --git a/clang/test/CodeGen/cfi-icall-generalize.c
b/clang/test/CodeGen/cfi-icall-generalize.c
index 0af17e5760cc6..116a99e4e2859 100644
--- a/clang/test/CodeGen/cfi-icall-generalize.c
+++ b/clang/test/CodeGen/cfi-icall-generalize.c
@@ -15,5 +15,20 @@ void g(int** (*fp)(const char *, const char **)) {
fp(0, 0);
}
+union Union {
+ char *c;
+ long* n;
+} __attribute__((transparent_union));
+
+// CHECK: define{{.*}} void @uni({{.*}} !type [[TYPE2:![0-9]+]] !type [[TYPE2_GENERALIZED:![0-9]+]]
+void uni(void (*fn)(union Union), union Union arg1) {
+ // UNGENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata !"_ZTSFv5UnionE")
+ // GENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata !"_ZTSFv5UnionE.generalized")
+fn(arg1);
+}
+
// CHECK: [[TYPE]] = !{i64 0, !"_ZTSFPPiPKcPS2_E"}
// CHECK: [[TYPE_GENERALIZED]] = !{i64 0, !"_ZTSFPvPKvS_E.generalized"}
+
+// CHECK: [[TYPE2]] = !{i64 0, !"_ZTSFvPFv5UnionES_E"}
+// CHECK: [[TYPE2_GENERALIZED]] = !{i64 0, !"_ZTSFvPv5UnionE.generalized"}
diff --git a/clang/test/CodeGen/cfi-icall-normalize2.c
b/clang/test/CodeGen/cfi-icall-normalize2.c
index 93893065cf903..c88ecc9f0c3f7 100644
--- a/clang/test/CodeGen/cfi-icall-normalize2.c
+++ b/clang/test/CodeGen/cfi-icall-normalize2.c
@@ -24,6 +24,19 @@ void baz(void (*fn)(int, int, int), int arg1, int arg2, int
arg3) {
fn(arg1, arg2, arg3);
}
+union Union {
+ char *c;
+ long* n;
+} __attribute__((transparent_union));
+
+void uni(void (*fn)(union Union), union Union arg1) {
+// CHECK-LABEL: define{{.*}}uni
+// CHECK-SAME: {{.*}}!type ![[TYPE4:[0-9]+]] !type !{{[0-9]+}}
+// CHECK: call i1 @llvm.type.test({{i8\*|ptr}} {{%f|%0}}, metadata !"_ZTSFv5UnionE.normalized")
+fn(arg1);
+}
+
// CHECK: ![[TYPE1]] = !{i64 0, !"_ZTSFvPFvu3i32ES_E.normalized"}
// CHECK: ![[TYPE2]] = !{i64 0, !"_ZTSFvPFvu3i32S_ES_S_E.normalized"}
// CHECK: ![[TYPE3]] = !{i64 0, !"_ZTSFvPFvu3i32S_S_ES_S_S_E.normalized"}
+// CHECK: ![[TYPE4]] = !{i64 0, !"_ZTSFvPFv5UnionES_E.normalized"}
diff --git a/clang/test/CodeGen/kcfi-generalize.c
b/clang/test/CodeGen/kcfi-generalize.c
index 4e32f4f35057c..89b298f3e2faa 100644
--- a/clang/test/CodeGen/kcfi-generalize.c
+++ b/clang/test/CodeGen/kcfi-generalize.c
@@ -26,8 +26,23 @@ void g(int** (*fp)(const char *, const char **)) {
fp(0, 0);
}
+union Union {
+ char *c;
+ long* n;
+} __attribute__((transparent_union));
+
+// CHECK: define{{.*}} void @uni({{.*}} !kcfi_type [[TYPE2:![0-9]+]]
+void uni(void (*fn)(union Union), union Union arg1) {
+ // UNGENERALIZED: call {{.*}} [ "kcfi"(i32 -1037059548) ]
+ // GENERALIZED: call {{.*}} [ "kcfi"(i32 422130955) ]
+fn(arg1);
+}
+
// UNGENERALIZED: [[TYPE]] = !{i32 1296635908}
// GENERALIZED: [[TYPE]] = !{i32 -49168686}
// UNGENERALIZED: [[TYPE3]] = !{i32 874141567}
// GENERALIZED: [[TYPE3]] = !{i32 954385378}
+
+// UNGENERALIZED: [[TYPE2]] = !{i32 981319178}
+// GENERALIZED: [[TYPE2]] = !{i32 -1599950473}
\ No newline at end of file
diff --git a/clang/test/CodeGen/kcfi-normalize.c
b/clang/test/CodeGen/kcfi-normalize.c
index b9150e88f6ab5..cde784962d11a 100644
--- a/clang/test/CodeGen/kcfi-normalize.c
+++ b/clang/test/CodeGen/kcfi-normalize.c
@@ -28,7 +28,20 @@ void baz(void (*fn)(int, int, int), int arg1, int arg2, int
arg3) {
fn(arg1, arg2, arg3);
}
+union Union {
+ char *c;
+ long* n;
+} __attribute__((transparent_union));
+
+void uni(void (*fn)(union Union), union Union arg1) {
+// CHECK-LABEL: define{{.*}}uni
+// CHECK-SAME: {{.*}}!kcfi_type ![[TYPE4:[0-9]+]]
+// CHECK: call void %0(ptr %1) [ "kcfi"(i32 -1430221633) ]
+fn(arg1);
+}
+
// CHECK: ![[#]] = !{i32 4, !"cfi-normalize-integers", i32 1}
// CHECK: ![[TYPE1]] = !{i32 -1143117868}
// CHECK: ![[TYPE2]] = !{i32 -460921415}
// CHECK: ![[TYPE3]] = !{i32 -333839615}
+// CHECK: ![[TYPE4]] = !{i32 1766237188}
\ No newline at end of file
``
https://github.com/llvm/llvm-project/pull/158192
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
https://github.com/makslevental ready_for_review https://github.com/llvm/llvm-project/pull/157944
[llvm-branch-commits] [llvm] AMDGPU: Move spill pseudo special case out of adjustAllocatableRegClass (PR #158246)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
Changes
This is special for the same reason av_mov_b64_imm_pseudo is special.
---
Full diff: https://github.com/llvm/llvm-project/pull/158246.diff
2 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+3-5)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+4-2)
``diff
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5c3340703ba3b..b1a61886802f4 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -5976,8 +5976,7 @@ SIInstrInfo::getWholeWaveFunctionSetup(MachineFunction
&MF) const {
static const TargetRegisterClass *
adjustAllocatableRegClass(const GCNSubtarget &ST, const SIRegisterInfo &RI,
const MCInstrDesc &TID, unsigned RCID) {
- if (!ST.hasGFX90AInsts() && (((TID.mayLoad() || TID.mayStore()) &&
-!(TID.TSFlags & SIInstrFlags::Spill)))) {
+ if (!ST.hasGFX90AInsts() && (((TID.mayLoad() || TID.mayStore())))) {
switch (RCID) {
case AMDGPU::AV_32RegClassID:
RCID = AMDGPU::VGPR_32RegClassID;
@@ -6012,10 +6011,9 @@ const TargetRegisterClass
*SIInstrInfo::getRegClass(const MCInstrDesc &TID,
if (OpNum >= TID.getNumOperands())
return nullptr;
auto RegClass = TID.operands()[OpNum].RegClass;
- if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO) {
-// Special pseudos have no alignment requirement
+ // Special pseudos have no alignment requirement
+ if (TID.getOpcode() == AMDGPU::AV_MOV_B64_IMM_PSEUDO || isSpill(TID))
return RI.getRegClass(RegClass);
- }
return adjustAllocatableRegClass(ST, RI, TID, RegClass);
}
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index f7dde2b90b68e..e0373e7768435 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -797,10 +797,12 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
return get(Opcode).TSFlags & SIInstrFlags::Spill;
}
- static bool isSpill(const MachineInstr &MI) {
-return MI.getDesc().TSFlags & SIInstrFlags::Spill;
+ static bool isSpill(const MCInstrDesc &Desc) {
+return Desc.TSFlags & SIInstrFlags::Spill;
}
+ static bool isSpill(const MachineInstr &MI) { return isSpill(MI.getDesc()); }
+
static bool isWWMRegSpillOpcode(uint16_t Opcode) {
return Opcode == AMDGPU::SI_SPILL_WWM_V32_SAVE ||
Opcode == AMDGPU::SI_SPILL_WWM_AV32_SAVE ||
``
https://github.com/llvm/llvm-project/pull/158246
[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)
https://github.com/heiher updated
https://github.com/llvm/llvm-project/pull/157819
>From 9e2fb6040167b52d7bf4fbd4e8ab444de3099d74 Mon Sep 17 00:00:00 2001
From: WANG Rui
Date: Wed, 10 Sep 2025 17:11:10 +0800
Subject: [PATCH] [clang][LoongArch] Introduce LASX and LSX conversion
intrinsics
This patch introduces the LASX and LSX conversion intrinsics:
- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128d, __m128d)
- __m256i __lasx_concat_128 (__m128i, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)
---
.../clang/Basic/BuiltinsLoongArchLASX.def | 19 +++
clang/lib/Headers/lasxintrin.h| 110
.../CodeGen/LoongArch/lasx/builtin-alias.c| 153 +
clang/test/CodeGen/LoongArch/lasx/builtin.c | 157 ++
4 files changed, 439 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
index c4ea46a3bc5b5..b234dedad648e 100644
--- a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
+++ b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
@@ -986,3 +986,22 @@ TARGET_BUILTIN(__builtin_lasx_xbnz_b, "iV32Uc", "nc",
"lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_h, "iV16Us", "nc", "lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_w, "iV8Ui", "nc", "lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_d, "iV4ULLi", "nc", "lasx")
+
+TARGET_BUILTIN(__builtin_lasx_cast_128_s, "V8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128_d, "V4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128, "V32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_s, "V8fV4fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_d, "V4dV2dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128, "V32ScV16ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo, "V32ScV32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi, "V32ScV32ScV16Sc", "nc", "lasx")
diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h
index 85020d82829e2..417671ffd437d 100644
--- a/clang/lib/Headers/lasxintrin.h
+++ b/clang/lib/Headers/lasxintrin.h
@@ -10,6 +10,8 @@
#ifndef _LOONGSON_ASXINTRIN_H
#define _LOONGSON_ASXINTRIN_H 1
+#include <lsxintrin.h>
+
#if defined(__loongarch_asx)
typedef signed char v32i8 __attribute__((vector_size(32), aligned(32)));
@@ -3882,5 +3884,113 @@ extern __inline
#define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1)))
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__,
+ __artificial__)) __m256 __lasx_cast_128_s(__m128 _1) {
+ return (__m256)__builtin_lasx_cast_128_s((v4f32)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256d
+__lasx_cast_128_d(__m128d _1) {
+ return (__m256d)__builtin_lasx_cast_128_d((v2f64)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256i
+__lasx_cast_128(__m128i _1) {
+ return (__m256i)__builtin_lasx_cast_128((v16i8)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
+__lasx_concat_128_s(__m128 _1, __m128 _2) {
+ return (__m256)__builtin_lasx_concat_128_s((v4f32)_1, (v4f32)_2);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256d
+__lasx_concat_128_d(__m128d _1, __m128d _2) {
+ return (__m256d)__builtin_lasx_concat_128_d((v2f64)_1, (v2f64)_2);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __
[llvm-branch-commits] [llvm] CodeGen: Keep reference to TargetRegisterInfo in TargetInstrInfo (PR #158224)
llvmbot wrote:
@llvm/pr-subscribers-backend-msp430
Author: Matt Arsenault (arsenm)
Changes
Both conceptually belong to the same subtarget, so it should not
be necessary to pass in the context TargetRegisterInfo to any
TargetInstrInfo member. Add this reference so those superfluous
arguments can be removed.
Most targets placed their TargetRegisterInfo as a member
in TargetInstrInfo. A few had this owned by the TargetSubtargetInfo,
so unify all targets to look the same.
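The motivation above can be illustrated with a deliberately small toy model. It is written in Python rather than C++ and does not use any LLVM API: `RegisterInfo`, `InstrInfo`, and every method name are hypothetical stand-ins. The point is only that once the instruction-info object holds a reference to its register info, hooks no longer need that object passed in by every caller.

```python
class RegisterInfo:
    """Toy stand-in for a target's register description."""

    def __init__(self, spill_sizes):
        self._spill_sizes = dict(spill_sizes)

    def spill_size(self, reg_class):
        return self._spill_sizes[reg_class]


class InstrInfo:
    """Toy stand-in that keeps a reference to its RegisterInfo."""

    def __init__(self, register_info):
        self._register_info = register_info

    def get_register_info(self):
        return self._register_info

    # Before: callers had to thread the register info through explicitly.
    def store_size_with_arg(self, reg_class, register_info):
        return register_info.spill_size(reg_class)

    # After: the hook uses the reference the object already holds.
    def store_size(self, reg_class):
        return self._register_info.spill_size(reg_class)


tri = RegisterInfo({"GPR32": 4, "GPR64": 8})
tii = InstrInfo(tri)
print(tii.store_size("GPR64"))
```

Both variants compute the same result; the second simply removes an argument that, in this model, always refers to the same object.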
---
Patch is 45.06 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/158224.diff
50 Files Affected:
- (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+8-3)
- (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+27-41)
- (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/AMDGPU/R600InstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+2-1)
- (modified) llvm/lib/Target/ARC/ARCInstrInfo.cpp (+2-1)
- (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp (+3-2)
- (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.h (+7-2)
- (modified) llvm/lib/Target/ARM/ARMInstrInfo.cpp (+2-1)
- (modified) llvm/lib/Target/ARM/ARMInstrInfo.h (+1-1)
- (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.h (+1-1)
- (modified) llvm/lib/Target/ARM/Thumb2InstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/ARM/Thumb2InstrInfo.h (+1-1)
- (modified) llvm/lib/Target/AVR/AVRInstrInfo.cpp (+2-2)
- (modified) llvm/lib/Target/BPF/BPFInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/CSKY/CSKYInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/DirectX/DirectXInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp (+2-2)
- (modified) llvm/lib/Target/Hexagon/HexagonInstrInfo.h (+5)
- (modified) llvm/lib/Target/Hexagon/HexagonSubtarget.cpp (+1-2)
- (modified) llvm/lib/Target/Hexagon/HexagonSubtarget.h (+1-2)
- (modified) llvm/lib/Target/Lanai/LanaiInstrInfo.cpp (+2-1)
- (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp (+2-2)
- (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.h (+4)
- (modified) llvm/lib/Target/LoongArch/LoongArchSubtarget.cpp (+1-1)
- (modified) llvm/lib/Target/LoongArch/LoongArchSubtarget.h (+1-2)
- (modified) llvm/lib/Target/MSP430/MSP430InstrInfo.cpp (+2-1)
- (modified) llvm/lib/Target/Mips/Mips16InstrInfo.cpp (+1-5)
- (modified) llvm/lib/Target/Mips/Mips16InstrInfo.h (+1-1)
- (modified) llvm/lib/Target/Mips/MipsInstrInfo.cpp (+3-2)
- (modified) llvm/lib/Target/Mips/MipsInstrInfo.h (+6-2)
- (modified) llvm/lib/Target/Mips/MipsSEInstrInfo.cpp (+1-5)
- (modified) llvm/lib/Target/Mips/MipsSEInstrInfo.h (+1-1)
- (modified) llvm/lib/Target/NVPTX/NVPTXInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+3-2)
- (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+3)
- (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+1-1)
- (modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+1-2)
- (modified) llvm/lib/Target/SPIRV/SPIRVInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/Sparc/SparcInstrInfo.cpp (+2-2)
- (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/VE/VEInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/XCore/XCoreInstrInfo.cpp (+1-1)
- (modified) llvm/lib/Target/Xtensa/XtensaInstrInfo.cpp (+2-1)
- (modified) llvm/unittests/CodeGen/MFCommon.inc (+3-1)
- (modified) llvm/utils/TableGen/InstrInfoEmitter.cpp (+7-5)
``diff
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 6a624a7052cdd..802cca6022074 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -113,9 +113,12 @@ struct ExtAddrMode {
///
class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
protected:
- TargetInstrInfo(unsigned CFSetupOpcode = ~0u, unsigned CFDestroyOpcode = ~0u,
- unsigned CatchRetOpcode = ~0u, unsigned ReturnOpcode = ~0u)
- : CallFrameSetupOpcode(CFSetupOpcode),
+ const TargetRegisterInfo &TRI;
+
+ TargetInstrInfo(const TargetRegisterInfo &TRI, unsigned CFSetupOpcode = ~0u,
+ unsigned CFDestroyOpcode = ~0u, unsigned CatchRetOpcode =
~0u,
+ unsigned ReturnOpcode = ~0u)
+ : TRI(TRI), CallFrameSetupOpcode(CFSetupOpcode),
CallFrameDestroyOpcode(CFDestroyOpcode),
CatchRetOpcode(CatchRetOpcode),
ReturnOpcode(ReturnOpcode) {}
@@ -124,6 +127,8 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
TargetInstrInfo &operator=(const TargetInstrInfo &) = delete;
virtual ~TargetInstrInfo();
+ const TargetRegisterInfo &getRegisterInfo() const { return TRI; }
+
static bool isG
[llvm-branch-commits] [llvm] CodeGen: Remove TRI arguments from stack load/store hooks (PR #158240)
llvmbot wrote:
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-msp430
Author: Matt Arsenault (arsenm)
Changes
This is directly available in TargetInstrInfo
---
Patch is 110.41 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/158240.diff
63 Files Affected:
- (modified) llvm/include/llvm/CodeGen/TargetInstrInfo.h (+2-4)
- (modified) llvm/lib/CodeGen/FixupStatepointCallerSaved.cpp (+3-3)
- (modified) llvm/lib/CodeGen/InlineSpiller.cpp (+4-4)
- (modified) llvm/lib/CodeGen/RegAllocFast.cpp (+3-4)
- (modified) llvm/lib/CodeGen/RegisterScavenging.cpp (+2-2)
- (modified) llvm/lib/CodeGen/TargetFrameLoweringImpl.cpp (+2-3)
- (modified) llvm/lib/CodeGen/TargetInstrInfo.cpp (+2-2)
- (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+2-3)
- (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (+12-12)
- (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.h (+2-3)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.cpp (+3-5)
- (modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+2-4)
- (modified) llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp (+4-5)
- (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp (+10-7)
- (modified) llvm/lib/Target/ARM/ARMBaseInstrInfo.h (+2-3)
- (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/ARM/Thumb1InstrInfo.h (+2-3)
- (modified) llvm/lib/Target/ARM/Thumb2InstrInfo.cpp (+8-8)
- (modified) llvm/lib/Target/ARM/Thumb2InstrInfo.h (+2-3)
- (modified) llvm/lib/Target/AVR/AVRInstrInfo.cpp (+5-7)
- (modified) llvm/lib/Target/AVR/AVRInstrInfo.h (+2-4)
- (modified) llvm/lib/Target/BPF/BPFInstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/BPF/BPFInstrInfo.h (+2-3)
- (modified) llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp (+4-7)
- (modified) llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/Hexagon/HexagonInstrInfo.h (+2-3)
- (modified) llvm/lib/Target/Lanai/LanaiInstrInfo.cpp (+2-4)
- (modified) llvm/lib/Target/Lanai/LanaiInstrInfo.h (+2-4)
- (modified) llvm/lib/Target/LoongArch/LoongArchFrameLowering.cpp (+1-1)
- (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.cpp (+8-8)
- (modified) llvm/lib/Target/LoongArch/LoongArchInstrInfo.h (+2-4)
- (modified) llvm/lib/Target/MSP430/MSP430InstrInfo.cpp (+7-6)
- (modified) llvm/lib/Target/MSP430/MSP430InstrInfo.h (+2-4)
- (modified) llvm/lib/Target/Mips/Mips16InstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/Mips/Mips16InstrInfo.h (+3-2)
- (modified) llvm/lib/Target/Mips/MipsInstrInfo.h (+6-9)
- (modified) llvm/lib/Target/Mips/MipsSEFrameLowering.cpp (+20-29)
- (modified) llvm/lib/Target/Mips/MipsSEInstrInfo.cpp (+20-19)
- (modified) llvm/lib/Target/Mips/MipsSEInstrInfo.h (+2-3)
- (modified) llvm/lib/Target/PowerPC/PPCFrameLowering.cpp (+5-6)
- (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.cpp (+11-12)
- (modified) llvm/lib/Target/PowerPC/PPCInstrInfo.h (+6-6)
- (modified) llvm/lib/Target/RISCV/RISCVFrameLowering.cpp (+13-15)
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+4-6)
- (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+14-13)
- (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+3-3)
- (modified) llvm/lib/Target/Sparc/SparcInstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/Sparc/SparcInstrInfo.h (+2-3)
- (modified) llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp (+8-8)
- (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp (+8-6)
- (modified) llvm/lib/Target/SystemZ/SystemZInstrInfo.h (+4-2)
- (modified) llvm/lib/Target/VE/VEInstrInfo.cpp (+6-5)
- (modified) llvm/lib/Target/VE/VEInstrInfo.h (+4-2)
- (modified) llvm/lib/Target/X86/X86FastPreTileConfig.cpp (+1-2)
- (modified) llvm/lib/Target/X86/X86FrameLowering.cpp (+3-4)
- (modified) llvm/lib/Target/X86/X86InstrInfo.cpp (+13-11)
- (modified) llvm/lib/Target/X86/X86InstrInfo.h (+3-3)
- (modified) llvm/lib/Target/XCore/XCoreFrameLowering.cpp (+2-3)
- (modified) llvm/lib/Target/XCore/XCoreInstrInfo.cpp (+2-3)
- (modified) llvm/lib/Target/XCore/XCoreInstrInfo.h (+4-2)
- (modified) llvm/lib/Target/Xtensa/XtensaFrameLowering.cpp (+1-1)
- (modified) llvm/lib/Target/Xtensa/XtensaInstrInfo.cpp (+10-8)
- (modified) llvm/lib/Target/Xtensa/XtensaInstrInfo.h (+2-3)
``diff
diff --git a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
index 802cca6022074..fb7ced7960846 100644
--- a/llvm/include/llvm/CodeGen/TargetInstrInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetInstrInfo.h
@@ -1165,8 +1165,7 @@ class LLVM_ABI TargetInstrInfo : public MCInstrInfo {
/// register spill instruction, part of prologue, during the frame lowering.
virtual void storeRegToStackSlot(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, Register SrcReg,
- bool isKill, int FrameIndex, const TargetRegisterClass *RC,
- const TargetRegisterInfo *TRI, Register VReg,
+ bool isKill, int FrameIndex, const TargetRegisterClass *RC,
[llvm-branch-commits] [NFC][CFI][CodeGen] Move GeneralizeFunctionType out of CreateMetadataIdentifierGeneralized (PR #158190)
llvmbot wrote:
@llvm/pr-subscribers-clang-codegen
Author: Vitaly Buka (vitalybuka)
Changes
---
Full diff: https://github.com/llvm/llvm-project/pull/158190.diff
1 Files Affected:
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+9-5)
``diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp
b/clang/lib/CodeGen/CodeGenModule.cpp
index d45fb823d4c35..acd77c5aca89c 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -3041,9 +3041,11 @@ void
CodeGenModule::createFunctionTypeMetadataForIcall(const FunctionDecl *FD,
 if (isa<CXXMethodDecl>(FD) && !cast<CXXMethodDecl>(FD)->isStatic())
return;
- llvm::Metadata *MD = CreateMetadataIdentifierForType(FD->getType());
+ QualType FnType = FD->getType();
+ llvm::Metadata *MD = CreateMetadataIdentifierForType(FnType);
F->addTypeMetadata(0, MD);
- F->addTypeMetadata(0, CreateMetadataIdentifierGeneralized(FD->getType()));
+ FnType = GeneralizeFunctionType(getContext(), FnType);
+ F->addTypeMetadata(0, CreateMetadataIdentifierGeneralized(FnType));
// Emit a hash-based bit set entry for cross-DSO calls.
if (CodeGenOpts.SanitizeCfiCrossDso)
@@ -7936,8 +7938,10 @@ CodeGenModule::CreateMetadataIdentifierImpl(QualType T,
MetadataTypeMap &Map,
llvm::Metadata *CodeGenModule::CreateMetadataIdentifierForFnType(QualType T) {
 assert(isa<FunctionType>(T));
- if (getCodeGenOpts().SanitizeCfiICallGeneralizePointers)
+ if (getCodeGenOpts().SanitizeCfiICallGeneralizePointers) {
+T = GeneralizeFunctionType(getContext(), T);
return CreateMetadataIdentifierGeneralized(T);
+ }
return CreateMetadataIdentifierForType(T);
}
@@ -7951,8 +7955,8 @@
CodeGenModule::CreateMetadataIdentifierForVirtualMemPtrType(QualType T) {
}
llvm::Metadata *CodeGenModule::CreateMetadataIdentifierGeneralized(QualType T)
{
- return CreateMetadataIdentifierImpl(GeneralizeFunctionType(getContext(), T),
- GeneralizedMetadataIdMap,
".generalized");
+ return CreateMetadataIdentifierImpl(T, GeneralizedMetadataIdMap,
+ ".generalized");
}
/// Returns whether this module needs the "all-vtables" type identifier.
``
https://github.com/llvm/llvm-project/pull/158190
[llvm-branch-commits] [CodeGen][CFI] Generalize transparent union parameters (PR #158193)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: Vitaly Buka (vitalybuka)
Changes
According to the GCC documentation, the calling convention of a
transparent union is the same as that of the type of its first
member.
C++ ignores the attribute.
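The rule described above (a transparent union is treated as the type of its first member for the purpose of the CFI type identifier) can be modelled with a small toy example. This is a Python sketch for illustration only, not Clang code: `UnionType` and `generalize_transparent_union` are hypothetical names, and types are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnionType:
    """Toy stand-in for a C union type; not a Clang data structure."""
    name: str
    member_types: tuple = ()
    transparent: bool = False


def generalize_transparent_union(ty):
    """Replace a transparent union by the type of its first member.

    Anything that is not a transparent union (or has no members) is
    returned unchanged.
    """
    if not isinstance(ty, UnionType):
        return ty
    if not ty.transparent or not ty.member_types:
        return ty
    return ty.member_types[0]


u = UnionType("Union", ("char *", "long *"), transparent=True)
print(generalize_transparent_union(u))      # the first member's type
print(generalize_transparent_union("int"))  # non-union types pass through
```

In this model the union from the tests in this PR, whose first member is `char *`, maps to `char *`, which matches the direction of the updated `CHECK` lines in the diff below.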
---
Full diff: https://github.com/llvm/llvm-project/pull/158193.diff
5 Files Affected:
- (modified) clang/lib/CodeGen/CodeGenModule.cpp (+17-1)
- (modified) clang/test/CodeGen/cfi-icall-generalize.c (+4-4)
- (modified) clang/test/CodeGen/cfi-icall-normalize2.c (+2-2)
- (modified) clang/test/CodeGen/kcfi-generalize.c (+4-4)
- (modified) clang/test/CodeGen/kcfi-normalize.c (+6-4)
``diff
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp
b/clang/lib/CodeGen/CodeGenModule.cpp
index c647003ff389d..46dbd85665e5d 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -2339,13 +2339,29 @@ llvm::ConstantInt
*CodeGenModule::CreateCrossDsoCfiTypeId(llvm::Metadata *MD) {
return llvm::ConstantInt::get(Int64Ty, llvm::MD5Hash(MDS->getString()));
}
+static QualType GeneralizeTransparentUnion(QualType Ty) {
+ const RecordType *UT = Ty->getAsUnionType();
+ if (!UT)
+return Ty;
+ const RecordDecl *UD = UT->getOriginalDecl()->getDefinitionOrSelf();
+ if (!UD->hasAttr<TransparentUnionAttr>())
+return Ty;
+ for (const auto *it : UD->fields()) {
+return it->getType();
+ }
+ return Ty;
+}
+
+static QualType GeneralizeTransparentUnion(QualType Ty) {
+}
+
// Generalize pointer types to a void pointer with the qualifiers of the
// originally pointed-to type, e.g. 'const char *' and 'char * const *'
// generalize to 'const void *' while 'char *' and 'const char **' generalize to
// 'void *'.
static QualType GeneralizeType(ASTContext &Ctx, QualType Ty,
bool GeneralizePointers) {
- // TODO: Add other generalizations.
+ Ty = GeneralizeTransparentUnion(Ty);
if (!GeneralizePointers || !Ty->isPointerType())
return Ty;
diff --git a/clang/test/CodeGen/cfi-icall-generalize.c
b/clang/test/CodeGen/cfi-icall-generalize.c
index 116a99e4e2859..5359134805198 100644
--- a/clang/test/CodeGen/cfi-icall-generalize.c
+++ b/clang/test/CodeGen/cfi-icall-generalize.c
@@ -22,13 +22,13 @@ union Union {
// CHECK: define{{.*}} void @uni({{.*}} !type [[TYPE2:![0-9]+]] !type
[[TYPE2_GENERALIZED:![0-9]+]]
void uni(void (*fn)(union Union), union Union arg1) {
- // UNGENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata
!"_ZTSFv5UnionE")
- // GENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata
!"_ZTSFv5UnionE.generalized")
+ // UNGENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata !"_ZTSFvPcE")
+ // GENERALIZED: call i1 @llvm.type.test(ptr {{.*}}, metadata
!"_ZTSFvPvE.generalized")
fn(arg1);
}
// CHECK: [[TYPE]] = !{i64 0, !"_ZTSFPPiPKcPS2_E"}
// CHECK: [[TYPE_GENERALIZED]] = !{i64 0, !"_ZTSFPvPKvS_E.generalized"}
-// CHECK: [[TYPE2]] = !{i64 0, !"_ZTSFvPFv5UnionES_E"}
-// CHECK: [[TYPE2_GENERALIZED]] = !{i64 0, !"_ZTSFvPv5UnionE.generalized"}
+// CHECK: [[TYPE2]] = !{i64 0, !"_ZTSFvPFv5UnionEPcE"}
+// CHECK: [[TYPE2_GENERALIZED]] = !{i64 0, !"_ZTSFvPvS_E.generalized"}
diff --git a/clang/test/CodeGen/cfi-icall-normalize2.c
b/clang/test/CodeGen/cfi-icall-normalize2.c
index c88ecc9f0c3f7..b9d9af7c8a47b 100644
--- a/clang/test/CodeGen/cfi-icall-normalize2.c
+++ b/clang/test/CodeGen/cfi-icall-normalize2.c
@@ -32,11 +32,11 @@ union Union {
void uni(void (*fn)(union Union), union Union arg1) {
// CHECK-LABEL: define{{.*}}uni
// CHECK-SAME: {{.*}}!type ![[TYPE4:[0-9]+]] !type !{{[0-9]+}}
-// CHECK: call i1 @llvm.type.test({{i8\*|ptr}} {{%f|%0}}, metadata
!"_ZTSFv5UnionE.normalized")
+// CHECK: call i1 @llvm.type.test({{i8\*|ptr}} {{%f|%0}}, metadata
!"_ZTSFvPu2i8E.normalized")
fn(arg1);
}
// CHECK: ![[TYPE1]] = !{i64 0, !"_ZTSFvPFvu3i32ES_E.normalized"}
// CHECK: ![[TYPE2]] = !{i64 0, !"_ZTSFvPFvu3i32S_ES_S_E.normalized"}
// CHECK: ![[TYPE3]] = !{i64 0, !"_ZTSFvPFvu3i32S_S_ES_S_S_E.normalized"}
-// CHECK: ![[TYPE4]] = !{i64 0, !"_ZTSFvPFv5UnionES_E.normalized"}
+// CHECK: ![[TYPE4]] = !{i64 0, !"_ZTSFvPFv5UnionEPu2i8E.normalized"}
diff --git a/clang/test/CodeGen/kcfi-generalize.c
b/clang/test/CodeGen/kcfi-generalize.c
index 89b298f3e2faa..24e054549d527 100644
--- a/clang/test/CodeGen/kcfi-generalize.c
+++ b/clang/test/CodeGen/kcfi-generalize.c
@@ -33,8 +33,8 @@ union Union {
// CHECK: define{{.*}} void @uni({{.*}} !kcfi_type [[TYPE2:![0-9]+]]
void uni(void (*fn)(union Union), union Union arg1) {
- // UNGENERALIZED: call {{.*}} [ "kcfi"(i32 -1037059548) ]
- // GENERALIZED: call {{.*}} [ "kcfi"(i32 422130955) ]
+ // UNGENERALIZED: call {{.*}} [ "kcfi"(i32 -587217045) ]
+ // GENERALIZED: call {{.*}} [ "kcfi"(i32 2139530422) ]
fn(arg1);
}
@@ -44,5 +44,5 @@ void uni(void (*fn)(union Union), union Union arg1) {
// UNGENERALIZED: [[TYPE3]] = !{i32 874141567}
// GENERALIZED: [[TYPE3]] = !{i32 954385378}
-// UNGENERALIZED: [[TYPE2]] = !{i32 98
[llvm-branch-commits] [CodeGen][CFI] Generalize transparent union in args of args of functions (PR #158194)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/158194
According to the GCC documentation, the calling convention of a transparent union is the same as that of the type of the first member of the union. C++ ignores the attribute.
[llvm-branch-commits] [Clang] Invoke shell script with bash (PR #157608)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157608 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Remove MachineFunction argument from getRegClass (PR #158188)
https://github.com/s-barannikov approved this pull request. https://github.com/llvm/llvm-project/pull/158188 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [NFC][CodeGe][CFI] Pre-commit transparent_union tests (PR #158192)
https://github.com/vitalybuka created https://github.com/llvm/llvm-project/pull/158192 None ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] CodeGen: Remove TRI argument from getRegClass (PR #158225)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/158225 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Remarks] Restructure bitstream remarks to be fully standalone (PR #156715)
@@ -232,43 +221,40 @@ void BitstreamRemarkSerializerHelper::setupBlockInfo() {
}
jroelofs wrote:
If you push this into each `case` and replace `break;` with `return;`, then
this could become a fully-covered-switch-with-unreachable pattern.
https://github.com/llvm/llvm-project/pull/156715
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
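The pattern suggested in the review comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the enum `ContainerKind`, its enumerators, and the function `describe` are hypothetical names, not taken from the patch, and `std::abort()` stands in for what LLVM code would normally spell as `llvm_unreachable`.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum; the real code switches on a different type.
enum class ContainerKind { Standalone, SeparateMeta, Linked };

// Every case returns directly, so no `break` is needed. The switch has no
// `default` label, so the compiler can warn when a new enumerator is added
// but not handled here.
std::string describe(ContainerKind Kind) {
  switch (Kind) {
  case ContainerKind::Standalone:
    return "standalone";
  case ContainerKind::SeparateMeta:
    return "separate-meta";
  case ContainerKind::Linked:
    return "linked";
  }
  // Reached only if Kind holds a value outside the enum. In-tree code would
  // likely use llvm_unreachable("unknown kind") instead of std::abort().
  std::abort();
}
```

Calling `describe(ContainerKind::Linked)` returns the string for that case; the trailing `std::abort()` is never reached for valid enumerators.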
[llvm-branch-commits] [mlir] 111fd8e - Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194) (#158260)"
Author: Mehdi Amini
Date: 2025-09-12T11:37:23+01:00
New Revision: 111fd8e494728ba5eef6ffac50c6bf7a3d967f21
URL:
https://github.com/llvm/llvm-project/commit/111fd8e494728ba5eef6ffac50c6bf7a3d967f21
DIFF:
https://github.com/llvm/llvm-project/commit/111fd8e494728ba5eef6ffac50c6bf7a3d967f21.diff
LOG: Revert "Introduce LDBG_OS() macro as a variant of LDBG() (#157194)
(#158260)"
This reverts commit 8457e68b6b59f8daf5fb747fe3a2f9c48c3c3ba8.
Added:
Modified:
llvm/include/llvm/Support/Debug.h
llvm/include/llvm/Support/DebugLog.h
llvm/unittests/Support/DebugLogTest.cpp
mlir/lib/Dialect/Transform/IR/TransformOps.cpp
Removed:
diff --git a/llvm/include/llvm/Support/Debug.h
b/llvm/include/llvm/Support/Debug.h
index b73f2d7c8b852..a7795d403721c 100644
--- a/llvm/include/llvm/Support/Debug.h
+++ b/llvm/include/llvm/Support/Debug.h
@@ -44,6 +44,11 @@ class raw_ostream;
/// level, return false.
LLVM_ABI bool isCurrentDebugType(const char *Type, int Level = 0);
+/// Overload allowing to swap the order of the Type and Level arguments.
+LLVM_ABI inline bool isCurrentDebugType(int Level, const char *Type) {
+ return isCurrentDebugType(Type, Level);
+}
+
/// setCurrentDebugType - Set the current debug type, as if the -debug-only=X
/// option were specified. Note that DebugFlag also needs to be set to true
for
/// debug output to be produced.
diff --git a/llvm/include/llvm/Support/DebugLog.h
b/llvm/include/llvm/Support/DebugLog.h
index f7748bc9904b1..dce706e196bde 100644
--- a/llvm/include/llvm/Support/DebugLog.h
+++ b/llvm/include/llvm/Support/DebugLog.h
@@ -19,55 +19,52 @@
namespace llvm {
#ifndef NDEBUG
-/// LDBG() is a macro that can be used as a raw_ostream for debugging.
-/// It will stream the output to the dbgs() stream, with a prefix of the
-/// debug type and the file and line number. A trailing newline is added to the
-/// output automatically. If the streamed content contains a newline, the
prefix
-/// is added to each beginning of a new line. Nothing is printed if the debug
-/// output is not enabled or the debug type does not match.
-///
-/// E.g.,
-/// LDBG() << "Bitset contains: " << Bitset;
-/// is equivalent to
-/// LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" <<
-/// __LINE__ << " "
-/// << "Bitset contains: " << Bitset << "\n");
-///
+// LDBG() is a macro that can be used as a raw_ostream for debugging.
+// It will stream the output to the dbgs() stream, with a prefix of the
+// debug type and the file and line number. A trailing newline is added to the
+// output automatically. If the streamed content contains a newline, the prefix
+// is added to each beginning of a new line. Nothing is printed if the debug
+// output is not enabled or the debug type does not match.
+//
+// E.g.,
+// LDBG() << "Bitset contains: " << Bitset;
+// is somehow equivalent to
+// LLVM_DEBUG(dbgs() << "[" << DEBUG_TYPE << "] " << __FILE__ << ":" <<
+// __LINE__ << " "
+// << "Bitset contains: " << Bitset << "\n");
+//
// An optional `level` argument can be provided to control the verbosity of the
-/// output. The default level is 1, and is in increasing level of verbosity.
-///
-/// The `level` argument can be a literal integer, or a macro that evaluates to
-/// an integer.
-///
-/// An optional `type` argument can be provided to control the debug type. The
-/// default type is DEBUG_TYPE. The `type` argument can be a literal string, or
-/// a macro that evaluates to a string.
-///
-/// E.g.,
-/// LDBG(2) << "Bitset contains: " << Bitset;
-/// LDBG("debug_type") << "Bitset contains: " << Bitset;
-/// LDBG("debug_type", 2) << "Bitset contains: " << Bitset;
+// output. The default level is 1, and is in increasing level of verbosity.
+//
+// The `level` argument can be a literal integer, or a macro that evaluates to
+// an integer.
+//
+// An optional `type` argument can be provided to control the debug type. The
+// default type is DEBUG_TYPE. The `type` argument can be a literal string, or
a
+// macro that evaluates to a string.
#define LDBG(...) _GET_LDBG_MACRO(__VA_ARGS__)(__VA_ARGS__)
-/// LDBG_OS() is a macro that behaves like LDBG() but instead of directly using
-/// it to stream the output, it takes a callback function that will be called
-/// with a raw_ostream.
-/// This is useful when you need to pass a `raw_ostream` to a helper function
to
-/// be able to print (when the `<<` operator is not available).
-///
-/// E.g.,
-/// LDBG_OS([&] (raw_ostream &Os) {
-/// Os << "Pass Manager contains: ";
-/// pm.printAsTextual(Os);
-/// });
-///
-/// Just like LDBG(), it optionally accepts a `level` and `type` arguments.
-/// E.g.,
-/// LDBG_OS(2, [&] (raw_ostream &Os) { ... });
-/// LDBG_OS("debug_type", [&] (raw_ostream &Os) { ... });
-/// LDBG_OS("debug_type", 2, [&] (raw_ostream &Os) { ... });
-///
-#defi
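As a rough illustration of the behaviour the LDBG() comment describes (a prefix made of the debug type, file, and line, plus an automatic trailing newline), here is a minimal standalone sketch. It does not use LLVM at all: `PrefixedLine`, `SKETCH_DEBUG_TYPE`, and `SKETCH_LDBG` are hypothetical names, and the sketch omits the debug-type filtering, the verbosity levels, and the per-line prefixing that the real macro is documented to perform.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: buffers everything streamed into it and writes the
// buffer, followed by a newline, to the target stream when it is destroyed.
class PrefixedLine {
public:
  PrefixedLine(std::ostream &Out, const char *Type, const char *File, int Line)
      : Out(Out) {
    Buffer << "[" << Type << "] " << File << ":" << Line << " ";
  }
  ~PrefixedLine() { Out << Buffer.str() << "\n"; }

  template <typename T> PrefixedLine &operator<<(const T &Value) {
    Buffer << Value;
    return *this;
  }

private:
  std::ostream &Out;
  std::ostringstream Buffer;
};

// Assumed stand-in for DEBUG_TYPE.
#define SKETCH_DEBUG_TYPE "sketch"

// The temporary lives until the end of the full expression, so the prefixed
// line is emitted once the whole `<<` chain has been evaluated.
#define SKETCH_LDBG(Out)                                                       \
  PrefixedLine((Out), SKETCH_DEBUG_TYPE, __FILE__, __LINE__)
```

With this sketch, `SKETCH_LDBG(std::cerr) << "Bitset contains: " << 42;` would write one line that starts with `[sketch] `, followed by the file name and line number, then the message.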
[llvm-branch-commits] [llvm] 1ff8b74 - Revert "[DebugLine] Correct debug line emittion (#157529)"
Author: David Blaikie
Date: 2025-09-12T11:22:56-07:00
New Revision: 1ff8b74c4d592f3c235460fda236e636b2f2590f
URL:
https://github.com/llvm/llvm-project/commit/1ff8b74c4d592f3c235460fda236e636b2f2590f
DIFF:
https://github.com/llvm/llvm-project/commit/1ff8b74c4d592f3c235460fda236e636b2f2590f.diff
LOG: Revert "[DebugLine] Correct debug line emittion (#157529)"
This reverts commit 84f431c35b3fbd5b9c46608689f25a5d29bc0f55.
Added:
Modified:
llvm/lib/MC/MCDwarf.cpp
llvm/test/DebugInfo/X86/DW_AT_LLVM_stmt_seq_sec_offset.ll
llvm/test/MC/ELF/debug-loc-label.s
Removed:
llvm/test/DebugInfo/ARM/stmt_seq_macho.test
diff --git a/llvm/lib/MC/MCDwarf.cpp b/llvm/lib/MC/MCDwarf.cpp
index e8f000a584839..e7c0d37e8f99b 100644
--- a/llvm/lib/MC/MCDwarf.cpp
+++ b/llvm/lib/MC/MCDwarf.cpp
@@ -181,7 +181,7 @@ void MCDwarfLineTable::emitOne(
unsigned FileNum, LastLine, Column, Flags, Isa, Discriminator;
bool IsAtStartSeq;
- MCSymbol *PrevLabel;
+ MCSymbol *LastLabel;
auto init = [&]() {
FileNum = 1;
LastLine = 1;
@@ -189,31 +189,21 @@ void MCDwarfLineTable::emitOne(
Flags = DWARF2_LINE_DEFAULT_IS_STMT ? DWARF2_FLAG_IS_STMT : 0;
Isa = 0;
Discriminator = 0;
-PrevLabel = nullptr;
+LastLabel = nullptr;
IsAtStartSeq = true;
};
init();
// Loop through each MCDwarfLineEntry and encode the dwarf line number table.
bool EndEntryEmitted = false;
- for (auto It = LineEntries.begin(); It != LineEntries.end(); ++It) {
-auto LineEntry = *It;
-MCSymbol *CurrLabel = LineEntry.getLabel();
+ for (const MCDwarfLineEntry &LineEntry : LineEntries) {
+MCSymbol *Label = LineEntry.getLabel();
const MCAsmInfo *asmInfo = MCOS->getContext().getAsmInfo();
if (LineEntry.LineStreamLabel) {
if (!IsAtStartSeq) {
-auto *Label = CurrLabel;
-auto NextIt = It + 1;
-// LineEntry with a null Label is probably a fake LineEntry we added
-// when `-emit-func-debug-line-table-offsets` in order to terminate the
-// sequence. Look for the next Label if possible, otherwise we will set
-// the PC to the end of the section.
-if (!Label && NextIt != LineEntries.end()) {
- Label = NextIt->getLabel();
-}
-MCOS->emitDwarfLineEndEntry(Section, PrevLabel,
-/*EndLabel =*/Label);
+MCOS->emitDwarfLineEndEntry(Section, LastLabel,
+/*EndLabel =*/LastLabel);
init();
}
MCOS->emitLabel(LineEntry.LineStreamLabel, LineEntry.StreamLabelDefLoc);
@@ -221,7 +211,7 @@ void MCDwarfLineTable::emitOne(
}
if (LineEntry.IsEndEntry) {
- MCOS->emitDwarfAdvanceLineAddr(INT64_MAX, PrevLabel, CurrLabel,
+ MCOS->emitDwarfAdvanceLineAddr(INT64_MAX, LastLabel, Label,
asmInfo->getCodePointerSize());
init();
EndEntryEmitted = true;
@@ -268,12 +258,12 @@ void MCDwarfLineTable::emitOne(
// At this point we want to emit/create the sequence to encode the delta in
// line numbers and the increment of the address from the previous Label
// and the current Label.
-MCOS->emitDwarfAdvanceLineAddr(LineDelta, PrevLabel, CurrLabel,
+MCOS->emitDwarfAdvanceLineAddr(LineDelta, LastLabel, Label,
asmInfo->getCodePointerSize());
Discriminator = 0;
LastLine = LineEntry.getLine();
-PrevLabel = CurrLabel;
+LastLabel = Label;
IsAtStartSeq = false;
}
@@ -283,7 +273,7 @@ void MCDwarfLineTable::emitOne(
// does not track ranges nor terminate the line table. In that case,
// conservatively use the section end symbol to end the line table.
if (!EndEntryEmitted && !IsAtStartSeq)
-MCOS->emitDwarfLineEndEntry(Section, PrevLabel);
+MCOS->emitDwarfLineEndEntry(Section, LastLabel);
}
void MCDwarfLineTable::endCurrentSeqAndEmitLineStreamLabel(MCStreamer *MCOS,
diff --git a/llvm/test/DebugInfo/ARM/stmt_seq_macho.test
b/llvm/test/DebugInfo/ARM/stmt_seq_macho.test
deleted file mode 100644
index f0874bfc45ed2..0
--- a/llvm/test/DebugInfo/ARM/stmt_seq_macho.test
+++ /dev/null
@@ -1,98 +0,0 @@
-// RUN: split-file %s %t
-
-// RUN: clang++ --target=arm64-apple-macos11 \
-// RUN: %t/stmt_seq_macho.cpp -o %t/stmt_seq_macho.o \
-// RUN: -g -Oz -gdwarf-4 -c -mno-outline \
-// RUN: -mllvm -emit-func-debug-line-table-offsets \
-// RUN: -fdebug-compilation-dir=/private/tmp/stmt_seq \
-// RUN: -fno-unwind-tables -fno-exceptions
-
-// RUN: llvm-dwarfdump -all %t/stmt_seq_macho.o | FileCheck %s
-
-// CHECK: AddressLine Column File ISA
Discriminator OpIndex Flags
-// CHECK-NEXT: -- -- -- -- ---
- --- -
-// CHECK-NEXT: 0x 2 33 1
[llvm-branch-commits] [NFC][CFI][CodeGen] Move GeneralizeFunctionType out of CreateMetadataIdentifierGeneralized (PR #158190)
https://github.com/vitalybuka updated https://github.com/llvm/llvm-project/pull/158190 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] Mips: Switch to RegClassByHwMode (PR #158273)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/158273?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests
* **#158273** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/158273?utm_source=stack-comment-view-in-graphite)
* **#158269**: 2 other dependent PRs ([#158271](https://github.com/llvm/llvm-project/pull/158271), [#158272](https://github.com/llvm/llvm-project/pull/158272))
* `main`
This stack of pull requests is managed by Graphite (https://graphite.dev?utm-source=stack-comment). Learn more about stacking: https://stacking.dev/?utm_source=stack-comment
https://github.com/llvm/llvm-project/pull/158273
[llvm-branch-commits] [llvm] X86: Switch to RegClassByHwMode (PR #158274)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/158274 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Offload] Add GenericPluginTy::get_mem_info (PR #157484)
https://github.com/RossBrunton updated
https://github.com/llvm/llvm-project/pull/157484
>From 7bf7fe1df8a873964df2ebc17328d9bef00f1347 Mon Sep 17 00:00:00 2001
From: Ross Brunton
Date: Mon, 8 Sep 2025 10:45:42 +0100
Subject: [PATCH 1/6] [Offload] Add GenericPluginTy::get_mem_info
This takes a pointer allocated by the plugin, and returns a struct
containing important information about it. This is now used in
`olMemFree` instead of using a map to track allocation info.
---
offload/include/omptarget.h | 2 +
offload/liboffload/src/OffloadImpl.cpp| 27 +--
.../amdgpu/dynamic_hsa/hsa.cpp| 1 +
.../amdgpu/dynamic_hsa/hsa_ext_amd.h | 3 +
offload/plugins-nextgen/amdgpu/src/rtl.cpp| 31 ++-
.../common/include/PluginInterface.h | 13 ++
offload/plugins-nextgen/cuda/src/rtl.cpp | 216 +++---
offload/plugins-nextgen/host/src/rtl.cpp | 5 +
8 files changed, 193 insertions(+), 105 deletions(-)
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 8fd722bb15022..197cbd3806d91 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -96,6 +96,8 @@ enum OpenMPOffloadingDeclareTargetFlags {
OMP_REGISTER_REQUIRES = 0x10,
};
+// Note: This type should be no larger than 3 bits, as the amdgpu platform uses
+// the lower 3 bits of a pointer to store it
enum TargetAllocTy : int32_t {
TARGET_ALLOC_DEVICE = 0,
TARGET_ALLOC_HOST,
diff --git a/offload/liboffload/src/OffloadImpl.cpp
b/offload/liboffload/src/OffloadImpl.cpp
index fef3a5669e0d5..9620c35ac5c10 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -201,8 +201,6 @@ struct OffloadContext {
bool TracingEnabled = false;
bool ValidationEnabled = true;
- DenseMap AllocInfoMap{};
- std::mutex AllocInfoMapMutex{};
SmallVector Platforms{};
size_t RefCount;
@@ -624,32 +622,15 @@ Error olMemAlloc_impl(ol_device_handle_t Device,
ol_alloc_type_t Type,
return Alloc.takeError();
*AllocationOut = *Alloc;
- {
-std::lock_guard Lock(OffloadContext::get().AllocInfoMapMutex);
-OffloadContext::get().AllocInfoMap.insert_or_assign(
-*Alloc, AllocInfo{Device, Type});
- }
return Error::success();
}
Error olMemFree_impl(ol_platform_handle_t Platform, void *Address) {
- ol_device_handle_t Device;
- ol_alloc_type_t Type;
- {
-std::lock_guard Lock(OffloadContext::get().AllocInfoMapMutex);
-if (!OffloadContext::get().AllocInfoMap.contains(Address))
- return createOffloadError(ErrorCode::INVALID_ARGUMENT,
-"address is not a known allocation");
-
-auto AllocInfo = OffloadContext::get().AllocInfoMap.at(Address);
-Device = AllocInfo.Device;
-Type = AllocInfo.Type;
-OffloadContext::get().AllocInfoMap.erase(Address);
- }
- assert(Platform == Device->Platform);
+ auto MemInfo = Platform->Plugin->get_memory_info(Address);
+ if (auto Err = MemInfo.takeError())
+return Err;
- if (auto Res =
- Device->Device->dataDelete(Address, convertOlToPluginAllocTy(Type)))
+ if (auto Res = MemInfo->Device->dataDelete(Address, MemInfo->Type))
return Res;
return Error::success();
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
index bc92f4a46a5c0..7f0e75cb9b500 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa.cpp
@@ -68,6 +68,7 @@ DLWRAP(hsa_amd_register_system_event_handler, 2)
DLWRAP(hsa_amd_signal_create, 5)
DLWRAP(hsa_amd_signal_async_handler, 5)
DLWRAP(hsa_amd_pointer_info, 5)
+DLWRAP(hsa_amd_pointer_info_set_userdata, 2)
DLWRAP(hsa_code_object_reader_create_from_memory, 3)
DLWRAP(hsa_code_object_reader_destroy, 1)
DLWRAP(hsa_executable_load_agent_code_object, 5)
diff --git a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
index 29cfe78082dbb..5c2fbd127c86d 100644
--- a/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
+++ b/offload/plugins-nextgen/amdgpu/dynamic_hsa/hsa_ext_amd.h
@@ -160,6 +160,7 @@ typedef struct hsa_amd_pointer_info_s {
void* agentBaseAddress;
void* hostBaseAddress;
size_t sizeInBytes;
+ void *userData;
} hsa_amd_pointer_info_t;
hsa_status_t hsa_amd_pointer_info(const void* ptr,
@@ -168,6 +169,8 @@ hsa_status_t hsa_amd_pointer_info(const void* ptr,
uint32_t* num_agents_accessible,
hsa_agent_t** accessible);
+hsa_status_t hsa_amd_pointer_info_set_userdata(const void *ptr, void
*userdata);
+
#ifdef __cplusplus
}
#endif
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index c26cfe961aa0e..90d9ca9f787e7 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nex
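The note added to `omptarget.h` says the allocation type must fit in 3 bits because it is stored in the lower 3 bits of a pointer. A generic sketch of that kind of low-bit tagging is shown below. It is an assumption-laden illustration, not the plugin's code: `AllocKind`, `packKind`, `unpackKind`, and `stripKind` are hypothetical names, and the sketch assumes the pointer being tagged is at least 8-byte aligned so that its low 3 bits are zero.

```cpp
#include <cstdint>

// Hypothetical stand-in for TargetAllocTy; every value must fit in 3 bits.
enum class AllocKind : std::uintptr_t { Device = 0, Host = 1, Shared = 2 };

// Mask selecting the lower 3 bits of a pointer value.
constexpr std::uintptr_t TagMask = 0x7;

// Stores Kind in the low bits of Ptr. Ptr must be 8-byte aligned, otherwise
// the tag would corrupt address bits.
inline void *packKind(void *Ptr, AllocKind Kind) {
  auto Bits = reinterpret_cast<std::uintptr_t>(Ptr);
  return reinterpret_cast<void *>(Bits | static_cast<std::uintptr_t>(Kind));
}

// Recovers the kind from a tagged pointer.
inline AllocKind unpackKind(void *Tagged) {
  return static_cast<AllocKind>(reinterpret_cast<std::uintptr_t>(Tagged) &
                                TagMask);
}

// Recovers the original pointer by clearing the tag bits.
inline void *stripKind(void *Tagged) {
  return reinterpret_cast<void *>(reinterpret_cast<std::uintptr_t>(Tagged) &
                                  ~TagMask);
}
```

If the enum ever grew past eight values, the tag would no longer fit in the mask, which is the constraint the new comment warns about.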
[llvm-branch-commits] [llvm] [DA] Add overflow check in ExactSIV (PR #157086)
https://github.com/kasuga-fj updated
https://github.com/llvm/llvm-project/pull/157086
>From 94b18495719b35a89ee6a18e474e8e92a4429d99 Mon Sep 17 00:00:00 2001
From: Ryotaro Kasuga
Date: Fri, 5 Sep 2025 11:41:29 +
Subject: [PATCH] [DA] Add overflow check in ExactSIV
---
llvm/lib/Analysis/DependenceAnalysis.cpp | 14 +-
llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll | 2 +-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Analysis/DependenceAnalysis.cpp
b/llvm/lib/Analysis/DependenceAnalysis.cpp
index 0f77a1410e83b..6e576e866b310 100644
--- a/llvm/lib/Analysis/DependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/DependenceAnalysis.cpp
@@ -1170,6 +1170,15 @@ const SCEVConstant
*DependenceInfo::collectConstantUpperBound(const Loop *L,
return nullptr;
}
+/// Returns \p A - \p B if it guaranteed not to signed wrap. Otherwise returns
+/// nullptr. \p A and \p B must have the same integer type.
+static const SCEV *minusSCEVNoSignedOverflow(const SCEV *A, const SCEV *B,
+ ScalarEvolution &SE) {
+ if (SE.willNotOverflow(Instruction::Sub, /*Signed=*/true, A, B))
+return SE.getMinusSCEV(A, B);
+ return nullptr;
+}
+
// testZIV -
// When we have a pair of subscripts of the form [c1] and [c2],
// where c1 and c2 are both loop invariant, we attack it using
@@ -1626,7 +1635,9 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff,
const SCEV *DstCoeff,
assert(0 < Level && Level <= CommonLevels && "Level out of range");
Level--;
Result.Consistent = false;
- const SCEV *Delta = SE->getMinusSCEV(DstConst, SrcConst);
+ const SCEV *Delta = minusSCEVNoSignedOverflow(DstConst, SrcConst, *SE);
+ if (!Delta)
+return false;
LLVM_DEBUG(dbgs() << "\tDelta = " << *Delta << "\n");
NewConstraint.setLine(SrcCoeff, SE->getNegativeSCEV(DstCoeff), Delta,
CurLoop);
@@ -1716,6 +1727,7 @@ bool DependenceInfo::exactSIVtest(const SCEV *SrcCoeff,
const SCEV *DstCoeff,
// explore directions
unsigned NewDirection = Dependence::DVEntry::NONE;
APInt LowerDistance, UpperDistance;
+ // TODO: Overflow check may be needed.
if (TA.sgt(TB)) {
LowerDistance = (TY - TX) + (TA - TB) * TL;
UpperDistance = (TY - TX) + (TA - TB) * TU;
diff --git a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
index 54bb8b73da02a..fd58568d02c43 100644
--- a/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
+++ b/llvm/test/Analysis/DependenceAnalysis/ExactSIV.ll
@@ -841,7 +841,7 @@ define void @exact14(ptr %A) {
; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8
0, ptr %idx.0, align 1
; CHECK-SIV-ONLY-NEXT:da analyze - none!
; CHECK-SIV-ONLY-NEXT: Src: store i8 0, ptr %idx.0, align 1 --> Dst: store i8
1, ptr %idx.1, align 1
-; CHECK-SIV-ONLY-NEXT:da analyze - none!
+; CHECK-SIV-ONLY-NEXT:da analyze - output [*|<]!
; CHECK-SIV-ONLY-NEXT: Src: store i8 1, ptr %idx.1, align 1 --> Dst: store i8
1, ptr %idx.1, align 1
; CHECK-SIV-ONLY-NEXT:da analyze - none!
;
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
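The idea behind `minusSCEVNoSignedOverflow` in the patch above (compute `A - B` only when it cannot wrap, otherwise report failure) can be illustrated on plain 64-bit integers. The following is a sketch under that simplification: it works on `std::int64_t` rather than SCEV expressions, the function name `minusNoSignedOverflow` is hypothetical, and `std::nullopt` plays the role of the `nullptr` result in the patch.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns A - B if the subtraction cannot overflow a signed 64-bit integer,
// and std::nullopt otherwise.
std::optional<std::int64_t> minusNoSignedOverflow(std::int64_t A,
                                                  std::int64_t B) {
  constexpr std::int64_t Min = std::numeric_limits<std::int64_t>::min();
  constexpr std::int64_t Max = std::numeric_limits<std::int64_t>::max();
  // For B < 0 the result exceeds Max when A > Max + B.
  // For B > 0 the result falls below Min when A < Min + B.
  // Both bounds are computed without overflowing themselves.
  if ((B < 0 && A > Max + B) || (B > 0 && A < Min + B))
    return std::nullopt;
  return A - B;
}
```

A caller would bail out conservatively on `std::nullopt`, much as `exactSIVtest` returns `false` when `Delta` is null.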
[llvm-branch-commits] [llvm] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (PR #158271)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/158271 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [Clang] Port ulimit tests to work with internal shell (PR #157977)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/157977 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [Offload][Conformance] Update olMemFree calls in conformance tests (PR #157773)
RossBrunton wrote: @jhuber6 This was merged into my user branch, was that intentional? https://github.com/llvm/llvm-project/pull/157773 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
@@ -65,6 +65,12 @@ if (MLIR_INCLUDE_INTEGRATION_TESTS)
endif()
+option(MLIR_RUN_STANDALONE_INSTALL_TESTS "Run Standalone example install
tests." ON)
+if(MLIR_RUN_STANDALONE_INSTALL_TESTS AND "${CMAKE_INSTALL_PREFIX}" STREQUAL "")
+ message(WARNING "Standalone example install tests will install into root!\
rengolin wrote:
D'oh, now I saw the line above! 🤣
https://github.com/llvm/llvm-project/pull/157944
[llvm-branch-commits] [llvm] [llvm-profgen] Extend llvm-profgen to generate vtable profiles with data access events for non context-sensitive profiles using debug info (PR #148013)
https://github.com/paschalis-mpeis approved this pull request. Thanks for addressing the comments and adding a pie test. Looks good. https://github.com/llvm/llvm-project/pull/148013 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Generate canonical additions in AMDGPUPromoteAlloca (PR #157810)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Fabian Ritter (ritter-x2a)
Changes
When we know that one operand of an addition is a constant, we might as
well put it on the right-hand side and avoid the work to canonicalize it
in a later pass.
---
Full diff: https://github.com/llvm/llvm-project/pull/157810.diff
4 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-negative-index.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-vector-gep-of-gep.ll
(+3-3)
``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index bb77cdff778c0..7dbe1235a98b5 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -478,7 +478,7 @@ static Value *GEPToVectorIndex(GetElementPtrInst *GEP,
AllocaInst *Alloca,
ConstantInt *ConstIndex =
ConstantInt::get(OffsetType, IndexQuot.getSExtValue());
- Value *IndexAdd = Builder.CreateAdd(ConstIndex, Offset);
+ Value *IndexAdd = Builder.CreateAdd(Offset, ConstIndex);
if (Instruction *NewInst = dyn_cast<Instruction>(IndexAdd))
NewInsts.push_back(NewInst);
return IndexAdd;
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll
b/llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll
index d72f158763c61..63622e67e7d0b 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-multidim.ll
@@ -312,7 +312,7 @@ define amdgpu_kernel void
@i64_2d_load_store_subvec_3_i64_offset_index(ptr %out)
; CHECK-NEXT:[[TMP15:%.*]] = insertelement <6 x i64> [[TMP14]], i64 4, i32
4
; CHECK-NEXT:[[TMP16:%.*]] = insertelement <6 x i64> [[TMP15]], i64 5, i32
5
; CHECK-NEXT:[[TMP1:%.*]] = mul i64 [[SEL3]], 3
-; CHECK-NEXT:[[TMP2:%.*]] = add i64 6, [[TMP1]]
+; CHECK-NEXT:[[TMP2:%.*]] = add i64 [[TMP1]], 6
; CHECK-NEXT:[[TMP3:%.*]] = extractelement <6 x i64> [[TMP16]], i64
[[TMP2]]
; CHECK-NEXT:[[TMP4:%.*]] = insertelement <3 x i64> poison, i64 [[TMP3]],
i64 0
; CHECK-NEXT:[[TMP5:%.*]] = add i64 [[TMP2]], 1
@@ -464,7 +464,7 @@ define amdgpu_kernel void @i16_2d_load_store(ptr %out, i32
%sel) {
; CHECK-NEXT:[[TMP4:%.*]] = insertelement <6 x i16> [[TMP3]], i16 3, i32 3
; CHECK-NEXT:[[TMP5:%.*]] = insertelement <6 x i16> [[TMP4]], i16 4, i32 4
; CHECK-NEXT:[[TMP6:%.*]] = insertelement <6 x i16> [[TMP5]], i16 5, i32 5
-; CHECK-NEXT:[[TMP1:%.*]] = add i32 3, [[SEL]]
+; CHECK-NEXT:[[TMP1:%.*]] = add i32 [[SEL]], 3
; CHECK-NEXT:[[TMP2:%.*]] = extractelement <6 x i16> [[TMP6]], i32 [[TMP1]]
; CHECK-NEXT:store i16 [[TMP2]], ptr [[OUT]], align 2
; CHECK-NEXT:ret void
@@ -498,7 +498,7 @@ define amdgpu_kernel void @float_2d_load_store(ptr %out,
i32 %sel) {
; CHECK-NEXT:[[TMP4:%.*]] = insertelement <6 x float> [[TMP3]], float
3.00e+00, i32 3
; CHECK-NEXT:[[TMP5:%.*]] = insertelement <6 x float> [[TMP4]], float
4.00e+00, i32 4
; CHECK-NEXT:[[TMP6:%.*]] = insertelement <6 x float> [[TMP5]], float
5.00e+00, i32 5
-; CHECK-NEXT:[[TMP1:%.*]] = add i32 3, [[SEL]]
+; CHECK-NEXT:[[TMP1:%.*]] = add i32 [[SEL]], 3
; CHECK-NEXT:[[TMP2:%.*]] = extractelement <6 x float> [[TMP6]], i32
[[TMP1]]
; CHECK-NEXT:store float [[TMP2]], ptr [[OUT]], align 4
; CHECK-NEXT:ret void
@@ -538,7 +538,7 @@ define amdgpu_kernel void @ptr_2d_load_store(ptr %out, i32
%sel) {
; CHECK-NEXT:[[TMP4:%.*]] = insertelement <6 x ptr> [[TMP3]], ptr
[[PTR_3]], i32 3
; CHECK-NEXT:[[TMP5:%.*]] = insertelement <6 x ptr> [[TMP4]], ptr
[[PTR_4]], i32 4
; CHECK-NEXT:[[TMP6:%.*]] = insertelement <6 x ptr> [[TMP5]], ptr
[[PTR_5]], i32 5
-; CHECK-NEXT:[[TMP7:%.*]] = add i32 3, [[SEL]]
+; CHECK-NEXT:[[TMP7:%.*]] = add i32 [[SEL]], 3
; CHECK-NEXT:[[TMP8:%.*]] = extractelement <6 x ptr> [[TMP6]], i32 [[TMP7]]
; CHECK-NEXT:store ptr [[TMP8]], ptr [[OUT]], align 8
; CHECK-NEXT:ret void
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-negative-index.ll
b/llvm/test/CodeGen/AMDGPU/promote-alloca-negative-index.ll
index 1b6ac0bd93c19..a865bf5058d6a 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-negative-index.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-negative-index.ll
@@ -11,7 +11,7 @@ define amdgpu_kernel void @negative_index_byte(ptr %out, i64
%offset) {
; CHECK-NEXT:[[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 1, i32 1
; CHECK-NEXT:[[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 2, i32 2
; CHECK-NEXT:[[TMP4:%.*]] = insertelement <4 x i8> [[TMP3]], i8 3, i32 3
-; CHECK-NEXT:[[TMP5:%.*]] = add i64 -1, [[OFFSET:%.*]]
+; CHECK-NEXT:[[TMP5:%.*]] = add i64 [[OFFSET:%.*]], -1
; CHECK-NEXT:[[TMP6:%.*]] = extractelement <4 x i8> [[TMP4]], i64 [[TMP5]]
; CHECK-NEXT:
[llvm-branch-commits] [llvm] CodeGen: Remove TRI arguments from stack load/store hooks (PR #158240)
https://github.com/RKSimon approved this pull request.
LGTM with the clang-format fix
https://github.com/llvm/llvm-project/pull/158240
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
@@ -65,6 +65,12 @@ if (MLIR_INCLUDE_INTEGRATION_TESTS)
endif()
+option(MLIR_RUN_STANDALONE_INSTALL_TESTS "Run Standalone example install
tests." ON)
+if(MLIR_RUN_STANDALONE_INSTALL_TESTS AND "${CMAKE_INSTALL_PREFIX}" STREQUAL "")
+ message(WARNING "Standalone example install tests will install into root!\
christopherbate wrote:
Shouldn't any potential to write outside the build directory in a test be a
FATAL_ERROR, not a WARNING?
https://github.com/llvm/llvm-project/pull/157944
[llvm-branch-commits] [clang] [clang][LoongArch] Introduce LASX and LSX conversion intrinsics (PR #157819)
llvmbot wrote:
@llvm/pr-subscribers-clang
Author: hev (heiher)
Changes
This patch introduces the LASX and LSX conversion intrinsics:
- __m256 __lasx_cast_128_s (__m128)
- __m256d __lasx_cast_128_d (__m128d)
- __m256i __lasx_cast_128 (__m128i)
- __m256 __lasx_concat_128_s (__m128, __m128)
- __m256d __lasx_concat_128_d (__m128d, __m128d)
- __m256i __lasx_concat_128 (__m128i, __m128i)
- __m128 __lasx_extract_128_lo_s (__m256)
- __m128d __lasx_extract_128_lo_d (__m256d)
- __m128i __lasx_extract_128_lo (__m256i)
- __m128 __lasx_extract_128_hi_s (__m256)
- __m128d __lasx_extract_128_hi_d (__m256d)
- __m128i __lasx_extract_128_hi (__m256i)
- __m256 __lasx_insert_128_lo_s (__m256, __m128)
- __m256d __lasx_insert_128_lo_d (__m256d, __m128d)
- __m256i __lasx_insert_128_lo (__m256i, __m128i)
- __m256 __lasx_insert_128_hi_s (__m256, __m128)
- __m256d __lasx_insert_128_hi_d (__m256d, __m128d)
- __m256i __lasx_insert_128_hi (__m256i, __m128i)
---
Patch is 25.73 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/157819.diff
4 Files Affected:
- (modified) clang/include/clang/Basic/BuiltinsLoongArchLASX.def (+19)
- (modified) clang/lib/Headers/lasxintrin.h (+110)
- (modified) clang/test/CodeGen/LoongArch/lasx/builtin-alias.c (+153)
- (modified) clang/test/CodeGen/LoongArch/lasx/builtin.c (+157)
``diff
diff --git a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
index c4ea46a3bc5b5..b234dedad648e 100644
--- a/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
+++ b/clang/include/clang/Basic/BuiltinsLoongArchLASX.def
@@ -986,3 +986,22 @@ TARGET_BUILTIN(__builtin_lasx_xbnz_b, "iV32Uc", "nc",
"lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_h, "iV16Us", "nc", "lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_w, "iV8Ui", "nc", "lasx")
TARGET_BUILTIN(__builtin_lasx_xbnz_d, "iV4ULLi", "nc", "lasx")
+
+TARGET_BUILTIN(__builtin_lasx_cast_128_s, "V8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128_d, "V4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_cast_128, "V32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_s, "V8fV4fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128_d, "V4dV2dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_concat_128, "V32ScV16ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_lo, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_s, "V4fV8f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi_d, "V2dV4d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_extract_128_hi, "V16ScV32Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_lo, "V32ScV32ScV16Sc", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_s, "V8fV8fV4f", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi_d, "V4dV4dV2d", "nc", "lasx")
+TARGET_BUILTIN(__builtin_lasx_insert_128_hi, "V32ScV32ScV16Sc", "nc", "lasx")
diff --git a/clang/lib/Headers/lasxintrin.h b/clang/lib/Headers/lasxintrin.h
index 85020d82829e2..6dd8ac24ed46d 100644
--- a/clang/lib/Headers/lasxintrin.h
+++ b/clang/lib/Headers/lasxintrin.h
@@ -10,6 +10,8 @@
#ifndef _LOONGSON_ASXINTRIN_H
#define _LOONGSON_ASXINTRIN_H 1
+#include
+
#if defined(__loongarch_asx)
typedef signed char v32i8 __attribute__((vector_size(32), aligned(32)));
@@ -3882,5 +3884,113 @@ extern __inline
#define __lasx_xvrepli_w(/*si10*/ _1) ((__m256i)__builtin_lasx_xvrepli_w((_1)))
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
+__lasx_cast_128_s(__m128 _1) {
+ return (__m256)__builtin_lasx_cast_128_s((v4f32)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256d
+__lasx_cast_128_d(__m128d _1) {
+ return (__m256d)__builtin_lasx_cast_128_d((v2f64)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256i
+__lasx_cast_128(__m128i _1) {
+ return (__m256i)__builtin_lasx_cast_128((v16i8)_1);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256
+__lasx_concat_128_s(__m128 _1, __m128 _2) {
+ return (__m256)__builtin_lasx_concat_128_s((v4f32)_1, (v4f32)_2);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256d
+__lasx_concat_128_d(__m128d _1, __m128d _2) {
+ return (__m256d)__builtin_lasx_concat_128_d((v2f64)_1, (v2f64)_2);
+}
+
+extern __inline
+__attribute__((__gnu_inline__, __always_inline__, __artificial__)) __m256i
+__lasx_concat_128(__m128i _1, __m128i _2) {
+ return (__m256i)__builtin_lasx_concat_128((v16i8)_1, (v1
[llvm-branch-commits] [llvm] Add deactivation symbol operand to ConstantPtrAuth. (PR #133537)
pcc wrote:
> I have checked in with @ahmedbougacha and his feeling is that this is fine as
> it requires a bunch of work to opt in, and for places where the security is
> important enough that we don't want people using this it's easy enough to
> block.
Thanks for checking.
> I'm concerned about the interaction of these changes with ptrauth intrinsic
> optimizations
I took a look and found some cases where we needed to inhibit optimizations.
There was no practical effect due to how PFP uses these intrinsics, but I
implemented the inhibitions in #133536 and this PR.
> the ability for attackers to gain control of the enablement flags.
This isn't possible; the symbols are resolved at static link time. See the RFC
for more information:
https://discourse.llvm.org/t/rfc-deactivation-symbols/85556
https://github.com/llvm/llvm-project/pull/133537
[llvm-branch-commits] [clang] [AMDGPU] Add builtins for wave reduction intrinsics (PR #150170)
https://github.com/easyonaadit updated
https://github.com/llvm/llvm-project/pull/150170
>From 308545da2b700e93d2c4b5e32c8392468385 Mon Sep 17 00:00:00 2001
From: Aaditya
Date: Sat, 19 Jul 2025 12:57:27 +0530
Subject: [PATCH] Add builtins for wave reduction intrinsics
---
clang/include/clang/Basic/BuiltinsAMDGPU.def | 25 ++
clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp | 58 +++
clang/test/CodeGenOpenCL/builtins-amdgcn.cl | 378 +++
3 files changed, 461 insertions(+)
diff --git a/clang/include/clang/Basic/BuiltinsAMDGPU.def
b/clang/include/clang/Basic/BuiltinsAMDGPU.def
index e5a1422fe8778..56b1a8dc09b15 100644
--- a/clang/include/clang/Basic/BuiltinsAMDGPU.def
+++ b/clang/include/clang/Basic/BuiltinsAMDGPU.def
@@ -364,6 +364,31 @@ BUILTIN(__builtin_amdgcn_endpgm, "v", "nr")
BUILTIN(__builtin_amdgcn_get_fpenv, "WUi", "n")
BUILTIN(__builtin_amdgcn_set_fpenv, "vWUi", "n")
+//===--===//
+
+// Wave Reduction builtins.
+
+//===--===//
+
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u32, "ZUiZUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b32, "ZiZiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_add_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_sub_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_min_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_i64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_max_u64, "WUiWUiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_and_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_or_b64, "WiWiZi", "nc")
+BUILTIN(__builtin_amdgcn_wave_reduce_xor_b64, "WiWiZi", "nc")
+
//===--===//
// R600-NI only builtins.
//===--===//
diff --git a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
index 87a46287c4022..07cf08c54985a 100644
--- a/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
+++ b/clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp
@@ -295,11 +295,69 @@ void
CodeGenFunction::AddAMDGPUFenceAddressSpaceMMRA(llvm::Instruction *Inst,
Inst->setMetadata(LLVMContext::MD_mmra, MMRAMetadata::getMD(Ctx, MMRAs));
}
+static Intrinsic::ID getIntrinsicIDforWaveReduction(unsigned BuiltinID) {
+ switch (BuiltinID) {
+ default:
+llvm_unreachable("Unknown BuiltinID for wave reduction");
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u64:
+return Intrinsic::amdgcn_wave_reduce_add;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u64:
+return Intrinsic::amdgcn_wave_reduce_sub;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_i64:
+return Intrinsic::amdgcn_wave_reduce_min;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_min_u64:
+return Intrinsic::amdgcn_wave_reduce_umin;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_i64:
+return Intrinsic::amdgcn_wave_reduce_max;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_max_u64:
+return Intrinsic::amdgcn_wave_reduce_umax;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_and_b64:
+return Intrinsic::amdgcn_wave_reduce_and;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_or_b64:
+return Intrinsic::amdgcn_wave_reduce_or;
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b32:
+ case clang::AMDGPU::BI__builtin_amdgcn_wave_reduce_xor_b64:
+return Intrinsic::amdgcn_wave_reduce_xor;
+ }
+}
+
Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
const CallExpr *E) {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
llvm::SyncScope::ID SSID;
switch (BuiltinID) {
+ case AMDGPU::BI__builtin_amdgcn_wave_reduce_add_u32:
+ case AMDGPU::BI__builtin_amdgcn_wave_reduce_sub_u
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Import D16 load patterns and add combines for them (PR #153178)
https://github.com/petar-avramovic updated
https://github.com/llvm/llvm-project/pull/153178
>From 739caaa21a514cc89c57deae344bf563c9563e90 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Tue, 26 Aug 2025 14:10:41 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Import D16 load patterns and add combines
for them
Add G_AMDGPU_LOAD_D16 generic instructions and GINodeEquivs for them,
this will import D16 load patterns to global-isel's tablegened
instruction selector.
For newly imported patterns to work add combines for G_AMDGPU_LOAD_D16
in AMDGPURegBankCombiner.
---
llvm/lib/Target/AMDGPU/AMDGPUCombine.td | 9 +-
llvm/lib/Target/AMDGPU/AMDGPUGISel.td | 7 +
.../Target/AMDGPU/AMDGPURegBankCombiner.cpp | 86
llvm/lib/Target/AMDGPU/SIInstructions.td | 15 +
.../AMDGPU/GlobalISel/atomic_load_flat.ll | 15 +-
.../AMDGPU/GlobalISel/atomic_load_global.ll | 15 +-
.../AMDGPU/GlobalISel/atomic_load_local_2.ll | 13 +-
.../CodeGen/AMDGPU/GlobalISel/load-d16.ll | 412 ++
llvm/test/CodeGen/AMDGPU/global-saddr-load.ll | 246 +++
9 files changed, 622 insertions(+), 196 deletions(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/load-d16.ll
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index b5dac95b57a2d..e8b211f7866ad 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -71,6 +71,12 @@ def int_minmax_to_med3 : GICombineRule<
[{ return matchIntMinMaxToMed3(*${min_or_max}, ${matchinfo}); }]),
(apply [{ applyMed3(*${min_or_max}, ${matchinfo}); }])>;
+let Predicates = [Predicate<"Subtarget->d16PreservesUnusedBits()">] in
+def d16_load : GICombineRule<
+ (defs root:$bitcast),
+ (combine (G_BITCAST $dst, $src):$bitcast,
+ [{ return combineD16Load(*${bitcast} ); }])>;
+
def fp_minmax_to_med3 : GICombineRule<
(defs root:$min_or_max, med3_matchdata:$matchinfo),
(match (wip_match_opcode G_FMAXNUM,
@@ -219,5 +225,6 @@ def AMDGPURegBankCombiner : GICombiner<
zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
identity_combines, redundant_and, constant_fold_cast_op,
- cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines]> {
+ cast_of_cast_combines, sext_trunc, zext_of_shift_amount_combines,
+ d16_load]> {
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
index 0c112d1787c1a..bb4bf742fb861 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUGISel.td
@@ -315,6 +315,13 @@ def : GINodeEquiv;
def : GINodeEquiv;
def : GINodeEquiv;
+def : GINodeEquiv;
+def : GINodeEquiv;
+def : GINodeEquiv;
+def : GINodeEquiv;
+def : GINodeEquiv;
+def : GINodeEquiv;
+
def : GINodeEquiv;
// G_AMDGPU_WHOLE_WAVE_FUNC_RETURN is simpler than AMDGPUwhole_wave_return,
// so we don't mark it as equivalent.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
index ee324a5e93f0f..fd604e1b19cd4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp
@@ -89,6 +89,10 @@ class AMDGPURegBankCombinerImpl : public Combiner {
void applyCanonicalizeZextShiftAmt(MachineInstr &MI, MachineInstr &Ext)
const;
+ bool combineD16Load(MachineInstr &MI) const;
+ bool applyD16Load(unsigned D16Opc, MachineInstr &DstMI,
+MachineInstr *SmallLoad, Register ToOverwriteD16) const;
+
private:
SIModeRegisterDefaults getMode() const;
bool getIEEE() const;
@@ -392,6 +396,88 @@ void
AMDGPURegBankCombinerImpl::applyCanonicalizeZextShiftAmt(
MI.eraseFromParent();
}
+bool AMDGPURegBankCombinerImpl::combineD16Load(MachineInstr &MI) const {
+ Register Dst;
+ MachineInstr *Load, *SextLoad;
+ const int64_t CleanLo16 = 0x;
+ const int64_t CleanHi16 = 0x;
+
+ // Load lo
+ if (mi_match(MI.getOperand(1).getReg(), MRI,
+ m_GOr(m_GAnd(m_GBitcast(m_Reg(Dst)),
+m_Copy(m_SpecificICst(CleanLo16))),
+ m_MInstr(Load {
+
+if (Load->getOpcode() == AMDGPU::G_ZEXTLOAD) {
+ const MachineMemOperand *MMO = *Load->memoperands_begin();
+ unsigned LoadSize = MMO->getSizeInBits().getValue();
+ if (LoadSize == 8)
+return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO_U8, MI, Load, Dst);
+ if (LoadSize == 16)
+return applyD16Load(AMDGPU::G_AMDGPU_LOAD_D16_LO, MI, Load, Dst);
+ return false;
+}
+
+if (mi_match(
+Load, MRI,
+m_GAnd(m_MInstr(SextLoad), m_Copy(m_SpecificICst(CleanHi16) {
+ if (SextLoad->getOpcode() != AMDGPU::G_SEXTLOAD)
+return false;
+
+ const MachineMemOperand *MMO = *SextLoad->memoperands_begin();
+ if (MMO->getSizeInBits().getValue() != 8)
+retu
[llvm-branch-commits] [lld] CodeGen: Emit .prefalign directives based on the prefalign attribute. (PR #155529)
https://github.com/pcc updated https://github.com/llvm/llvm-project/pull/155529 >From 38615b9b39e93afab94c6aaa3ae6c026b7f2086a Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Tue, 26 Aug 2025 19:19:33 -0700 Subject: [PATCH] Fix failing lld test Created using spr 1.3.6-beta.1 --- lld/test/ELF/lto/linker-script-symbols-ipo.ll | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lld/test/ELF/lto/linker-script-symbols-ipo.ll b/lld/test/ELF/lto/linker-script-symbols-ipo.ll index 414ee4080bee0..39996cbfa28db 100644 --- a/lld/test/ELF/lto/linker-script-symbols-ipo.ll +++ b/lld/test/ELF/lto/linker-script-symbols-ipo.ll @@ -18,7 +18,7 @@ ; NOIPO: : ; NOIPO-NEXT: movl $2, %eax ; NOIPO: <_start>: -; NOIPO-NEXT: jmp 0x201160 +; NOIPO-NEXT: jmp 0x201158 target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [RISCV] Support PreserveMost calling convention (#148214) (PR #158403)
https://github.com/nikic closed
https://github.com/llvm/llvm-project/pull/158403
[llvm-branch-commits] [llvm] release/21.x: [RISCV] Support PreserveMost calling convention (#148214) (PR #158403)
nikic wrote:
Duplicate of https://github.com/llvm/llvm-project/pull/158402.
https://github.com/llvm/llvm-project/pull/158403
[llvm-branch-commits] [llvm] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (PR #146076)
https://github.com/ritter-x2a updated
https://github.com/llvm/llvm-project/pull/146076
>From 8710de705f09d90f166f82c1733620b2c8581306 Mon Sep 17 00:00:00 2001
From: Fabian Ritter
Date: Fri, 27 Jun 2025 05:38:52 -0400
Subject: [PATCH 1/3] [AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by
default
Also removes the command line option to control this feature.
There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
some transformations that were sometimes harmful.
For SWDEV-516125.
---
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | 10 +-
.../identical-subrange-spill-infloop.ll | 352 +++---
.../AMDGPU/infer-addrspace-flat-atomic.ll | 14 +-
llvm/test/CodeGen/AMDGPU/lds-frame-extern.ll | 8 +-
.../AMDGPU/lower-module-lds-via-hybrid.ll | 4 +-
.../AMDGPU/lower-module-lds-via-table.ll | 16 +-
.../match-perm-extract-vector-elt-bug.ll | 22 +-
llvm/test/CodeGen/AMDGPU/memmove-var-size.ll | 16 +-
.../AMDGPU/preload-implicit-kernargs.ll | 6 +-
.../AMDGPU/promote-constOffset-to-imm.ll | 8 +-
llvm/test/CodeGen/AMDGPU/ptradd-sdag-mubuf.ll | 7 +-
.../AMDGPU/ptradd-sdag-optimizations.ll | 94 ++---
.../AMDGPU/ptradd-sdag-undef-poison.ll| 6 +-
llvm/test/CodeGen/AMDGPU/ptradd-sdag.ll | 27 +-
llvm/test/CodeGen/AMDGPU/store-weird-sizes.ll | 29 +-
15 files changed, 310 insertions(+), 309 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index a1af50dac7e54..05ab745171f6d 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -63,14 +63,6 @@ static cl::opt UseDivergentRegisterIndexing(
cl::desc("Use indirect register addressing for divergent indexes"),
cl::init(false));
-// TODO: This option should be removed once we switch to always using PTRADD in
-// the SelectionDAG.
-static cl::opt UseSelectionDAGPTRADD(
-"amdgpu-use-sdag-ptradd", cl::Hidden,
-cl::desc("Generate ISD::PTRADD nodes for 64-bit pointer arithmetic in the "
- "SelectionDAG ISel"),
-cl::init(false));
-
static bool denormalModeIsFlushAllF32(const MachineFunction &MF) {
const SIMachineFunctionInfo *Info = MF.getInfo();
return Info->getMode().FP32Denormals == DenormalMode::getPreserveSign();
@@ -11252,7 +11244,7 @@ static bool isNoUnsignedWrap(SDValue Addr) {
bool SITargetLowering::shouldPreservePtrArith(const Function &F,
EVT PtrVT) const {
- return UseSelectionDAGPTRADD && PtrVT == MVT::i64;
+ return PtrVT == MVT::i64;
}
bool SITargetLowering::canTransformPtrArithOutOfBounds(const Function &F,
diff --git a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
index 2c03113e8af47..805cdd37d6e70 100644
--- a/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
+++ b/llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
@@ -6,96 +6,150 @@ define void @main(i1 %arg) #0 {
; CHECK: ; %bb.0: ; %bb
; CHECK-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; CHECK-NEXT:s_xor_saveexec_b64 s[4:5], -1
-; CHECK-NEXT:buffer_store_dword v5, off, s[0:3], s32 ; 4-byte Folded Spill
-; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 offset:4 ; 4-byte
Folded Spill
+; CHECK-NEXT:buffer_store_dword v6, off, s[0:3], s32 ; 4-byte Folded Spill
+; CHECK-NEXT:buffer_store_dword v7, off, s[0:3], s32 offset:4 ; 4-byte
Folded Spill
; CHECK-NEXT:s_mov_b64 exec, s[4:5]
-; CHECK-NEXT:v_writelane_b32 v5, s30, 0
-; CHECK-NEXT:v_writelane_b32 v5, s31, 1
-; CHECK-NEXT:v_writelane_b32 v5, s36, 2
-; CHECK-NEXT:v_writelane_b32 v5, s37, 3
-; CHECK-NEXT:v_writelane_b32 v5, s38, 4
-; CHECK-NEXT:v_writelane_b32 v5, s39, 5
-; CHECK-NEXT:v_writelane_b32 v5, s48, 6
-; CHECK-NEXT:v_writelane_b32 v5, s49, 7
-; CHECK-NEXT:v_writelane_b32 v5, s50, 8
-; CHECK-NEXT:v_writelane_b32 v5, s51, 9
-; CHECK-NEXT:v_writelane_b32 v5, s52, 10
-; CHECK-NEXT:v_writelane_b32 v5, s53, 11
-; CHECK-NEXT:v_writelane_b32 v5, s54, 12
-; CHECK-NEXT:v_writelane_b32 v5, s55, 13
-; CHECK-NEXT:s_getpc_b64 s[24:25]
-; CHECK-NEXT:v_writelane_b32 v5, s64, 14
-; CHECK-NEXT:s_movk_i32 s4, 0xf0
-; CHECK-NEXT:s_mov_b32 s5, s24
-; CHECK-NEXT:v_writelane_b32 v5, s65, 15
-; CHECK-NEXT:s_load_dwordx16 s[8:23], s[4:5], 0x0
-; CHECK-NEXT:s_mov_b64 s[4:5], 0
-; CHECK-NEXT:v_writelane_b32 v5, s66, 16
-; CHECK-NEXT:s_load_dwordx4 s[4:7], s[4:5], 0x0
-; CHECK-NEXT:v_writelane_b32 v5, s67, 17
-; CHECK-NEXT:s_waitcnt lgkmcnt(0)
-; CHECK-NEXT:s_movk_i32 s6, 0x130
-; CHECK-NEXT:s_mov_b32 s7, s24
-; CHECK-NEXT:v_writelane_b32 v5
[llvm-branch-commits] [mlir] [MLIR][Standalone] test Standalone against install distributions (PR #157944)
makslevental wrote:
> The fact that Subprocess 1 CMake and Parent CMake have identical build
> directories seems particularly problematic.
My assumption was (and I realize that it's flawed now) that process 1 isn't
actually building anything, just "installing" artifacts. But of course we've
all seen things get built when doing `ninja install` even after doing a full
build.
> Sounds like we're aligned that this Standalone/LIT location isn't the right
> place, @makslevental ?
I mean I don't care how the thing is tested, right - I was just going for what
I thought was the least controversial approach. If there's an even less
controversial approach I'm happy to do that instead!
BTW I guess this is what @boomanaiden154 was talking about
https://github.com/llvm/llvm-project/blob/e236a52a88956968f318fb908c584e5cb80b5b03/libcxx/test/CMakeLists.txt#L40-L58
which I can try out if we _do_ want it to be a lit test (but again happy not to
have to do that).
https://github.com/llvm/llvm-project/pull/157944
[llvm-branch-commits] [Remarks] BitstreamRemarkParser: Refactor error handling (PR #156511)
jroelofs wrote:
> We could delete bytes from valid files to trigger the errors and commit them
> as binary blobs.
Committing more binary blobs is not a great idea given the `xz` debacle. It
would be better to have them set up as tests that serialize something from YAML
to bitstream, and then corrupt some bytes and check that the error triggers.
https://github.com/llvm/llvm-project/pull/156511
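A rough sketch of the "serialize, corrupt, check the error" test shape described above. It deliberately does not use the LLVM remark bitstream APIs: the container here is a hypothetical toy format (magic `RMRK`, one length byte, payload), and `serializeToy`, `parseToy`, and `errorAfterCorrupting` are illustrative names, not anything from the PR. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy format, NOT the LLVM remark bitstream:
// 4 magic bytes "RMRK", one length byte, then the payload bytes.
static const uint8_t ToyMagic[4] = {'R', 'M', 'R', 'K'};

// Serialize a payload (at most 255 bytes) into the toy format.
std::vector<uint8_t> serializeToy(const std::string &Payload) {
  assert(Payload.size() <= 255 && "toy format stores the length in one byte");
  std::vector<uint8_t> Out(ToyMagic, ToyMagic + 4);
  Out.push_back(static_cast<uint8_t>(Payload.size()));
  Out.insert(Out.end(), Payload.begin(), Payload.end());
  return Out;
}

// Parse the toy format. On failure, return std::nullopt and describe the
// problem in Err.
std::optional<std::string> parseToy(const std::vector<uint8_t> &Bytes,
                                    std::string &Err) {
  if (Bytes.size() < 5) {
    Err = "truncated header";
    return std::nullopt;
  }
  for (size_t I = 0; I < 4; ++I) {
    if (Bytes[I] != ToyMagic[I]) {
      Err = "bad magic";
      return std::nullopt;
    }
  }
  size_t Len = Bytes[4];
  if (Bytes.size() - 5 != Len) {
    Err = "payload length mismatch";
    return std::nullopt;
  }
  return std::string(Bytes.begin() + 5, Bytes.end());
}

// Overwrite one byte of a valid buffer and return the parser's error
// message, or an empty string if the parse unexpectedly succeeded.
std::string errorAfterCorrupting(std::vector<uint8_t> Bytes, size_t Index,
                                 uint8_t NewValue) {
  assert(Index < Bytes.size() && "corruption index out of range");
  Bytes[Index] = NewValue;
  std::string Err;
  if (parseToy(Bytes, Err))
    return "";
  return Err;
}
```

The point of this shape is that the valid input is generated by the test itself, so only source text is checked in; each corruption is a one-line call such as `errorAfterCorrupting(serializeToy("hello"), 0, 'X')`, which names the byte being damaged.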
[llvm-branch-commits] [Remarks] BitstreamRemarkParser: Refactor error handling (PR #156511)
@@ -13,81 +13,171 @@
#ifndef LLVM_LIB_REMARKS_BITSTREAM_REMARK_PARSER_H
#define LLVM_LIB_REMARKS_BITSTREAM_REMARK_PARSER_H
-#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Bitstream/BitstreamReader.h"
#include "llvm/Remarks/BitstreamRemarkContainer.h"
+#include "llvm/Remarks/Remark.h"
#include "llvm/Remarks/RemarkFormat.h"
#include "llvm/Remarks/RemarkParser.h"
+#include "llvm/Remarks/RemarkStringTable.h"
#include "llvm/Support/Error.h"
-#include
+#include "llvm/Support/FormatVariadic.h"
#include
#include
#include
namespace llvm {
namespace remarks {
-struct Remark;
+class BitstreamBlockParserHelperBase {
+protected:
+ BitstreamCursor &Stream;
+
+ unsigned BlockID;
+ StringRef BlockName;
jroelofs wrote:
My intuition says this will have better struct layout, though I haven't checked:
```suggestion
StringRef BlockName;
unsigned BlockID;
```
https://github.com/llvm/llvm-project/pull/156511
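One way to check the layout intuition above without building LLVM is to compare `sizeof` for both member orders using stand-in types. This is only a sketch: `FakeStringRef` (a pointer plus a length) and `FakeCursor` are hypothetical substitutes for `StringRef` and `BitstreamCursor`, and the result depends on the target ABI.

```cpp
#include <cstddef>

// Stand-in for llvm::StringRef: a pointer plus a length. This layout is an
// assumption made for illustration only.
struct FakeStringRef {
  const char *Data;
  size_t Length;
};

// Stand-in for BitstreamCursor; only a reference to it is stored.
struct FakeCursor {};

// Member order as written in the patch.
struct OrderInPatch {
  FakeCursor &Stream;
  unsigned BlockID;
  FakeStringRef BlockName;
};

// Member order from the review suggestion.
struct OrderSuggested {
  FakeCursor &Stream;
  FakeStringRef BlockName;
  unsigned BlockID;
};

// Returns true only when the suggested order is strictly smaller.
constexpr bool suggestedIsSmaller() {
  return sizeof(OrderSuggested) < sizeof(OrderInPatch);
}
```

With these stand-ins on a typical 64-bit target, I would expect both orders to come out the same size: the four bytes of padding after `BlockID` move from the middle of the struct to its tail rather than disappearing. The reorder could still pay off if another 4-byte member is later added next to `BlockID`.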
[llvm-branch-commits] [llvm] release/21.x: [RISCV] Support PreserveMost calling convention (#148214) (PR #158403)
https://github.com/llvmbot milestoned
https://github.com/llvm/llvm-project/pull/158403
[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add `AAAMDGPUClusterDims` (PR #158076)
@@ -1296,6 +1303,157 @@ struct AAAMDGPUNoAGPR
const char AAAMDGPUNoAGPR::ID = 0;
+/// An abstract attribute to propagate the function attribute
+/// "amdgpu-cluster-dims" from kernel entry functions to device functions.
+struct AAAMDGPUClusterDims
+: public StateWrapper {
+ using Base = StateWrapper;
+ AAAMDGPUClusterDims(const IRPosition &IRP, Attributor &A) : Base(IRP) {}
+
+ /// Create an abstract attribute view for the position \p IRP.
+ static AAAMDGPUClusterDims &createForPosition(const IRPosition &IRP,
+Attributor &A);
+
+ /// See AbstractAttribute::getName().
+ StringRef getName() const override { return "AAAMDGPUClusterDims"; }
+
+ /// See AbstractAttribute::getIdAddr().
+ const char *getIdAddr() const override { return &ID; }
+
+ /// This function should return true if the type of the \p AA is
+ /// AAAMDGPUClusterDims.
+ static bool classof(const AbstractAttribute *AA) {
+return (AA->getIdAddr() == &ID);
+ }
+
+ virtual const AMDGPU::ClusterDimsAttr &getClusterDims() const = 0;
+
+ /// Unique ID (due to the unique address)
+ static const char ID;
+};
+
+const char AAAMDGPUClusterDims::ID = 0;
+
+struct AAAMDGPUClusterDimsFunction : public AAAMDGPUClusterDims {
+ AAAMDGPUClusterDimsFunction(const IRPosition &IRP, Attributor &A)
+ : AAAMDGPUClusterDims(IRP, A) {}
+
+ void initialize(Attributor &A) override {
+Function *F = getAssociatedFunction();
+assert(F && "empty associated function");
+
+Attr = AMDGPU::ClusterDimsAttr::get(*F);
+
+// No matter what a kernel function has, it is final.
+if (AMDGPU::isEntryFunctionCC(F->getCallingConv())) {
+ if (Attr.isUnknown())
+indicatePessimisticFixpoint();
+ else
+indicateOptimisticFixpoint();
+}
+ }
+
+ const std::string getAsStr(Attributor *A) const override {
+if (!getAssumed() || Attr.isUnknown())
+ return "unknown";
+if (Attr.isNoCluster())
+ return "no";
+if (Attr.isVariableedDims())
shiltian wrote:
oh that's bad. will do.
https://github.com/llvm/llvm-project/pull/158076
[llvm-branch-commits] [llvm] SPARC: Use RegClassByHwMode instead of PointerLikeRegClass (PR #158271)
@@ -95,10 +95,27 @@ def HasFSMULD : Predicate<"!Subtarget->hasNoFSMULD()">; // will pick deprecated instructions. def UseDeprecatedInsts : Predicate<"Subtarget->useV8DeprecatedInsts()">; +//===--===// +// HwModes Pattern Stuff +//===--===// + +defvar SPARC32 = DefaultMode; +def SPARC64 : HwMode<[Is64Bit]>; s-barannikov wrote: I meant default mode in hardware. This is more of a stylistic suggestion. https://github.com/llvm/llvm-project/pull/158271 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `local` on device (PR #157638)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/157638
>From 983b97d91cbf9dbf45973fdacabf2ae6948491a4 Mon Sep 17 00:00:00 2001
From: ergawy
Date: Tue, 2 Sep 2025 05:54:00 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `local` on device
Extends support for mapping `do concurrent` on the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and uses the mapped value as the
`private` clause operand in the nested `omp.parallel` op.
---
.../include/flang/Optimizer/Dialect/FIROps.td | 12 ++
.../OpenMP/DoConcurrentConversion.cpp | 192 +++---
.../Transforms/DoConcurrent/local_device.mlir | 49 +
3 files changed, 175 insertions(+), 78 deletions(-)
create mode 100644 flang/test/Transforms/DoConcurrent/local_device.mlir
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td
b/flang/include/flang/Optimizer/Dialect/FIROps.td
index bc971e8fd6600..fc6eedc6ed4c6 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3894,6 +3894,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
return getReduceVars().size();
}
+unsigned getInductionVarsStart() {
+ return 0;
+}
+
+unsigned getLocalOperandsStart() {
+ return getNumInductionVars();
+}
+
+unsigned getReduceOperandsStart() {
+ return getLocalOperandsStart() + getNumLocalOperands();
+}
+
mlir::Block::BlockArgListType getInductionVars() {
return getBody()->getArguments().slice(0, getNumInductionVars());
}
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index 6c71924000842..d00a4fdd2cf2e 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -138,6 +138,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
liveIns.push_back(operand->get());
});
+
+ for (mlir::Value local : loop.getLocalVars())
+liveIns.push_back(local);
}
/// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -298,8 +301,7 @@ class DoConcurrentConversion
.getIsTargetDevice();
mlir::omp::TargetOperands targetClauseOps;
- genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
- loopNestClauseOps,
+ genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps,
isTargetDevice ? nullptr : &targetClauseOps);
LiveInShapeInfoMap liveInShapeInfoMap;
@@ -321,14 +323,13 @@ class DoConcurrentConversion
}
mlir::omp::ParallelOp parallelOp =
-genParallelOp(doLoop.getLoc(), rewriter, ivInfos, mapper);
+genParallelOp(rewriter, loop, ivInfos, mapper);
// Only set as composite when part of `distribute parallel do`.
parallelOp.setComposite(mapToDevice);
if (!mapToDevice)
- genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, mapper,
- loopNestClauseOps);
+ genLoopNestClauseOps(doLoop.getLoc(), rewriter, loop, loopNestClauseOps);
for (mlir::Value local : locals)
looputils::localizeLoopLocalValue(local, parallelOp.getRegion(),
@@ -337,10 +338,38 @@ class DoConcurrentConversion
if (mapToDevice)
genDistributeOp(doLoop.getLoc(), rewriter).setComposite(/*val=*/true);
-mlir::omp::LoopNestOp ompLoopNest =
+auto [loopNestOp, wsLoopOp] =
genWsLoopOp(rewriter, loop, mapper, loopNestClauseOps,
/*isComposite=*/mapToDevice);
+// `local` region arguments are transferred/cloned from the `do concurrent`
+// loop to the loopnest op when the region is cloned above. Instead, these
+// region arguments should be on the workshare loop's region.
+if (mapToDevice) {
+ for (auto [parallelArg, loopNestArg] : llvm::zip_equal(
+ parallelOp.getRegion().getArguments(),
+ loopNestOp.getRegion().getArguments().slice(
+ loop.getLocalOperandsStart(), loop.getNumLocalOperands())))
+rewriter.replaceAllUsesWith(loopNestArg, parallelArg);
+
+ for (auto [wsloopArg, loopNestArg] : llvm::zip_equal(
+ wsLoopOp.getRegion().getArguments(),
+ loopNestOp.getRegion().getArguments().slice(
+ loop.getReduceOperandsStart(), loop.getNumReduceOperands())))
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+} else {
+ for (auto [wsloopArg, loopNestArg] :
+ llvm::zip_equal(wsLoopOp.getRegion().getArguments(),
+ loopNestOp.getRegion().getArguments().drop_front(
+ loopNestClauseOps.loopLowerBounds.size())))
+rewriter.replaceAllUsesWith(loopNestArg, wsloopArg);
+}
+
+for (unsigned i = 0;
+ i
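The `get*Start()` helpers added in the FIROps.td hunk of this patch imply that the loop body's block arguments are laid out as induction variables, then `local` operands, then `reduce` operands. A minimal sketch of that offset arithmetic, assuming a hypothetical `LoopArgLayout` struct and `sliceArgs` helper that stand in for the real op and are not part of flang or MLIR:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the loop op: it only records how many block
// arguments of each kind the loop body has.
struct LoopArgLayout {
  unsigned numInductionVars;
  unsigned numLocalOperands;
  unsigned numReduceOperands;

  // Same arithmetic as the helpers added in the FIROps.td hunk.
  unsigned inductionVarsStart() const { return 0; }
  unsigned localOperandsStart() const { return numInductionVars; }
  unsigned reduceOperandsStart() const {
    return localOperandsStart() + numLocalOperands;
  }
};

// Returns args[start, start + count), i.e. what a slice(start, count) call
// on the block-argument list would select.
std::vector<std::string> sliceArgs(const std::vector<std::string> &args,
                                   unsigned start, unsigned count) {
  assert(start + count <= args.size() && "slice out of range");
  return std::vector<std::string>(args.begin() + start,
                                  args.begin() + start + count);
}
```

With two induction variables, one `local` operand and three `reduce` operands, the `local` slice starts at index 2 and the `reduce` slice at index 3.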
[llvm-branch-commits] [llvm] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD (PR #157845)
https://github.com/petar-avramovic updated
https://github.com/llvm/llvm-project/pull/157845
>From f426257364826fbec65abb6de92698bfa18f9487 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Wed, 10 Sep 2025 13:04:20 +0200
Subject: [PATCH] AMDGPU/UniformityAnalysis: fix G_ZEXTLOAD and G_SEXTLOAD
Use same rules for G_ZEXTLOAD and G_SEXTLOAD as for G_LOAD.
Flat addrspace(0) and private addrspace(5) G_ZEXTLOAD and G_SEXTLOAD
should be always divergent.
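The rule stated above can be sketched as a small standalone model. This is only an illustration under stated assumptions: the `Opcode` and `Uniformity` enums and the `classifyLoad` function are hypothetical stand-ins rather than the LLVM types, and only the load rule is modeled (atomics and address-space casts are left out).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the LLVM definitions.
enum class Opcode { G_LOAD, G_ZEXTLOAD, G_SEXTLOAD, G_ADD };
enum class Uniformity { Default, NeverUniform };

constexpr unsigned FlatAddrSpace = 0;    // addrspace(0)
constexpr unsigned PrivateAddrSpace = 5; // addrspace(5)

bool isLoadLike(Opcode Op) {
  return Op == Opcode::G_LOAD || Op == Opcode::G_ZEXTLOAD ||
         Op == Opcode::G_SEXTLOAD;
}

// Applies one rule to plain and extending loads alike: a load with no memory
// operands, or with any memory operand in the flat or private address space,
// is never uniform. Everything else gets the default treatment.
Uniformity classifyLoad(Opcode Op,
                        const std::vector<unsigned> &MemOperandAddrSpaces) {
  if (!isLoadLike(Op))
    return Uniformity::Default; // other opcodes are not modeled here
  if (MemOperandAddrSpaces.empty())
    return Uniformity::NeverUniform; // conservative assumption
  for (unsigned AS : MemOperandAddrSpaces)
    if (AS == FlatAddrSpace || AS == PrivateAddrSpace)
      return Uniformity::NeverUniform;
  return Uniformity::Default;
}
```

In this model, `classifyLoad(Opcode::G_ZEXTLOAD, {0})` and `classifyLoad(Opcode::G_SEXTLOAD, {5})` both give `NeverUniform`, while a global load such as `classifyLoad(Opcode::G_ZEXTLOAD, {1})` stays `Default`.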
---
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp| 15 +++---
.../AMDGPU/MIR/loads-gmir.mir | 20 +++
2 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
index 5c958dfe6954f..398c99b3bd127 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
@@ -10281,7 +10281,7 @@ unsigned SIInstrInfo::getInstrLatency(const InstrItineraryData *ItinData,
InstructionUniformity
SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
const MachineRegisterInfo &MRI = MI.getMF()->getRegInfo();
- unsigned opcode = MI.getOpcode();
+ unsigned Opcode = MI.getOpcode();
auto HandleAddrSpaceCast = [this, &MRI](const MachineInstr &MI) {
Register Dst = MI.getOperand(0).getReg();
@@ -10301,7 +10301,7 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
// If the target supports globally addressable scratch, the mapping from
// scratch memory to the flat aperture changes therefore an address space cast
// is no longer uniform.
- if (opcode == TargetOpcode::G_ADDRSPACE_CAST)
+ if (Opcode == TargetOpcode::G_ADDRSPACE_CAST)
return HandleAddrSpaceCast(MI);
if (auto *GI = dyn_cast(&MI)) {
@@ -10329,7 +10329,8 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
//
// All other loads are not divergent, because if threads issue loads with the
// same arguments, they will always get the same result.
- if (opcode == AMDGPU::G_LOAD) {
+ if (Opcode == AMDGPU::G_LOAD || Opcode == AMDGPU::G_ZEXTLOAD ||
+ Opcode == AMDGPU::G_SEXTLOAD) {
if (MI.memoperands_empty())
return InstructionUniformity::NeverUniform; // conservative assumption
@@ -10343,10 +10344,10 @@ SIInstrInfo::getGenericInstructionUniformity(const MachineInstr &MI) const {
return InstructionUniformity::Default;
}
- if (SIInstrInfo::isGenericAtomicRMWOpcode(opcode) ||
- opcode == AMDGPU::G_ATOMIC_CMPXCHG ||
- opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS ||
- AMDGPU::isGenericAtomic(opcode)) {
+ if (SIInstrInfo::isGenericAtomicRMWOpcode(Opcode) ||
+ Opcode == AMDGPU::G_ATOMIC_CMPXCHG ||
+ Opcode == AMDGPU::G_ATOMIC_CMPXCHG_WITH_SUCCESS ||
+ AMDGPU::isGenericAtomic(Opcode)) {
return InstructionUniformity::NeverUniform;
}
return InstructionUniformity::Default;
diff --git a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
index cb3c2de5b8753..d799cd2057f47 100644
--- a/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
+++ b/llvm/test/Analysis/UniformityAnalysis/AMDGPU/MIR/loads-gmir.mir
@@ -46,13 +46,13 @@ body: |
%6:_(p5) = G_IMPLICIT_DEF
; Atomic load
-; CHECK-NOT: DIVERGENT
-
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
%0:_(s32) = G_ZEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`)
; flat load
-; CHECK-NOT: DIVERGENT
-
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
%2:_(s32) = G_ZEXTLOAD %1(p0) :: (load (s16) from `ptr undef`)
; Gloabal load
@@ -60,7 +60,8 @@ body: |
%3:_(s32) = G_ZEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) undef`, addrspace 1)
; Private load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_ZEXTLOAD
%5:_(s32) = G_ZEXTLOAD %6(p5) :: (volatile load (s16) from `ptr addrspace(5) undef`, addrspace 5)
G_STORE %2(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) undef`, addrspace 1)
G_STORE %3(s32), %4(p1) :: (volatile store (s32) into `ptr addrspace(1) undef`, addrspace 1)
@@ -80,11 +81,13 @@ body: |
%6:_(p5) = G_IMPLICIT_DEF
; Atomic load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
%0:_(s32) = G_SEXTLOAD %1(p0) :: (load seq_cst (s16) from `ptr undef`)
; flat load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
%2:_(s32) = G_SEXTLOAD %1(p0) :: (load (s16) from `ptr undef`)
; Gloabal load
@@ -92,7 +95,8 @@ body: |
%3:_(s32) = G_SEXTLOAD %4(p1) :: (load (s16) from `ptr addrspace(1) undef`, addrspace 1)
; Private load
-; CHECK-NOT: DIVERGENT
+; CHECK: DIVERGENT
+; CHECK-SAME: G_SEXTLOAD
%5:_(s32) = G_SEXTLOAD %6(p5) :: (volatile load (s16)
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/156837
>From 7f6d6feb526c33b05e9705ef6587e8bcc145458f Mon Sep 17 00:00:00 2001
From: ergawy
Date: Thu, 4 Sep 2025 01:06:21 -0500
Subject: [PATCH] [flang][OpenMP] Support multi-block reduction combiner
regions on the GPU
Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The IP at the end of the inlined region was
not used resulting in emitting BBs with multiple terminators.
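The failure mode described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the OpenMPIRBuilder code: `Block`, `InsertPoint`, `inlineTwoBlockCombiner` and `emitReduction` are hypothetical names, and instructions are plain strings. Inlining a two-block combiner terminates the original block and continues in a new one, so emitting the store at the stale insertion point leaves the original block with two terminators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR: a block is a name plus a list of instruction strings, and an
// insertion point means "append to the end of block number `block`".
struct Block {
  std::string name;
  std::vector<std::string> insts;
};
struct InsertPoint {
  std::size_t block;
};

// Inlines a two-block combiner at `ip`. The first block is closed with a
// branch; the returned insertion point is the still-open second block.
InsertPoint inlineTwoBlockCombiner(std::vector<Block> &fn, InsertPoint ip) {
  assert(ip.block < fn.size() && "insertion point out of range");
  fn[ip.block].insts.push_back("call @bar");
  fn[ip.block].insts.push_back("br body.2nd"); // terminator of first block
  fn.push_back(Block{"body.2nd", {}});
  fn.back().insts.push_back("call @baz");
  return InsertPoint{fn.size() - 1};
}

// Counts branch instructions, the only terminators in this toy IR.
std::size_t countTerminators(const Block &b) {
  std::size_t n = 0;
  for (const std::string &inst : b.insts)
    if (inst.rfind("br ", 0) == 0)
      ++n;
  return n;
}

// Emits the combiner followed by the final store and branch. When
// `useAfterIP` is false the store is emitted at the stale insertion point.
std::vector<Block> emitReduction(bool useAfterIP) {
  std::vector<Block> fn;
  fn.push_back(Block{"entry", {}});
  InsertPoint current{0};
  InsertPoint afterIP = inlineTwoBlockCombiner(fn, current);
  if (useAfterIP)
    current = afterIP; // the step the patch adds via SetInsertPoint
  fn[current.block].insts.push_back("store reduced");
  fn[current.block].insts.push_back("br done");
  return fn;
}
```

In this model, `emitReduction(false)` yields an `entry` block with two branches, while `emitReduction(true)` leaves each block with exactly one.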
---
llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp | 3 +
.../omptarget-multi-block-reduction.mlir | 85 +++
2 files changed, 88 insertions(+)
create mode 100644 mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index c955ecd403633..116d1d9f4a951 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -3507,6 +3507,8 @@ Expected OpenMPIRBuilder::createReductionFunction(
return AfterIP.takeError();
if (!Builder.GetInsertBlock())
return ReductionFunc;
+
+ Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
Builder.CreateStore(Reduced, LHSPtr);
}
}
@@ -3751,6 +3753,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU(
RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
if (!AfterIP)
return AfterIP.takeError();
+ Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
Builder.CreateStore(Reduced, LHS, false);
}
}
diff --git a/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
new file mode 100644
index 0..aaf06d2d0e0c2
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/omptarget-multi-block-reduction.mlir
@@ -0,0 +1,85 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// Verifies that the IR builder can handle reductions with multi-block combiner
+// regions on the GPU.
+
+module attributes {dlti.dl_spec = #dlti.dl_spec<"dlti.alloca_memory_space" = 5 : ui64, "dlti.global_memory_space" = 1 : ui64>, llvm.target_triple = "amdgcn-amd-amdhsa", omp.is_gpu = true, omp.is_target_device = true} {
+ llvm.func @bar() {}
+ llvm.func @baz() {}
+
+ omp.declare_reduction @add_reduction_byref_box_5xf32 : !llvm.ptr alloc {
+%0 = llvm.mlir.constant(1 : i64) : i64
+%1 = llvm.alloca %0 x !llvm.struct<(ptr, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)> : (i64) -> !llvm.ptr<5>
+%2 = llvm.addrspacecast %1 : !llvm.ptr<5> to !llvm.ptr
+omp.yield(%2 : !llvm.ptr)
+ } init {
+ ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+omp.yield(%arg1 : !llvm.ptr)
+ } combiner {
+ ^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+llvm.call @bar() : () -> ()
+llvm.br ^bb3
+
+ ^bb3: // pred: ^bb1
+llvm.call @baz() : () -> ()
+omp.yield(%arg0 : !llvm.ptr)
+ }
+ llvm.func @foo_() {
+%c1 = llvm.mlir.constant(1 : i64) : i64
+%10 = llvm.alloca %c1 x !llvm.array<5 x f32> {bindc_name = "x"} : (i64) -> !llvm.ptr<5>
+%11 = llvm.addrspacecast %10 : !llvm.ptr<5> to !llvm.ptr
+%74 = omp.map.info var_ptr(%11 : !llvm.ptr, !llvm.array<5 x f32>) map_clauses(tofrom) capture(ByRef) -> !llvm.ptr {name = "x"}
+omp.target map_entries(%74 -> %arg0 : !llvm.ptr) {
+ %c1_2 = llvm.mlir.constant(1 : i32) : i32
+ %c10 = llvm.mlir.constant(10 : i32) : i32
+ omp.teams reduction(byref @add_reduction_byref_box_5xf32 %arg0 -> %arg2 : !llvm.ptr) {
+omp.parallel {
+ omp.distribute {
+omp.wsloop {
+ omp.loop_nest (%arg5) : i32 = (%c1_2) to (%c10) inclusive step (%c1_2) {
+omp.yield
+ }
+} {omp.composite}
+ } {omp.composite}
+ omp.terminator
+} {omp.composite}
+omp.terminator
+ }
+ omp.terminator
+}
+llvm.return
+ }
+}
+
+// CHECK: call void @__kmpc_parallel_51({{.*}}, i32 1, i32 -1, i32 -1,
+// CHECK-SAME: ptr @[[PAR_OUTLINED:.*]], ptr null, ptr %2, i64 1)
+
+// CHECK: define internal void @[[PAR_OUTLINED]]{{.*}} {
+// CHECK: .omp.reduction.then:
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
+// CHECK: br label %[[BODY_2ND_BB:.*]]
+
+// CHECK: [[BODY_2ND_BB]]:
+// CHECK: call void @baz()
+// CHECK: br label %[[CONT_BB:.*]]
+
+// CHECK: [[CONT_BB]]:
+// CHECK: br label %.omp.reduction.done
+// CHECK: }
+
+// CHECK: define internal void @"{{.*}}$reduction$reduction_func"(ptr noundef %0, ptr noundef %1) #0 {
+// CHECK: br label %omp.reduction.nonatomic.body
+
+// CHECK: [[BODY_2ND_BB:.*]]:
+// CHECK: call void @baz()
+// CHECK: br label %omp.region.cont
+
+
+// CHECK: omp.reduction.nonatomic.body:
+// CHECK: call void @bar()
[llvm-branch-commits] [llvm] [NFC][flang][do concurent] Add saxpy offload tests for OpenMP mapping (PR #155993)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/155993
>From 2177ccc20d333d6c6645f96a2b9c427d4ea952ac Mon Sep 17 00:00:00 2001
From: ergawy
Date: Fri, 29 Aug 2025 04:04:07 -0500
Subject: [PATCH] [flang][do concurent] Add saxpy offload tests for OpenMP
mapping
Adds end-to-end tests for `do concurrent` offloading to the device.
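The expected values in the tests this patch adds follow directly from the saxpy update `y = a * x + y`. A minimal host-side sketch of that arithmetic in C++ (the `saxpy` function here is illustrative only; it is not the Fortran test code and does no offloading):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes y[i] = a * x[i] + y[i] for every element, in place. Each
// iteration is independent, which is what makes the loop a candidate for
// `do concurrent` in the Fortran tests.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size() && "x and y must have the same length");
  for (std::size_t i = 0; i < y.size(); ++i)
    y[i] = a * x[i] + y[i];
}
```

With the same inputs as the tests (`a = 2.0`, every `x` element `1.0`, every `y` element `2.0`), each element of `y` becomes `2.0 * 1.0 + 2.0 = 4.0`, which is the value the `CHECK` lines look for.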
---
.../fortran/do-concurrent-to-omp-saxpy-2d.f90 | 53 +++
.../fortran/do-concurrent-to-omp-saxpy.f90| 53 +++
2 files changed, 106 insertions(+)
create mode 100644 offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
create mode 100644 offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n, m
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:,:),intent(in) :: x
+ real(kind=real32), dimension(:,:),intent(inout) :: y
+ integer :: i, j
+
+ do concurrent(i=1:n, j=1:m)
+ y(i,j) = a * x(i,j) + y(i,j)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1,1) ",f8.6)') y(1,1)
+ write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000, m=1
+ real(kind=real32), allocatable, dimension(:,:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n,1:m), y(1:n,1:m))
+ a = 2.0_real32
+ x(:,:) = 1.0_real32
+ y(:,:) = 2.0_real32
+
+ call saxpy(a, x, y, n, m)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1,1) 4.0
+! CHECK: y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+ use iso_fortran_env
+ public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+ use iso_fortran_env
+ implicit none
+ integer,intent(in) :: n
+ real(kind=real32),intent(in) :: a
+ real(kind=real32), dimension(:),intent(in) :: x
+ real(kind=real32), dimension(:),intent(inout) :: y
+ integer :: i
+
+ do concurrent(i=1:n)
+ y(i) = a * x(i) + y(i)
+ end do
+
+ write(*,*) "plausibility check:"
+ write(*,'("y(1) ",f8.6)') y(1)
+ write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+ use iso_fortran_env
+ use saxpymod, ONLY:saxpy
+ implicit none
+
+ integer,parameter :: n = 1000
+ real(kind=real32), allocatable, dimension(:) :: x, y
+ real(kind=real32) :: a
+ integer :: i
+
+ allocate(x(1:n), y(1:n))
+ a = 2.0_real32
+ x(:) = 1.0_real32
+ y(:) = 2.0_real32
+
+ call saxpy(a, x, y, n)
+
+ deallocate(x,y)
+end program main
+
+! CHECK: "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK: plausibility check:
+! CHECK: y(1) 4.0
+! CHECK: y(n) 4.0
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [flang] [flang][OpenMP] `do concurrent`: support `reduce` on device (PR #156610)
https://github.com/ergawy updated
https://github.com/llvm/llvm-project/pull/156610
>From 6dbc28572976b06ae8ad6661724c5cbfd7aab2e9 Mon Sep 17 00:00:00 2001
From: ergawy
Date: Tue, 2 Sep 2025 08:36:34 -0500
Subject: [PATCH] [flang][OpenMP] `do concurrent`: support `reduce` on device
Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to omp `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.
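The idea of filling the same reduction fields on two different clause-operand structures can be sketched with a small template. Everything here is hypothetical and only mirrors the field names visible in the diff (`reductionVars`, `reductionByref`, `reductionSyms`) and the `.omp` reducer-name suffix; the real `genReductions` signature and the MLIR operand types are not reproduced.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one `reduce` specifier on the loop.
struct ReduceSpec {
  std::string var;  // reduction variable
  bool byRef;       // whether it is reduced by reference
  std::string sym;  // name of the original reducer symbol
};

// Hypothetical stand-ins for the teams and wsloop clause operands.
struct TeamsOperandsSketch {
  std::vector<std::string> reductionVars;
  std::vector<bool> reductionByref;
  std::vector<std::string> reductionSyms;
};
struct WsloopOperandsSketch {
  std::vector<std::string> reductionVars;
  std::vector<bool> reductionByref;
  std::vector<std::string> reductionSyms;
};

// Appends one reduction entry per specifier to any operands struct that has
// the three reduction fields, so the same helper serves both constructs.
template <typename ClauseOps>
void genReductionsSketch(const std::vector<ReduceSpec> &specs,
                         ClauseOps &ops) {
  for (const ReduceSpec &spec : specs) {
    ops.reductionVars.push_back(spec.var);
    ops.reductionByref.push_back(spec.byRef);
    ops.reductionSyms.push_back(spec.sym + ".omp");
  }
  assert(ops.reductionVars.size() == ops.reductionSyms.size());
}
```

Calling the helper once with a `TeamsOperandsSketch` and once with a `WsloopOperandsSketch` gives both structures the same reduction entries, which matches the two-clause arrangement the commit message describes.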
---
.../OpenMP/DoConcurrentConversion.cpp | 117 ++
.../DoConcurrent/reduce_device.mlir | 53
2 files changed, 121 insertions(+), 49 deletions(-)
create mode 100644 flang/test/Transforms/DoConcurrent/reduce_device.mlir
diff --git a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
index d00a4fdd2cf2e..6e308499100fa 100644
--- a/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
+++ b/flang/lib/Optimizer/OpenMP/DoConcurrentConversion.cpp
@@ -141,6 +141,9 @@ void collectLoopLiveIns(fir::DoConcurrentLoopOp loop,
for (mlir::Value local : loop.getLocalVars())
liveIns.push_back(local);
+
+ for (mlir::Value reduce : loop.getReduceVars())
+liveIns.push_back(reduce);
}
/// Collects values that are local to a loop: "loop-local values". A loop-local
@@ -319,7 +322,7 @@ class DoConcurrentConversion
targetOp =
genTargetOp(doLoop.getLoc(), rewriter, mapper, loopNestLiveIns,
targetClauseOps, loopNestClauseOps, liveInShapeInfoMap);
- genTeamsOp(doLoop.getLoc(), rewriter);
+ genTeamsOp(rewriter, loop, mapper);
}
mlir::omp::ParallelOp parallelOp =
@@ -492,46 +495,7 @@ class DoConcurrentConversion
if (!mapToDevice)
genPrivatizers(rewriter, mapper, loop, wsloopClauseOps);
-if (!loop.getReduceVars().empty()) {
- for (auto [op, byRef, sym, arg] : llvm::zip_equal(
- loop.getReduceVars(), loop.getReduceByrefAttr().asArrayRef(),
- loop.getReduceSymsAttr().getAsRange(),
- loop.getRegionReduceArgs())) {
-auto firReducer = moduleSymbolTable.lookup(
-sym.getLeafReference());
-
-mlir::OpBuilder::InsertionGuard guard(rewriter);
-rewriter.setInsertionPointAfter(firReducer);
-std::string ompReducerName = sym.getLeafReference().str() + ".omp";
-
-auto ompReducer =
-moduleSymbolTable.lookup(
-rewriter.getStringAttr(ompReducerName));
-
-if (!ompReducer) {
- ompReducer = mlir::omp::DeclareReductionOp::create(
- rewriter, firReducer.getLoc(), ompReducerName,
- firReducer.getTypeAttr().getValue());
-
- cloneFIRRegionToOMP(rewriter, firReducer.getAllocRegion(),
- ompReducer.getAllocRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getInitializerRegion(),
- ompReducer.getInitializerRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getReductionRegion(),
- ompReducer.getReductionRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getAtomicReductionRegion(),
- ompReducer.getAtomicReductionRegion());
- cloneFIRRegionToOMP(rewriter, firReducer.getCleanupRegion(),
- ompReducer.getCleanupRegion());
- moduleSymbolTable.insert(ompReducer);
-}
-
-wsloopClauseOps.reductionVars.push_back(op);
-wsloopClauseOps.reductionByref.push_back(byRef);
-wsloopClauseOps.reductionSyms.push_back(
-mlir::SymbolRefAttr::get(ompReducer));
- }
-}
+genReductions(rewriter, mapper, loop, wsloopClauseOps);
auto wsloopOp =
mlir::omp::WsloopOp::create(rewriter, loop.getLoc(), wsloopClauseOps);
@@ -553,8 +517,6 @@ class DoConcurrentConversion
rewriter.setInsertionPointToEnd(&loopNestOp.getRegion().back());
mlir::omp::YieldOp::create(rewriter, loop->getLoc());
-loop->getParentOfType().print(
-llvm::errs(), mlir::OpPrintingFlags().assumeVerified());
return {loopNestOp, wsloopOp};
}
@@ -778,15 +740,26 @@ class DoConcurrentConversion
liveInName, shape);
}
- mlir::omp::TeamsOp
- genTeamsOp(mlir::Location loc,
- mlir::ConversionPatternRewriter &rewriter) const {
-auto teamsOp = rewriter.create(
-loc, /*clauses=*/mlir::omp::TeamsOperands{});
+ mlir::omp::TeamsOp genTeamsOp(mlir::ConversionPatternRewriter &rewriter,
+fir::DoConcurrentLoopOp loop,
+mlir::IRMapping &mapper) const {
+mlir::omp::TeamsOperands teamsOps;
+genReductions(rewriter, mapper, loop, teamsOps);
+
+mlir::Location loc = loop.getLoc();
+aut
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
@@ -3506,6 +3506,8 @@ Expected OpenMPIRBuilder::createReductionFunction(
return AfterIP.takeError();
if (!Builder.GetInsertBlock())
return ReductionFunc;
+
+ Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
ergawy wrote:
Done.
https://github.com/llvm/llvm-project/pull/156837
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
@@ -3750,6 +3752,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createReductionsGPU(
RI.ReductionGen(Builder.saveIP(), RHSValue, LHSValue, Reduced);
if (!AfterIP)
return AfterIP.takeError();
+ Builder.SetInsertPoint(AfterIP->getBlock(), AfterIP->getPoint());
ergawy wrote:
Done.
https://github.com/llvm/llvm-project/pull/156837
[llvm-branch-commits] [llvm] VocabStorage (PR #158376)
svkeerthy wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
* **#158376** 👈
* **#156952**
* **#155690**
* **#155516**
* **#155323**: 1 other dependent PR ([#155700](https://github.com/llvm/llvm-project/pull/155700))
* **#153094**
* **#153089**
* **#153087**
* **#152613**
* `main`
This stack of pull requests is managed by Graphite.
https://github.com/llvm/llvm-project/pull/158376
[llvm-branch-commits] [NFC][CFI][CodeGen] Move GeneralizeFunctionType out of CreateMetadataIdentifierGeneralized (PR #158190)
https://github.com/vitalybuka updated
https://github.com/llvm/llvm-project/pull/158190
[llvm-branch-commits] [llvm] [mlir] [flang][OpenMP] Support multi-block reduction combiner regions on the GPU (PR #156837)
ergawy wrote:
Thanks for the review, Abid. Addressed your comments.
https://github.com/llvm/llvm-project/pull/156837
