[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)
https://github.com/vikramRH created
https://github.com/llvm/llvm-project/pull/168832
None
From 49e4825b231eae88f7aea3184e1c8ca904abb674 Mon Sep 17 00:00:00 2001
From: vikhegde
Date: Tue, 18 Nov 2025 11:13:37 +0530
Subject: [PATCH] [NPM] Schedule PhysicalRegisterUsageAnalysis before
RegUsageInfoCollectorPass
---
llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 +++-
llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll | 6 +++---
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
derived().addPreEmitPass(addPass);
- if (TM.Options.EnableIPRA)
+ if (TM.Options.EnableIPRA) {
// Collect register usage information and produce a register mask of
// clobbered registers, to be used to optimize call sites.
+addPass(RequireAnalysisPass());
addPass(RegUsageInfoCollectorPass());
+ }
addPass(FuncletLayoutPass());
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
; RUN: | FileCheck -check-prefix=GCN-O3 %s
-; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
-; GCN-O2:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,am
[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Vikram Hegde (vikramRH)
Changes
RegUsageInfoCollectorPass requires PhysicalRegisterUsageAnalysis to be valid.
This is needed because PhysicalRegisterUsageAnalysis is a module analysis, so it must be computed before the machine-function-level collector runs.
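A minimal sketch of what the change amounts to, with the template arguments restored as an assumption (the angle brackets were stripped from the quoted diff, so the exact arguments may differ): the new pass manager's `RequireAnalysisPass` forces the module-level analysis to be computed and cached before the machine-function pass that consumes it.
```cpp
// Sketch only: the template arguments below are assumed, not taken verbatim
// from the patch. Context: inside CodeGenPassBuilder::addMachinePasses().
if (TM.Options.EnableIPRA) {
  // PhysicalRegisterUsageAnalysis is a module analysis, so force it to be
  // computed and cached at module scope here...
  addPass(RequireAnalysisPass<PhysicalRegisterUsageAnalysis, Module>());
  // ...so that the machine-function-level collector can simply fill in the
  // cached result for each function.
  addPass(RegUsageInfoCollectorPass());
}
```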
---
Full diff: https://github.com/llvm/llvm-project/pull/168832.diff
2 Files Affected:
- (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+3-1)
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+3-3)
```diff
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
derived().addPreEmitPass(addPass);
- if (TM.Options.EnableIPRA)
+ if (TM.Options.EnableIPRA) {
// Collect register usage information and produce a register mask of
// clobbered registers, to be used to optimize call sites.
+addPass(RequireAnalysisPass());
addPass(RegUsageInfoCollectorPass());
+ }
addPass(FuncletLayoutPass());
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
; RUN: | FileCheck -check-prefix=GCN-O3 %s
-; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
-; GCN-O2:
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor
[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)
https://github.com/vikramRH ready_for_review https://github.com/llvm/llvm-project/pull/168832 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)
https://github.com/vikramRH edited https://github.com/llvm/llvm-project/pull/168832 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
+ auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+ if (Probes.empty())
+return nullptr;
+ const MCDecodedPseudoProbe &Probe = *Probes.begin();
+ const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+ while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+ return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+ using namespace yaml::bolt;
+ BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+ const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+ using ChildMapTy =
+ std::unordered_map;
+ using CallSiteInfoTy =
+ std::unordered_map;
+ // Mapping from a parent node id to a map InlineSite -> Child node.
+ DenseMap ParentToChildren;
+ // Collect calls in the profile: map from a parent node id to a map
+ // InlineSite -> CallSiteInfo ptr.
+ DenseMap ParentToCSI;
+ for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+ ProbeMatchingStats.TotalCallCount += CSI.Count;
+ ++ProbeMatchingStats.TotalCallSites;
+ if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+ if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
maksfb wrote:
```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no callee for " << CSI.DestId << " "
<< CSI.Count
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code.
:warning:
You can test this locally with the following command:
```bash
git-clang-format --diff origin/main HEAD --extensions cpp --
compiler-rt/test/asan/TestCases/Darwin/atos-symbolizer-dyld-root-path.cpp
compiler-rt/test/asan/TestCases/Darwin/atos-symbolizer.cpp
compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_remove.cpp
compiler-rt/test/asan/TestCases/Darwin/init_for_dlopen.cpp
compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
--diff_from_common_commit
```
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
View the diff from clang-format here.
```diff
diff --git a/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
b/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
index 09502e3aa..ac3c5898f 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
@@ -5,7 +5,6 @@
// RUN: %clangxx_asan %s -o %t
// RUN: env ASAN_OPTIONS="abort_on_error=1" not --crash %run %t 2>&1 |
FileCheck %s
-
void *pwn(malloc_zone_t *unused_zone, size_t unused_size) {
printf("PWNED\n");
return NULL;
diff --git
a/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
b/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
index 66f7e06a3..7cb1dcc51 100644
--- a/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
+++ b/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
@@ -24,7 +24,7 @@
// RUN: diff %t.imports-sorted %t.exports-sorted
// Ensure that there is no dynamic dylib linked.
-// RUN: otool -L %t > %t.libs
+// RUN: otool -L %t > %t.libs
// not grep -q "dynamic.dylib" < %t.libs
// UNSUPPORTED: ios
```
https://github.com/llvm/llvm-project/pull/168545
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)
@@ -51,6 +51,7 @@ enum {
// https://www.ibm.com/docs/en/hla-and-tf/1.6?topic=value-address-constants
S_RCon, // Address of ADA of symbol.
S_VCon, // Address of external function symbol.
+ S_QCon, // Class-based offset.
redstar wrote:
Yes, there is no user yet. It is required for the ctor/dtor lists. I will
think about a test until then...
https://github.com/llvm/llvm-project/pull/167054
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [lld] release/21.x: [LLD][COFF] Align EC code ranges to page boundaries (#168222) (PR #168369)
llvmbot wrote:
@llvm/pr-subscribers-lld-coff
Author: None (llvmbot)
Changes
Backport af45b0202cdd443beedb02392f653d8cff5bd931
Requested by: @cjacek
---
Full diff: https://github.com/llvm/llvm-project/pull/168369.diff
2 Files Affected:
- (modified) lld/COFF/Chunks.cpp (+1-1)
- (modified) lld/test/COFF/arm64ec-codemap.test (+33-3)
```diff
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index 01752cdc6a9da..cfb33daa024a7 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -939,7 +939,7 @@ void ECCodeMapChunk::writeTo(uint8_t *buf) const {
auto table = reinterpret_cast(buf);
for (uint32_t i = 0; i < map.size(); i++) {
const ECCodeMapEntry &entry = map[i];
-uint32_t start = entry.first->getRVA();
+uint32_t start = entry.first->getRVA() & ~0xfff;
table[i].StartOffset = start | entry.type;
table[i].Length = entry.last->getRVA() + entry.last->getSize() - start;
}
diff --git a/lld/test/COFF/arm64ec-codemap.test
b/lld/test/COFF/arm64ec-codemap.test
index 050261117be2e..bbc682d19920f 100644
--- a/lld/test/COFF/arm64ec-codemap.test
+++ b/lld/test/COFF/arm64ec-codemap.test
@@ -7,6 +7,7 @@ RUN: llvm-mc -filetype=obj -triple=arm64ec-windows
arm64ec-func-sym2.s -o arm64e
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec.s -o data-sec.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec2.s -o data-sec2.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows empty-sec.s -o
arm64ec-empty-sec.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows entry-thunk.s -o
entry-thunk.obj
RUN: llvm-mc -filetype=obj -triple=x86_64-windows x86_64-func-sym.s -o
x86_64-func-sym.obj
RUN: llvm-mc -filetype=obj -triple=x86_64-windows empty-sec.s -o
x86_64-empty-sec.obj
RUN: llvm-mc -filetype=obj -triple=aarch64-windows
%S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
@@ -162,15 +163,17 @@ RUN: loadconfig-arm64ec.obj -dll -noentry
-merge:test=.testdata -merge:
RUN: llvm-readobj --coff-load-config testcm.dll | FileCheck
-check-prefix=CODEMAPCM %s
CODEMAPCM: CodeMap [
-CODEMAPCM-NEXT: 0x4008 - 0x4016 X64
+CODEMAPCM-NEXT: 0x4000 - 0x4016 X64
CODEMAPCM-NEXT: ]
RUN: llvm-objdump -d testcm.dll | FileCheck -check-prefix=DISASMCM %s
DISASMCM: Disassembly of section .testdat:
DISASMCM-EMPTY:
DISASMCM-NEXT: 000180004000 <.testdat>:
-DISASMCM-NEXT: 180004000: 0001 udf #0x1
-DISASMCM-NEXT: 180004004: udf #0x0
+DISASMCM-NEXT: 180004000: 01 00addl %eax, (%rax)
+DISASMCM-NEXT: 180004002: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004004: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004006: 00 00addb %al, (%rax)
DISASMCM-NEXT: 180004008: b8 03 00 00 00 movl$0x3, %eax
DISASMCM-NEXT: 18000400d: c3 retq
DISASMCM-NEXT: 18000400e: 00 00addb%al, (%rax)
@@ -207,6 +210,14 @@ DISASMMS-NEXT: 000180006000 :
DISASMMS-NEXT: 180006000: 528000a0 mov w0, #0x5// =5
DISASMMS-NEXT: 180006004: d65f03c0 ret
+Test the code map that includes an ARM64EC function padded by its entry-thunk
offset.
+
+RUN: lld-link -out:testpad.dll -machine:arm64ec entry-thunk.obj
loadconfig-arm64ec.obj -dll -noentry -include:func
+RUN: llvm-readobj --coff-load-config testpad.dll | FileCheck
-check-prefix=CODEMAPPAD %s
+CODEMAPPAD: CodeMap [
+CODEMAPPAD:0x1000 - 0x1010 ARM64EC
+CODEMAPPAD-NEXT: ]
+
#--- arm64-func-sym.s
.text
@@ -266,3 +277,22 @@ x86_64_func_sym2:
.section .empty1, "xr"
.section .empty2, "xr"
.section .empty3, "xr"
+
+#--- entry-thunk.s
+.section .text,"xr",discard,func
+.globl func
+.p2align 2, 0x0
+func:
+mov w0, #1
+ret
+
+.section .wowthk$aa,"xr",discard,thunk
+.globl thunk
+.p2align 2
+thunk:
+ret
+
+.section .hybmp$x,"yi"
+.symidx func
+.symidx thunk
+.word 1 // entry thunk
```
https://github.com/llvm/llvm-project/pull/168369
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)
https://github.com/paschalis-mpeis commented:
Hey Gergely,
Thanks for the updates! Just a reminder to prepend tests with pauth-, so they appear grouped, and `PAuthPacRetDesign.md` too, so the `BTI` doc would appear close to it in the future.
Can you also please expand both `CFI` acronyms in a note at the top of the doc, for completeness and clarity?
https://github.com/llvm/llvm-project/pull/164622
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)
llvmbot wrote:
@llvm/pr-subscribers-backend-arm
Author: None (llvmbot)
Changes
Backport 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868
Requested by: @davemgreen
---
Full diff: https://github.com/llvm/llvm-project/pull/168380.diff
4 Files Affected:
- (modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+11-10)
- (modified) llvm/lib/Target/ARM/ARMSubtarget.cpp (+1-11)
- (modified) llvm/lib/Target/ARM/ARMTargetMachine.h (+14)
- (added) llvm/test/CodeGen/ARM/xxstructor-nodef.ll (+7)
```diff
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 850b00406f09e..aa6ef55dad26c 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -97,7 +97,8 @@ void ARMAsmPrinter::emitXXStructor(const DataLayout &DL,
const Constant *CV) {
const MCExpr *E = MCSymbolRefExpr::create(
GetARMGVSymbol(GV, ARMII::MO_NO_FLAG),
- (Subtarget->isTargetELF() ? ARM::S_TARGET1 : ARM::S_None), OutContext);
+ (TM.getTargetTriple().isOSBinFormatELF() ? ARM::S_TARGET1 : ARM::S_None),
+ OutContext);
OutStreamer->emitValue(E, Size);
}
@@ -595,8 +596,7 @@ void ARMAsmPrinter::emitEndOfAsmFile(Module &M) {
ARMTargetStreamer &ATS = static_cast(TS);
if (OptimizationGoals > 0 &&
- (Subtarget->isTargetAEABI() || Subtarget->isTargetGNUAEABI() ||
- Subtarget->isTargetMuslAEABI()))
+ (TT.isTargetAEABI() || TT.isTargetGNUAEABI() || TT.isTargetMuslAEABI()))
ATS.emitAttribute(ARMBuildAttrs::ABI_optimization_goals,
OptimizationGoals);
OptimizationGoals = -1;
@@ -866,9 +866,10 @@ static uint8_t getModifierSpecifier(ARMCP::ARMCPModifier
Modifier) {
MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue *GV,
unsigned char TargetFlags) {
- if (Subtarget->isTargetMachO()) {
+ const Triple &TT = TM.getTargetTriple();
+ if (TT.isOSBinFormatMachO()) {
bool IsIndirect =
-(TargetFlags & ARMII::MO_NONLAZY) && Subtarget->isGVIndirectSymbol(GV);
+(TargetFlags & ARMII::MO_NONLAZY) && getTM().isGVIndirectSymbol(GV);
if (!IsIndirect)
return getSymbol(GV);
@@ -885,9 +886,8 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue
*GV,
StubSym = MachineModuleInfoImpl::StubValueTy(getSymbol(GV),
!GV->hasInternalLinkage());
return MCSym;
- } else if (Subtarget->isTargetCOFF()) {
-assert(Subtarget->isTargetWindows() &&
- "Windows is the only supported COFF target");
+ } else if (TT.isOSBinFormatCOFF()) {
+assert(TT.isOSWindows() && "Windows is the only supported COFF target");
bool IsIndirect =
(TargetFlags & (ARMII::MO_DLLIMPORT | ARMII::MO_COFFSTUB));
@@ -914,7 +914,7 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue
*GV,
}
return MCSym;
- } else if (Subtarget->isTargetELF()) {
+ } else if (TT.isOSBinFormatELF()) {
return getSymbolPreferLocal(*GV);
}
llvm_unreachable("unexpected target");
@@ -960,7 +960,8 @@ void ARMAsmPrinter::emitMachineConstantPoolValue(
// On Darwin, const-pool entries may get the "FOO$non_lazy_ptr" mangling,
so
// flag the global as MO_NONLAZY.
-unsigned char TF = Subtarget->isTargetMachO() ? ARMII::MO_NONLAZY : 0;
+unsigned char TF =
+TM.getTargetTriple().isOSBinFormatMachO() ? ARMII::MO_NONLAZY : 0;
MCSym = GetARMGVSymbol(GV, TF);
} else if (ACPV->isMachineBasicBlock()) {
const MachineBasicBlock *MBB = cast(ACPV)->getMBB();
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp
b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 13185a7d797a3..63d6e2ea7389b 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -316,17 +316,7 @@ bool ARMSubtarget::isRWPI() const {
}
bool ARMSubtarget::isGVIndirectSymbol(const GlobalValue *GV) const {
- if (!TM.shouldAssumeDSOLocal(GV))
-return true;
-
- // 32 bit macho has no relocation for a-b if a is undefined, even if b is in
- // the section that is being relocated. This means we have to use o load even
- // for GVs that are known to be local to the dso.
- if (isTargetMachO() && TM.isPositionIndependent() &&
- (GV->isDeclarationForLinker() || GV->hasCommonLinkage()))
-return true;
-
- return false;
+ return TM.isGVIndirectSymbol(GV);
}
bool ARMSubtarget::isGVInGOT(const GlobalValue *GV) const {
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.h
b/llvm/lib/Target/ARM/ARMTargetMachine.h
index 1d73af1da6d02..5f17a13dac40e 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.h
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.h
@@ -99,6 +99,20 @@ class ARMBaseTargetMachine : public CodeGenTargetMachineImpl
{
return true;
}
+ bool isGVIndirectSymbol(const GlobalValue *GV) const {
+if (!shouldAssumeDSOLocal(GV))
+ return true;
+
+// 32 bit macho has no relocation for a-b if a is undefined, even if b is
in
+// the s
[llvm-branch-commits] [openmp] 1d80cda - Revert "[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_ui…"
Author: Robert Imschweiler
Date: 2025-11-18T16:03:12+01:00
New Revision: 1d80cda87609b6dcb8a84d60df41bc26b535fdf7
URL:
https://github.com/llvm/llvm-project/commit/1d80cda87609b6dcb8a84d60df41bc26b535fdf7
DIFF:
https://github.com/llvm/llvm-project/commit/1d80cda87609b6dcb8a84d60df41bc26b535fdf7.diff
LOG: Revert "[OpenMP] Implement omp_get_uid_from_device() /
omp_get_device_from_ui…"
This reverts commit 65c4a534bd55ed56962fb99c36f464b3f1c9732f.
Added:
Modified:
offload/include/OpenMP/omp.h
offload/include/omptarget.h
offload/libomptarget/OpenMP/API.cpp
offload/libomptarget/exports
openmp/device/include/DeviceTypes.h
openmp/device/include/Interface.h
openmp/device/src/State.cpp
openmp/runtime/src/dllexports
openmp/runtime/src/include/omp.h.var
openmp/runtime/src/include/omp_lib.F90.var
openmp/runtime/src/include/omp_lib.h.var
openmp/runtime/src/kmp_ftn_entry.h
openmp/runtime/src/kmp_ftn_os.h
Removed:
offload/test/api/omp_device_uid.c
openmp/runtime/test/api/omp_device_uid.c
diff --git a/offload/include/OpenMP/omp.h b/offload/include/OpenMP/omp.h
index d92c7e450c677..768ca46a9bed0 100644
--- a/offload/include/OpenMP/omp.h
+++ b/offload/include/OpenMP/omp.h
@@ -30,13 +30,6 @@
extern "C" {
-/// Definitions
-///{
-
-#define omp_invalid_device -2
-
-///}
-
/// Type declarations
///{
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 00910704a979a..fbb4a06accf84 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -270,8 +270,6 @@ extern "C" {
void ompx_dump_mapping_tables(void);
int omp_get_num_devices(void);
int omp_get_device_num(void);
-int omp_get_device_from_uid(const char *DeviceUid);
-const char *omp_get_uid_from_device(int DeviceNum);
int omp_get_initial_device(void);
void *omp_target_alloc(size_t Size, int DeviceNum);
void omp_target_free(void *DevicePtr, int DeviceNum);
diff --git a/offload/libomptarget/OpenMP/API.cpp
b/offload/libomptarget/OpenMP/API.cpp
index 6e85e5764449c..dd83a3ccd08e6 100644
--- a/offload/libomptarget/OpenMP/API.cpp
+++ b/offload/libomptarget/OpenMP/API.cpp
@@ -40,8 +40,6 @@ EXTERN void ompx_dump_mapping_tables() {
using namespace llvm::omp::target::ompt;
#endif
-using GenericDeviceTy = llvm::omp::target::plugin::GenericDeviceTy;
-
void *targetAllocExplicit(size_t Size, int DeviceNum, int Kind,
const char *Name);
void targetFreeExplicit(void *DevicePtr, int DeviceNum, int Kind,
@@ -70,62 +68,6 @@ EXTERN int omp_get_device_num(void) {
return HostDevice;
}
-static inline bool is_initial_device_uid(const char *DeviceUid) {
- return strcmp(DeviceUid, GenericPluginTy::getHostDeviceUid()) == 0;
-}
-
-EXTERN int omp_get_device_from_uid(const char *DeviceUid) {
- TIMESCOPE();
- OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
-
- if (!DeviceUid) {
-DP("Call to omp_get_device_from_uid returning omp_invalid_device\n");
-return omp_invalid_device;
- }
- if (is_initial_device_uid(DeviceUid)) {
-DP("Call to omp_get_device_from_uid returning initial device number %d\n",
- omp_get_initial_device());
-return omp_get_initial_device();
- }
-
- int DeviceNum = omp_invalid_device;
-
- auto ExclusiveDevicesAccessor = PM->getExclusiveDevicesAccessor();
- for (const DeviceTy &Device : PM->devices(ExclusiveDevicesAccessor)) {
-const char *Uid = Device.RTL->getDevice(Device.RTLDeviceID).getDeviceUid();
-if (Uid && strcmp(DeviceUid, Uid) == 0) {
- DeviceNum = Device.DeviceID;
- break;
-}
- }
-
- DP("Call to omp_get_device_from_uid returning %d\n", DeviceNum);
- return DeviceNum;
-}
-
-EXTERN const char *omp_get_uid_from_device(int DeviceNum) {
- TIMESCOPE();
- OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
-
- if (DeviceNum == omp_invalid_device) {
-DP("Call to omp_get_uid_from_device returning nullptr\n");
-return nullptr;
- }
- if (DeviceNum == omp_get_initial_device()) {
-DP("Call to omp_get_uid_from_device returning initial device UID\n");
-return GenericPluginTy::getHostDeviceUid();
- }
-
- auto DeviceOrErr = PM->getDevice(DeviceNum);
- if (!DeviceOrErr)
-FATAL_MESSAGE(DeviceNum, "%s", toString(DeviceOrErr.takeError()).c_str());
-
- const char *Uid =
- DeviceOrErr->RTL->getDevice(DeviceOrErr->RTLDeviceID).getDeviceUid();
- DP("Call to omp_get_uid_from_device returning %s\n", Uid);
- return Uid;
-}
-
EXTERN int omp_get_initial_device(void) {
TIMESCOPE();
OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
diff --git a/offload/libomptarget/exports b/offload/libomptarget/exports
index 2ebc23e3cf60a..910a5b6c827a7 100644
--- a/offload/libomptarget/exports
+++ b/offload/libomptarget/exports
@@ -40,8 +40,6 @@ VERS1.0 {
omp_get_mapped_ptr;
omp_get_num_devices;
o
[llvm-branch-commits] [llvm] [AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (PR #166361)
@@ -356,20 +356,13 @@ define void @new_za_zt0_caller(ptr %callee)
"aarch64_new_za" "aarch64_new_zt0" n
; Expect clear ZA on entry
define void @new_za_shared_zt0_caller(ptr %callee) "aarch64_new_za"
"aarch64_in_zt0" nounwind {
-; CHECK-LABEL: new_za_shared_zt0_caller:
-; CHECK: // %bb.0:
-; CHECK-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:zero {za}
-; CHECK-NEXT:blr x0
-; CHECK-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:ret
-;
-; CHECK-NEWLOWERING-LABEL: new_za_shared_zt0_caller:
-; CHECK-NEWLOWERING: // %bb.0:
-; CHECK-NEWLOWERING-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEWLOWERING-NEXT:blr x0
MacDue wrote:
This is only relevant for functions with ZT0 (where it's possible to have a new
ZA with a shared ZA interface due to an in/out ZT0), so I hadn't considered it
before.
https://github.com/llvm/llvm-project/pull/166361
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
@@ -5,7 +5,7 @@
 // - By default the lit config sets this but we don't want this
 // test to implicitly depend on this.
 // - It avoids requiring `--crash` to be passed to `not`.
-// RUN: APPLE_ASAN_INIT_FOR_DLOPEN=0 %env_asan_opts=abort_on_error=0 not \
+// RUN: env APPLE_ASAN_INIT_FOR_DLOPEN=0 %env_asan_opts=abort_on_error=0 not \
DanBlackwell wrote:
```suggestion
// RUN: %env_asan_opts=abort_on_error=0 APPLE_ASAN_INIT_FOR_DLOPEN=0 not \
```
`%env_asan_opts` expands to `env ASAN_OPTIONS=`
https://github.com/llvm/llvm-project/pull/168545
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][PAC] Warn about synchronous unwind tables (PR #165227)
https://github.com/bgergely0 updated
https://github.com/llvm/llvm-project/pull/165227
From 61e03b5abf74bd5a61f2aa4d21219c67cfbfce24 Mon Sep 17 00:00:00 2001
From: Gergely Balint
Date: Mon, 27 Oct 2025 09:29:54 +
Subject: [PATCH 1/4] [BOLT][PAC] Warn about synchronous unwind tables
BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.
See also: #165215
---
bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp| 9 -
.../AArch64/pacret-synchronous-unwind.cpp | 33 +++
2 files changed, 41 insertions(+), 1 deletion(-)
create mode 100644 bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 91030544d2b88..01af88818a21d 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -133,11 +133,18 @@ Error
PointerAuthCFIAnalyzer::runOnFunctions(BinaryContext &BC) {
ParallelUtilities::runOnEachFunction(
BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
SkipPredicate, "PointerAuthCFIAnalyzer");
+
+ float IgnoredPercent = (100.0 * FunctionsIgnored) / Total;
BC.outs() << "BOLT-INFO: PointerAuthCFIAnalyzer ran on " << Total
<< " functions. Ignored " << FunctionsIgnored << " functions "
-<< format("(%.2lf%%)", (100.0 * FunctionsIgnored) / Total)
+<< format("(%.2lf%%)", IgnoredPercent)
<< " because of CFI inconsistencies\n";
+ if (IgnoredPercent >= 10.0)
+BC.outs() << "BOLT-WARNING: PointerAuthCFIAnalyzer only supports "
+ "asynchronous unwind tables. For C compilers, see "
+ "-fasynchronous-unwind-tables.\n";
+
return Error::success();
}
diff --git a/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
new file mode 100644
index 0..1bfeeaed3715a
--- /dev/null
+++ b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
@@ -0,0 +1,33 @@
+// Test to demonstrate that functions compiled with synchronous unwind tables
+// are ignored by the PointerAuthCFIAnalyzer.
+// Exception handling is needed to have _any_ unwind tables, otherwise the
+// PointerAuthCFIAnalyzer does not run on these functions, so it does not
ignore
+// any function.
+//
+// REQUIRES: system-linux,bolt-runtime
+//
+// RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+// RUN: -mbranch-protection=pac-ret \
+// RUN: -fno-asynchronous-unwind-tables \
+// RUN: %s -o %t.exe -Wl,-q
+// RUN: llvm-bolt %t.exe -o %t.bolt | FileCheck %s --check-prefix=CHECK
+//
+// CHECK: PointerAuthCFIAnalyzer ran on 3 functions. Ignored
+// CHECK-NOT: 0 functions (0.00%) because of CFI inconsistencies
+// CHECK-SAME: 1 functions (33.33%) because of CFI inconsistencies
+// CHECK-NEXT: BOLT-WARNING: PointerAuthCFIAnalyzer only supports asynchronous
+// CHECK-SAME: unwind tables. For C compilers, see
-fasynchronous-unwind-tables.
+
+#include
+#include
+
+void foo() { throw std::runtime_error("Exception from foo()."); }
+
+int main() {
+ try {
+foo();
+ } catch (const std::exception &e) {
+printf("Exception caught: %s\n", e.what());
+ }
+ return 0;
+}
From 7fc8acdbf4cef2aa7f4f5ca9d136d4cb1bce9fe6 Mon Sep 17 00:00:00 2001
From: Gergely Balint
Date: Tue, 28 Oct 2025 09:23:08 +
Subject: [PATCH 2/4] [BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
---
bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp | 27 ++
bolt/test/AArch64/pacret-cfi-incorrect.s | 2 +-
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 01af88818a21d..5979d5fb01818 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -28,6 +28,10 @@
using namespace llvm;
+namespace opts {
+extern llvm::cl::opt Verbosity;
+} // namespace opts
+
namespace llvm {
namespace bolt {
@@ -43,9 +47,10 @@ bool PointerAuthCFIAnalyzer::runOnFunction(BinaryFunction
&BF) {
// Not all functions have .cfi_negate_ra_state in them. But if one
does,
// we expect psign/pauth instructions to have the hasNegateRAState
// annotation.
-BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
- << BF.getPrintName()
- << ": ptr sign/auth inst without .cfi_negate_ra_state\n";
+if (opts::Verbosity >= 1)
+ BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+<< BF.getPrintName()
+<< ": ptr sign/auth inst without .cfi_negate_ra_state\n";
std::lock_guard Lock(IgnoreMutex);
BF.setIgnored();
return false;
@@ -65,9 +70,10 @@ bool PointerAuthCFIAnalyzer::runOnF
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)
@@ -16,12 +17,35 @@ namespace {
class SystemZGOFFObjectWriter : public MCGOFFObjectTargetWriter {
public:
SystemZGOFFObjectWriter();
+
+ unsigned getRelocType(const MCValue &Target, const MCFixup &Fixup,
+bool IsPCRel) const override;
};
} // end anonymous namespace
SystemZGOFFObjectWriter::SystemZGOFFObjectWriter()
: MCGOFFObjectTargetWriter() {}
+unsigned SystemZGOFFObjectWriter::getRelocType(const MCValue &Target,
+ const MCFixup &Fixup,
redstar wrote:
Decided to get the information form `Fixup`.
https://github.com/llvm/llvm-project/pull/167054
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -682,37 +561,22 @@ matchBlocks(BinaryContext &BC, const
yaml::bolt::BinaryFunctionProfile &YamlBF,
<< "\n");
continue;
}
-addMatchedBlock({MatchedBlock, Method}, YamlBF, YamlBB);
- }
-
- for (const auto &[YamlBBIdx, FlowBlockProfile] : MatchedBlocks) {
-const auto &[MatchedBlock, YamlBB] = FlowBlockProfile;
-StaleMatcher::MatchMethod Method = MatchedFlowBlocks.lookup(MatchedBlock);
+MatchedBlocks[YamlBB.Index] = {MatchedBlock, 1};
BlendedBlockHash BinHash = BlendedHashes[MatchedBlock->Index - 1];
-LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBBIdx << ")"
- << " with hash " << Twine::utohexstr(YamlBB->Hash)
+LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBB.Index << ")"
maksfb wrote:
```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: matched yaml block (bid = " <<
YamlBB.Index << ")"
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/167399
From f5a571197ba3ec726353ca4f0550d381a6a8dcba Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Mon, 10 Nov 2025 12:33:12 -0800
Subject: [PATCH] [LTT] Mark as unkown weak function tests.
---
llvm/lib/Transforms/IPO/LowerTypeTests.cpp | 3 +++
llvm/test/Transforms/LowerTypeTests/function-weak.ll | 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index 31b5487ce6ec6..7b046978802db 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -1493,6 +1493,9 @@ void
LowerTypeTestsModule::replaceWeakDeclarationWithJumpTablePtr(
Constant::getNullValue(F->getType()));
Value *Select = Builder.CreateSelect(ICmp, JT,
Constant::getNullValue(F->getType()));
+
+if (auto *SI = dyn_cast(Select))
+ setExplicitlyUnknownBranchWeightsIfProfiled(*SI, DEBUG_TYPE);
// For phi nodes, we need to update the incoming value for all operands
// with the same predecessor.
if (PN)
diff --git a/llvm/test/Transforms/LowerTypeTests/function-weak.ll
b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
index 4ea03b6c2c1fa..dbbe8fa4a0a9a 100644
--- a/llvm/test/Transforms/LowerTypeTests/function-weak.ll
+++ b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
@@ -32,10 +32,10 @@ target triple = "x86_64-unknown-linux-gnu"
declare !type !0 extern_weak void @f()
; CHECK: define zeroext i1 @check_f()
-define zeroext i1 @check_f() {
+define zeroext i1 @check_f() !prof !{!"function_entry_count", i32 10} {
entry:
; CHECK: [[CMP:%.*]] = icmp ne ptr @f, null
-; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null
+; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null, !prof
![[SELPROF:[0-9]+]]
; CHECK: [[PTI:%.*]] = ptrtoint ptr [[SEL]] to i1
; CHECK: ret i1 [[PTI]]
ret i1 ptrtoint (ptr @f to i1)
@@ -165,3 +165,4 @@ define i1 @foo(ptr %p) {
; CHECK-NEXT: }
!0 = !{i32 0, !"typeid1"}
+; CHECK: ![[SELPROF]] = !{!"unknown", !"lowertypetests"}
\ No newline at end of file
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -242,6 +253,18 @@ class YAMLProfileReader : public ProfileReaderBase {
ProfiledFunctions.emplace(&BF);
}
+ /// Return a top-level binary inline tree node for a given \p BF
maksfb wrote:
```suggestion
/// Return a top-level binary inline tree node for a given \p BF.
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)
https://github.com/mtrofin updated
https://github.com/llvm/llvm-project/pull/167399
From f5a571197ba3ec726353ca4f0550d381a6a8dcba Mon Sep 17 00:00:00 2001
From: Mircea Trofin
Date: Mon, 10 Nov 2025 12:33:12 -0800
Subject: [PATCH] [LTT] Mark as unkown weak function tests.
---
llvm/lib/Transforms/IPO/LowerTypeTests.cpp | 3 +++
llvm/test/Transforms/LowerTypeTests/function-weak.ll | 5 +++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index 31b5487ce6ec6..7b046978802db 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -1493,6 +1493,9 @@ void
LowerTypeTestsModule::replaceWeakDeclarationWithJumpTablePtr(
Constant::getNullValue(F->getType()));
Value *Select = Builder.CreateSelect(ICmp, JT,
Constant::getNullValue(F->getType()));
+
+if (auto *SI = dyn_cast(Select))
+ setExplicitlyUnknownBranchWeightsIfProfiled(*SI, DEBUG_TYPE);
// For phi nodes, we need to update the incoming value for all operands
// with the same predecessor.
if (PN)
diff --git a/llvm/test/Transforms/LowerTypeTests/function-weak.ll
b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
index 4ea03b6c2c1fa..dbbe8fa4a0a9a 100644
--- a/llvm/test/Transforms/LowerTypeTests/function-weak.ll
+++ b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
@@ -32,10 +32,10 @@ target triple = "x86_64-unknown-linux-gnu"
declare !type !0 extern_weak void @f()
; CHECK: define zeroext i1 @check_f()
-define zeroext i1 @check_f() {
+define zeroext i1 @check_f() !prof !{!"function_entry_count", i32 10} {
entry:
; CHECK: [[CMP:%.*]] = icmp ne ptr @f, null
-; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null
+; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null, !prof
![[SELPROF:[0-9]+]]
; CHECK: [[PTI:%.*]] = ptrtoint ptr [[SEL]] to i1
; CHECK: ret i1 [[PTI]]
ret i1 ptrtoint (ptr @f to i1)
@@ -165,3 +165,4 @@ define i1 @foo(ptr %p) {
; CHECK-NEXT: }
!0 = !{i32 0, !"typeid1"}
+; CHECK: ![[SELPROF]] = !{!"unknown", !"lowertypetests"}
\ No newline at end of file
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
yonghong-song wrote:
I have no objection. But in the above, I see
```
Merging is blocked
Cannot update this protected ref.
```
Not sure what the problem is.
https://github.com/llvm/llvm-project/pull/168314
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
maksfb wrote:
```suggestion
const uint64_t Addr = BF.getAddress();
const uint64_t Size = BF.getSize();
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
+ auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+ if (Probes.empty())
+return nullptr;
+ const MCDecodedPseudoProbe &Probe = *Probes.begin();
+ const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+ while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+ return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+ using namespace yaml::bolt;
+ BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+ const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+ using ChildMapTy =
+ std::unordered_map;
+ using CallSiteInfoTy =
+ std::unordered_map;
+ // Mapping from a parent node id to a map InlineSite -> Child node.
+ DenseMap ParentToChildren;
+ // Collect calls in the profile: map from a parent node id to a map
+ // InlineSite -> CallSiteInfo ptr.
+ DenseMap ParentToCSI;
+ for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+ ProbeMatchingStats.TotalCallCount += CSI.Count;
+ ++ProbeMatchingStats.TotalCallSites;
+ if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+ if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ // Get callee GUID
+ if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');
+++ProbeMatchingStats.MissingInlineTree;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ uint64_t CalleeGUID = Callee->InlineTree.front().GUID;
+ ParentToCSI[CSI.InlineTreeNode][InlineSite(CalleeGUID, CSI.Probe)] =
&CSI;
+}
+ }
+ LLVM_DEBUG({
+for (auto &[ParentId, InlineSiteCSI] : ParentToCSI) {
+ for (auto &[InlineSite, CSI] : InlineSiteCSI) {
+auto [CalleeGUID, CallSite] = InlineSite;
+errs() << ParentId << "@" << CallSite << "->"
+ << Twine::utohexstr(CalleeGUID) << ": " << CSI->Count << ", "
+ << Twine::utohexstr(CSI->Offset) << '\n';
+ }
+}
+ });
+
+ assert(!Root.isRoot());
+ LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@"
+<< Twine::utohexstr(Root.Guid) << " and " << YamlBF.Name
+<< "@" << Twine::utohexstr(FuncNode.GUID) << '\n');
+ ++ProbeMatchingStats.AttemptedNodes;
+ ++ProbeMatchingStats.AttemptedRoots;
+
+ // Match profile function with a lead node (top-level function or inlinee)
+ if (Root.Guid != FuncNode.GUID) {
+LLVM_DEBUG(dbgs() << "
[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)
dyung wrote:
> The same reason as the original bug reporter @iliastsi - trying to fix GHC in Debian.

From the original bug report, it seems that this has been broken since at least LLVM 17. If this is the case, how have you been building GHC all this time? Were you using LLVM 16?
At this point in the release, we are only accepting patches for recent regressions or major issues. Given this issue seems to have been around for quite a while (and only seems to be hit in this one case), I am leaning towards not merging this change into the 21.x release branch. If you feel strongly that it should be merged in, please let us know the rationale and I will discuss it with the other release managers to decide whether it should be included at this point of the release.
https://github.com/llvm/llvm-project/pull/168380
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)
https://github.com/efriedma-quic approved this pull request. LGTM This is a simple bugfix, and it obviously has no impact on non-ARM64EC targets. https://github.com/llvm/llvm-project/pull/168371 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)
https://github.com/mtrofin ready_for_review https://github.com/llvm/llvm-project/pull/167399 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [compiler-rt] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/168371
Backport 615299934489953deaf202cc445ac9f8ad362afc
Requested by: @cjacek
From 3da9c16880cdda218f52cbff964f7e1974d012ae Mon Sep 17 00:00:00 2001
From: Jacek Caban
Date: Tue, 4 Nov 2025 00:04:36 +0100
Subject: [PATCH] [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect
calls (#165885)
Guest exit thunks serve as glue for performing direct calls, so they
shouldn’t treat the target as an indirect one.
Spotted by @coneco-cy in #165504.
(cherry picked from commit 615299934489953deaf202cc445ac9f8ad362afc)
---
.../AArch64/AArch64Arm64ECCallLowering.cpp| 14 ++
llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll | 49 +--
2 files changed, 50 insertions(+), 13 deletions(-)
diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
index 509cbb092705d..b4e1e3517fa64 100644
--- a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
@@ -632,16 +632,10 @@ Function
*AArch64Arm64ECCallLowering::buildGuestExitThunk(Function *F) {
BasicBlock *BB = BasicBlock::Create(M->getContext(), "", GuestExit);
IRBuilder<> B(BB);
- // Load the global symbol as a pointer to the check function.
- Value *GuardFn;
- if (cfguard_module_flag == 2 && !F->hasFnAttribute("guard_nocf"))
-GuardFn = GuardFnCFGlobal;
- else
-GuardFn = GuardFnGlobal;
- LoadInst *GuardCheckLoad = B.CreateLoad(PtrTy, GuardFn);
-
- // Create new call instruction. The CFGuard check should always be a call,
- // even if the original CallBase is an Invoke or CallBr instruction.
+ // Create new call instruction. The call check should always be a call,
+ // even if the original CallBase is an Invoke or CallBr instructio.
+ // This is treated as a direct call, so do not use GuardFnCFGlobal.
+ LoadInst *GuardCheckLoad = B.CreateLoad(PtrTy, GuardFnGlobal);
Function *Thunk = buildExitThunk(F->getFunctionType(), F->getAttributes());
CallInst *GuardCheck = B.CreateCall(
GuardFnType, GuardCheckLoad, {F, Thunk});
diff --git a/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
b/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
index bdbc99e2d98b0..75e7ac902274d 100644
--- a/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
+++ b/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
@@ -2,15 +2,58 @@
declare void @called()
declare void @escaped()
-define void @f(ptr %dst) {
+define void @f(ptr %dst, ptr readonly %f) {
call void @called()
+; CHECK: bl "#called"
store ptr @escaped, ptr %dst
- ret void
+ call void %f()
+; CHECK: adrp x10, $iexit_thunk$cdecl$v$v
+; CHECK-NEXT: add x10, x10, :lo12:$iexit_thunk$cdecl$v$v
+; CHECK-NEXT: str x8, [x20]
+; CHECK-NEXT: adrp x8, __os_arm64x_check_icall_cfg
+; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall_cfg]
+; CHECK-NEXT: mov x11,
+; CHECK-NEXT: blr x8
+; CHECK-NEXT: blr x11
+ret void
}
+; CHECK-LABEL:.def "#called$exit_thunk";
+; CHECK-NEXT: .scl 2;
+; CHECK-NEXT: .type 32;
+; CHECK-NEXT: .endef
+; CHECK-NEXT: .section .wowthk$aa,"xr",discard,"#called$exit_thunk"
+; CHECK-NEXT: .globl "#called$exit_thunk"// -- Begin function
#called$exit_thunk
+; CHECK-NEXT: .p2align 2
+; CHECK-NEXT: "#called$exit_thunk": // @"#called$exit_thunk"
+; CHECK-NEXT: .weak_anti_dep called
+; CHECK-NEXT: called = "#called"
+; CHECK-NEXT: .weak_anti_dep "#called"
+; CHECK-NEXT: "#called" = "#called$exit_thunk"
+; CHECK-NEXT:.seh_proc "#called$exit_thunk"
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]!// 8-byte Folded Spill
+; CHECK-NEXT: .seh_save_reg_x x30, 16
+; CHECK-NEXT: .seh_endprologue
+; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
+; CHECK-NEXT: adrp x11, called
+; CHECK-NEXT: add x11, x11, :lo12:called
+; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
+; CHECK-NEXT: adrp x10, $iexit_thunk$cdecl$v$v
+; CHECK-NEXT: add x10, x10, :lo12:$iexit_thunk$cdecl$v$v
+; CHECK-NEXT: blr x8
+; CHECK-NEXT: .seh_startepilogue
+; CHECK-NEXT: ldr x30, [sp], #16 // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
+; CHECK-NEXT: .seh_endepilogue
+; CHECK-NEXT: br x11
+; CHECK-NEXT: .seh_endfunclet
+; CHECK-NEXT: .seh_endproc
+
!llvm.module.flags = !{!0}
-!0 = !{i32 2, !"cfguard", i32 1}
+!0 = !{i32 2, !"cfguard", i32 2}
; CHECK-LABEL: .section .gfids$y,"dr"
; CHECK-NEXT: .symidx escaped
+; CHECK-NEXT: .symidx $iexit_thunk$cdecl$v$v
; CHECK-NOT: .symidx
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #148900)
jofrn wrote: Moved the .td records required for this PR back to prior PRs https://github.com/llvm/llvm-project/pull/148899 (Cast atomic vec with float -- containing the Pats for X86) and https://github.com/llvm/llvm-project/pull/165818 (Split -- containing the definitions of the PatFrag atomics). https://github.com/llvm/llvm-project/pull/148900 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -1552,25 +1552,14 @@ Error PrintProgramStats::runOnFunctions(BinaryContext
&BC) {
100.0 * BC.Stats.ExactMatchedSampleCount / BC.Stats.StaleSampleCount,
BC.Stats.ExactMatchedSampleCount, BC.Stats.StaleSampleCount);
BC.outs() << format(
-"BOLT-INFO: inference found an exact pseudo probe match for %.2f%% of "
+"BOLT-INFO: inference found pseudo probe match for %.2f%% of "
maksfb wrote:
Do we want to print this info if pseudo probes are not present?
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
llvmbot wrote:
@llvm/pr-subscribers-llvm-selectiondag
Author: Matt Arsenault (arsenm)
Changes
---
Patch is 76.41 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/168290.diff
6 Files Affected:
- (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+12-12)
- (modified) llvm/test/CodeGen/AArch64/sve-extract-scalable-vector.ll (-7)
- (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
(+133-133)
- (modified) llvm/test/CodeGen/X86/half.ll (+64-69)
- (modified) llvm/test/CodeGen/X86/matrix-multiply.ll (+38-36)
- (modified) llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
(+216-218)
``diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Widen the input and call convert on the widened input vector.
unsigned NumConcat =
WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
- SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+ SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
Ops[0] = InOp;
SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
- SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
// Use the original element count so we don't do more scalar opts than
// necessary.
unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
std::array EltVTs = {{EltVT, MVT::Other}};
- SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
SmallVector OpChains;
// Use the original element count so we don't do more scalar opts than
// necessary.
@@ -5819,7 +5819,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
}
while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
return DAG.getBuildVector(WidenVT, DL, Ops);
}
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
// input and then widening it. To avoid this, we widen the input only
if
// it results in a legal type.
if (WidenSize % InSize == 0) {
- SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+ SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
Ops[0] = InOp;
NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
SmallVector Ops;
DAG.ExtractVectorElements(InOp, Ops);
Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
}
@@ -6088,7 +6088,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
if (WidenNumElts % NumInElts == 0) {
// Add undef vectors to widen to correct length.
unsigned NumConcat = WidenNumElts / NumInElts;
- SDValue UndefVal = DAG.getUNDEF(InVT);
+ SDValue UndefVal = DAG.getPOISON(InVT);
SmallVector Ops(NumConcat);
for (unsigned i=0; i < NumOperands; ++i)
Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
for (unsigned j = 0; j < NumInElts; ++j)
Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
}
- SDValue UndefVal = DAG.getUNDEF(EltVT);
+ SDValue UndefVal = DAG.getPOISON(EltVT);
for (; Idx < WidenNumElts; ++Idx)
Ops[Idx] = UndefVal;
return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
Parts.push_back(
DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
}
@@ -6229,7 +6229,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(S
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
https://github.com/arsenm created
https://github.com/llvm/llvm-project/pull/168290
None
>From 2f389b76f03f8e266e18eaef26bfc96e75a65ba7 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 14 Nov 2025 21:47:44 -0800
Subject: [PATCH] DAG: Use poison for some vector result widening
---
.../SelectionDAG/LegalizeVectorTypes.cpp | 24 +-
.../AArch64/sve-extract-scalable-vector.ll| 7 -
.../vector-constrained-fp-intrinsics.ll | 266 +--
llvm/test/CodeGen/X86/half.ll | 133 +++---
llvm/test/CodeGen/X86/matrix-multiply.ll | 74 +--
.../X86/vector-constrained-fp-intrinsics.ll | 434 +-
6 files changed, 463 insertions(+), 475 deletions(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Widen the input and call convert on the widened input vector.
unsigned NumConcat =
WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
- SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+ SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
Ops[0] = InOp;
SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
- SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
// Use the original element count so we don't do more scalar opts than
// necessary.
unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
std::array EltVTs = {{EltVT, MVT::Other}};
- SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
SmallVector OpChains;
// Use the original element count so we don't do more scalar opts than
// necessary.
@@ -5819,7 +5819,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
}
while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
return DAG.getBuildVector(WidenVT, DL, Ops);
}
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
// input and then widening it. To avoid this, we widen the input only
if
// it results in a legal type.
if (WidenSize % InSize == 0) {
- SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+ SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
Ops[0] = InOp;
NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
SmallVector Ops;
DAG.ExtractVectorElements(InOp, Ops);
Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
}
@@ -6088,7 +6088,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
if (WidenNumElts % NumInElts == 0) {
// Add undef vectors to widen to correct length.
unsigned NumConcat = WidenNumElts / NumInElts;
- SDValue UndefVal = DAG.getUNDEF(InVT);
+ SDValue UndefVal = DAG.getPOISON(InVT);
SmallVector Ops(NumConcat);
for (unsigned i=0; i < NumOperands; ++i)
Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
for (unsigned j = 0; j < NumInElts; ++j)
Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
}
- SDValue UndefVal = DAG.getUNDEF(EltVT);
+ SDValue UndefVal = DAG.getPOISON(EltVT);
for (; Idx < WidenNumElts; ++Idx)
Ops[Idx] = UndefVal;
return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
Parts.push_back(
DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
}
@@ -6229,7 +6229,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVEC
[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)
https://github.com/redstar updated
https://github.com/llvm/llvm-project/pull/167054
>From 1f9bfbbd5e893bcab320dc26c71e49779ef7d04d Mon Sep 17 00:00:00 2001
From: Kai Nacke
Date: Fri, 7 Nov 2025 11:13:49 -0500
Subject: [PATCH 1/2] [GOFF] Write out relocations in the GOFF writer
Add support for writing relocations. Since the symbol numbering is only
available after the symbols are written, the relocations are collected
in a vector. At write time, the relocations are converted using the
symbol ids, compressed and written out. A relocation data record is
limited to 32K-1 bytes, which requires making sure that larger relocation
data is written into multiple records.
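For readers skimming the patch, a minimal sketch of the size-splitting constraint described above; `writeRLDRecord` and the surrounding names are hypothetical, not the actual GOFFObjectWriter API:

```cpp
// Illustrative sketch only: converted relocation bytes are flushed in chunks
// of at most 32K-1 bytes, so oversized relocation data spans multiple records.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t MaxRLDRecordSize = 32 * 1024 - 1;

// Hypothetical helper that emits one relocation data record.
void writeRLDRecord(const std::uint8_t *Data, std::size_t Size);

void writeRelocationData(const std::vector<std::uint8_t> &Converted) {
  std::size_t Offset = 0;
  while (Offset < Converted.size()) {
    // Keep each record within the 32K-1 byte limit; the remainder goes
    // into the next record.
    std::size_t Chunk = std::min(MaxRLDRecordSize, Converted.size() - Offset);
    writeRLDRecord(Converted.data() + Offset, Chunk);
    Offset += Chunk;
  }
}
```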
---
llvm/include/llvm/BinaryFormat/GOFF.h | 26 ++
llvm/include/llvm/MC/MCGOFFObjectWriter.h | 37 ++-
llvm/lib/MC/GOFFObjectWriter.cpp | 266 +-
.../MCTargetDesc/SystemZGOFFObjectWriter.cpp | 24 ++
.../SystemZ/MCTargetDesc/SystemZMCAsmInfo.h | 1 +
llvm/test/CodeGen/SystemZ/zos-section-1.ll| 23 +-
llvm/test/CodeGen/SystemZ/zos-section-2.ll| 16 +-
7 files changed, 378 insertions(+), 15 deletions(-)
diff --git a/llvm/include/llvm/BinaryFormat/GOFF.h
b/llvm/include/llvm/BinaryFormat/GOFF.h
index 49d2809cb6524..08bdb5d624fca 100644
--- a/llvm/include/llvm/BinaryFormat/GOFF.h
+++ b/llvm/include/llvm/BinaryFormat/GOFF.h
@@ -157,6 +157,32 @@ enum ESDAlignment : uint8_t {
ESD_ALIGN_4Kpage = 12,
};
+enum RLDReferenceType : uint8_t {
+ RLD_RT_RAddress = 0,
+ RLD_RT_ROffset = 1,
+ RLD_RT_RLength = 2,
+ RLD_RT_RRelativeImmediate = 6,
+ RLD_RT_RTypeConstant = 7,
+ RLD_RT_RLongDisplacement = 9,
+};
+
+enum RLDReferentType : uint8_t {
+ RLD_RO_Label = 0,
+ RLD_RO_Element = 1,
+ RLD_RO_Class = 2,
+ RLD_RO_Part = 3,
+};
+
+enum RLDAction : uint8_t {
+ RLD_ACT_Add = 0,
+ RLD_ACT_Subtract = 1,
+};
+
+enum RLDFetchStore : uint8_t {
+ RLD_FS_Fetch = 0,
+ RLD_FS_Store = 1
+};
+
enum ENDEntryPointRequest : uint8_t {
END_EPR_None = 0,
END_EPR_EsdidOffset = 1,
diff --git a/llvm/include/llvm/MC/MCGOFFObjectWriter.h
b/llvm/include/llvm/MC/MCGOFFObjectWriter.h
index ec07637dd2847..408d432a8f54f 100644
--- a/llvm/include/llvm/MC/MCGOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCGOFFObjectWriter.h
@@ -11,9 +11,13 @@
#include "llvm/MC/MCObjectWriter.h"
#include "llvm/MC/MCValue.h"
+#include
+#include
namespace llvm {
class MCObjectWriter;
+class MCSectionGOFF;
+class MCSymbolGOFF;
class raw_pwrite_stream;
class MCGOFFObjectTargetWriter : public MCObjectTargetWriter {
@@ -21,8 +25,19 @@ class MCGOFFObjectTargetWriter : public MCObjectTargetWriter
{
MCGOFFObjectTargetWriter() = default;
public:
+ enum RLDRelocationType {
+Reloc_Type_ACon = 0x1, // General address.
+Reloc_Type_RelImm = 0x2, // Relative-immediate address.
+Reloc_Type_QCon = 0x3, // Offset of symbol in class.
+Reloc_Type_VCon = 0x4, // Address of external symbol.
+Reloc_Type_RCon = 0x5, // PSECT of symbol.
+ };
+
~MCGOFFObjectTargetWriter() override = default;
+ virtual unsigned getRelocType(const MCValue &Target, const MCFixup &Fixup,
+bool IsPCRel) const = 0;
+
Triple::ObjectFormatType getFormat() const override { return Triple::GOFF; }
static bool classof(const MCObjectTargetWriter *W) {
@@ -30,6 +45,23 @@ class MCGOFFObjectTargetWriter : public MCObjectTargetWriter
{
}
};
+struct GOFFSavedRelocationEntry {
+ const MCSectionGOFF *Section;
+ const MCSymbolGOFF *SymA;
+ const MCSymbolGOFF *SymB;
+ unsigned RelocType;
+ uint64_t FixupOffset;
+ uint32_t Length;
+ uint64_t FixedValue; // Info only.
+
+ GOFFSavedRelocationEntry(const MCSectionGOFF *Section,
+ const MCSymbolGOFF *SymA, const MCSymbolGOFF *SymB,
+ unsigned RelocType, uint64_t FixupOffset,
+ uint32_t Length, uint64_t FixedValue)
+ : Section(Section), SymA(SymA), SymB(SymB), RelocType(RelocType),
+FixupOffset(FixupOffset), Length(Length), FixedValue(FixedValue) {}
+};
+
class GOFFObjectWriter : public MCObjectWriter {
// The target specific GOFF writer instance.
std::unique_ptr TargetObjectWriter;
@@ -37,6 +69,9 @@ class GOFFObjectWriter : public MCObjectWriter {
// The stream used to write the GOFF records.
raw_pwrite_stream &OS;
+ // Saved relocation data.
+ std::vector SavedRelocs;
+
public:
GOFFObjectWriter(std::unique_ptr MOTW,
raw_pwrite_stream &OS);
@@ -44,7 +79,7 @@ class GOFFObjectWriter : public MCObjectWriter {
// Implementation of the MCObjectWriter interface.
void recordRelocation(const MCFragment &F, const MCFixup &Fixup,
-MCValue Target, uint64_t &FixedValue) override {}
+MCValue Target, uint64_t &FixedValue) override;
uint64_t writeObject() override;
};
diff --git a/llvm/lib/MC/GOFFObjectWriter.cpp b/llvm/lib/MC/GOFFObjectWriter.cpp
index 07aec
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
https://github.com/petar-avramovic updated
https://github.com/llvm/llvm-project/pull/168411
>From 73f2bf84bb2bcff3cd20aa207116f214cde943f9 Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Mon, 17 Nov 2025 18:47:58 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and
G_FNEG
---
.../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 26 +-
.../AMDGPU/AMDGPURegBankLegalizeHelper.h | 1 +
.../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 19 +
llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll | 340 ++
llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll | 303
5 files changed, 683 insertions(+), 6 deletions(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..123fc5bf37a19 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -437,6 +437,13 @@ std::pair
RegBankLegalizeHelper::unpackAExt(Register Reg) {
return {Lo.getReg(0), Hi.getReg(0)};
}
+std::pair
+RegBankLegalizeHelper::unpackAExtTruncS16(Register Reg) {
+ auto [Lo32, Hi32] = unpackAExt(Reg);
+ return {B.buildTrunc(SgprRB_S16, Lo32).getReg(0),
+ B.buildTrunc(SgprRB_S16, Hi32).getReg(0)};
+}
+
void RegBankLegalizeHelper::lowerUnpackBitShift(MachineInstr &MI) {
Register Lo, Hi;
switch (MI.getOpcode()) {
@@ -629,14 +636,21 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr
&MI) {
void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
Register Dst = MI.getOperand(0).getReg();
assert(MRI.getType(Dst) == V2S16);
- auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
- auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
unsigned Opc = MI.getOpcode();
auto Flags = MI.getFlags();
- auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
- auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
- auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
- auto Op2Hi = B.buildTrunc(SgprRB_S16, Op2Hi32);
+
+ if (MI.getNumOperands() == 2) {
+auto [Op1Lo, Op1Hi] = unpackAExtTruncS16(MI.getOperand(1).getReg());
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+ }
+
+ assert(MI.getNumOperands() == 3);
+ auto [Op1Lo, Op1Hi] = unpackAExtTruncS16(MI.getOperand(1).getReg());
+ auto [Op2Lo, Op2Hi] = unpackAExtTruncS16(MI.getOperand(2).getReg());
auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo, Op2Lo}, Flags);
auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi, Op2Hi}, Flags);
B.buildMergeLikeInstr(Dst, {Lo, Hi});
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
index e7598f888e4b5..4f1c3c02fa5d6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
@@ -118,6 +118,7 @@ class RegBankLegalizeHelper {
std::pair unpackZExt(Register Reg);
std::pair unpackSExt(Register Reg);
std::pair unpackAExt(Register Reg);
+ std::pair unpackAExtTruncS16(Register Reg);
void lowerUnpackBitShift(MachineInstr &MI);
void lowerV_BFE(MachineInstr &MI);
void lowerS_BFE(MachineInstr &MI);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const
GCNSubtarget &_ST,
.Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
.Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
+ // FNEG and FABS are either folded as source modifiers or can be selected as
+ // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+ // targets without SALU float we still select them as VGPR since there would
+ // be no real sgpr use.
+ addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+ .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+ .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+ .Div(S16, {{Vgpr16}, {Vgpr16}})
+ .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+ .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+ .Div(S32, {{Vgpr32}, {Vgpr32}})
+ .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+ .Div(S64, {{Vgpr64}, {Vgpr64}})
+ .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+ .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+ .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+ .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+ .Any({{DivV2S32}, {{
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
@@ -0,0 +1,340 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GCN,GFX11
%s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck -check-prefixes=GCN,GFX12
%s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN: ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fff, v0
+; GCN-NEXT:global_store_b16 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+ %fabs = call half @llvm.fabs.f16(half %in)
+ store half %fabs, ptr addrspace(1) %out
+ ret void
+}
+define amdgpu_ps void @s_fabs_f16(half inreg %in, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f16:
+; GFX11: ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:global_store_b16 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f16:
+; GFX12: ; %bb.0:
+; GFX12-NEXT:s_and_b32 s0, s0, 0x7fff
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b16 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+ %fabs = call half @llvm.fabs.f16(half %in)
+ store half %fabs, ptr addrspace(1) %out
+ ret void
+}
+define amdgpu_ps void @s_fabs_f16_salu_use(half inreg %in, i32 inreg %val, ptr
addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f16_salu_use:
+; GFX11: ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:s_cmp_eq_u32 s1, 0
+; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) |
instid1(SALU_CYCLE_1)
+; GFX11-NEXT:v_readfirstlane_b32 s0, v2
+; GFX11-NEXT:s_cselect_b32 s0, s0, 0
+; GFX11-NEXT:v_mov_b32_e32 v2, s0
+; GFX11-NEXT:global_store_b16 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f16_salu_use:
+; GFX12: ; %bb.0:
+; GFX12-NEXT:s_and_b32 s0, s0, 0x7fff
+; GFX12-NEXT:s_cmp_eq_u32 s1, 0
+; GFX12-NEXT:s_cselect_b32 s0, s0, 0
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b16 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+ %fabs = call half @llvm.fabs.f16(half %in)
+ %cond = icmp eq i32 %val, 0
+ %sel = select i1 %cond, half %fabs, half 0.0
+ store half %sel, ptr addrspace(1) %out
+ ret void
+}
+
+define amdgpu_ps void @v_fabs_f32(float %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f32:
+; GCN: ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fff, v0
+; GCN-NEXT:global_store_b32 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+ %fabs = call float @llvm.fabs.f32(float %in)
+ store float %fabs, ptr addrspace(1) %out
+ ret void
+}
+define amdgpu_ps void @s_fabs_f32(float inreg %in, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f32:
+; GFX11: ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:global_store_b32 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f32:
+; GFX12: ; %bb.0:
+; GFX12-NEXT:s_bitset0_b32 s0, 31
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b32 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+ %fabs = call float @llvm.fabs.f32(float %in)
+ store float %fabs, ptr addrspace(1) %out
+ ret void
+}
+define amdgpu_ps void @s_fabs_f32_salu_use(float inreg %in, i32 inreg %val,
ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f32_salu_use:
+; GFX11: ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:s_cmp_eq_u32 s1, 0
+; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) |
instid1(SALU_CYCLE_1)
+; GFX11-NEXT:v_readfirstlane_b32 s0, v2
+; GFX11-NEXT:s_cselect_b32 s0, s0, 0
+; GFX11-NEXT:v_mov_b32_e32 v2, s0
+; GFX11-NEXT:global_store_b32 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f32_salu_use:
+; GFX12: ; %bb.0:
+; GFX12-NEXT:s_bitset0_b32 s0, 31
+; GFX12-NEXT:s_cmp_eq_u32 s1, 0
+; GFX12-NEXT:s_cselect_b32 s0, s0, 0
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b32 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+ %fabs = call float @llvm.fabs.f32(float %in)
+ %cond = icmp eq i32 %val, 0
+ %sel = select i1 %cond, float %fabs, float 0.0
+ store float %sel, ptr addrspace(1) %out
+ ret void
+}
+
+define amdgpu_ps void @v_fabs_f64(double %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f64:
+; GCN: ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v1, 0x7fff, v1
+; GCN-NEXT:global_store_b64 v[2:3], v[0:1], off
+; GCN-NEXT:s_endpgm
+ %fabs = call double @llvm.fabs.f64(double %in)
+ store double %fabs, ptr addrspace(1) %out
+ ret void
+}
+define amdgpu_ps void @s_fabs_f64(double inreg %in, ptr addrspace(1) %out) {
+; G
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Petar Avramovic (petar-avramovic)
Changes
---
Patch is 21.42 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/168411.diff
4 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp (+15-2)
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp (+19)
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll (+233)
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll (+216)
``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..d719f3d40295d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -629,10 +629,23 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr
&MI) {
void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
Register Dst = MI.getOperand(0).getReg();
assert(MRI.getType(Dst) == V2S16);
- auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
- auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
unsigned Opc = MI.getOpcode();
auto Flags = MI.getFlags();
+
+ if (MI.getNumOperands() == 2) {
+auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
+auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+ }
+
+ assert(MI.getNumOperands() == 3);
+ auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+ auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const
GCNSubtarget &_ST,
.Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
.Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
+ // FNEG and FABS are either folded as source modifiers or can be selected as
+ // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+ // targets without SALU float we still select them as VGPR since there would
+ // be no real sgpr use.
+ addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+ .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+ .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+ .Div(S16, {{Vgpr16}, {Vgpr16}})
+ .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+ .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+ .Div(S32, {{Vgpr32}, {Vgpr32}})
+ .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+ .Div(S64, {{Vgpr64}, {Vgpr64}})
+ .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+ .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+ .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+ .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+ .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32}}});
+
addRulesForGOpcs({G_FPTOUI})
.Any({{UniS32, S32}, {{Sgpr32}, {Sgpr32}}}, hasSALUFloat)
.Any({{UniS32, S32}, {{UniInVgprS32}, {Vgpr32}}}, !hasSALUFloat);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
new file mode 100644
index 0..093cdf744e3b4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
@@ -0,0 +1,233 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o -
%s | FileCheck -check-prefixes=GCN,GFX11,GFX11-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck
-check-prefixes=GCN,GFX11,GFX11-GISEL %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o -
%s | FileCheck -check-prefixes=GCN,GFX12,GFX12-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck
-check-prefixes=GCN,GFX12,GFX12-GISEL %s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN: ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fff, v0
+; GCN-NEXT:global_store_b16 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+ %fabs = call half @llvm.fabs.f16(half %in)
+ store half %fabs, ptr addrspace(1) %out
+ ret void
+}
+de
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
https://github.com/arsenm updated
https://github.com/llvm/llvm-project/pull/168290
>From 6b6155931582b2f8924a76b268f06d9e2696d489 Mon Sep 17 00:00:00 2001
From: Matt Arsenault
Date: Fri, 14 Nov 2025 21:47:44 -0800
Subject: [PATCH] DAG: Use poison for some vector result widening
---
.../SelectionDAG/LegalizeVectorTypes.cpp | 24 +-
.../AArch64/sve-extract-scalable-vector.ll| 7 -
.../vector-constrained-fp-intrinsics.ll | 266 +--
llvm/test/CodeGen/X86/matrix-multiply.ll | 74 +--
.../X86/vector-constrained-fp-intrinsics.ll | 434 +-
5 files changed, 399 insertions(+), 406 deletions(-)
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Widen the input and call convert on the widened input vector.
unsigned NumConcat =
WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
- SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+ SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
Ops[0] = InOp;
SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
- SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
// Use the original element count so we don't do more scalar opts than
// necessary.
unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
std::array EltVTs = {{EltVT, MVT::Other}};
- SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
SmallVector OpChains;
// Use the original element count so we don't do more scalar opts than
// necessary.
@@ -5819,7 +5819,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
}
while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
return DAG.getBuildVector(WidenVT, DL, Ops);
}
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
// input and then widening it. To avoid this, we widen the input only
if
// it results in a legal type.
if (WidenSize % InSize == 0) {
- SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+ SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
Ops[0] = InOp;
NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
SmallVector Ops;
DAG.ExtractVectorElements(InOp, Ops);
Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
}
@@ -6088,7 +6088,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
if (WidenNumElts % NumInElts == 0) {
// Add undef vectors to widen to correct length.
unsigned NumConcat = WidenNumElts / NumInElts;
- SDValue UndefVal = DAG.getUNDEF(InVT);
+ SDValue UndefVal = DAG.getPOISON(InVT);
SmallVector Ops(NumConcat);
for (unsigned i=0; i < NumOperands; ++i)
Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
for (unsigned j = 0; j < NumInElts; ++j)
Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
}
- SDValue UndefVal = DAG.getUNDEF(EltVT);
+ SDValue UndefVal = DAG.getPOISON(EltVT);
for (; Idx < WidenNumElts; ++Idx)
Ops[Idx] = UndefVal;
return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
Parts.push_back(
DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
}
@@ -6229,7 +6229,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
for (i = 0; i < VTNumElts; ++i)
Ops[i] =
[llvm-branch-commits] [llvm] [AArch64][SME] Support saving/restoring ZT0 in the MachineSMEABIPass (PR #166362)
https://github.com/MacDue updated
https://github.com/llvm/llvm-project/pull/166362
>From 61a5390345e13e8195ad9b2214133914db560ef2 Mon Sep 17 00:00:00 2001
From: Benjamin Maxwell
Date: Mon, 3 Nov 2025 15:41:49 +
Subject: [PATCH] [AArch64][SME] Support saving/restoring ZT0 in the
MachineSMEABIPass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch extends the MachineSMEABIPass to support ZT0. This is done
with the addition of two new states:
- `ACTIVE_ZT0_SAVED`
* This is used when calling a function that shares ZA, but does
share ZT0 (i.e., no ZT0 attributes).
* This state indicates ZT0 must be saved to the save slot, but
must remain on, with no lazy save setup
- `LOCAL_COMMITTED`
* This is used for saving ZT0 in functions without ZA state.
* This state indicates ZA is off and ZT0 has been saved.
* This state is general enough to support ZA, but those
have not been implemented†
To aid with readability, the state transitions have been reworked to a
switch of `transitionFrom().to()`, rather than
nested ifs, which helps manage more transitions.
† This could be implemented to handle some cases of undefined behavior
better.
Change-Id: I14be4a7f8b998fe667bfaade5088f88039515f91
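For readers unfamiliar with the pattern, a minimal sketch of the `transitionFrom().to()` switch style mentioned above; the state names and the packing below are illustrative only, not the MachineSMEABIPass implementation:

```cpp
#include <cstdint>

// Illustrative states only; the real pass defines its own set.
enum class ZAState : std::uint8_t { Off, Active, ActiveZT0Saved, LocalCommitted };

// Pack a (from, to) pair so a single switch covers every transition.
struct TransitionFrom {
  ZAState From;
  constexpr std::uint16_t to(ZAState To) const {
    return static_cast<std::uint16_t>((static_cast<std::uint16_t>(From) << 8) |
                                      static_cast<std::uint16_t>(To));
  }
};
constexpr TransitionFrom transitionFrom(ZAState From) { return {From}; }

void emitTransition(ZAState From, ZAState To) {
  switch (transitionFrom(From).to(To)) {
  case transitionFrom(ZAState::Active).to(ZAState::ActiveZT0Saved):
    // e.g. spill ZT0 to its save slot while PSTATE.ZA stays on.
    break;
  case transitionFrom(ZAState::Off).to(ZAState::LocalCommitted):
    // e.g. ZT0 already saved, ZA stays off.
    break;
  default:
    break; // Remaining transitions omitted from this sketch.
  }
}
```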
---
.../AArch64/AArch64ExpandPseudoInsts.cpp | 1 +
.../Target/AArch64/AArch64ISelLowering.cpp| 11 +-
.../lib/Target/AArch64/AArch64SMEInstrInfo.td | 6 +
llvm/lib/Target/AArch64/MachineSMEABIPass.cpp | 176 +++---
.../test/CodeGen/AArch64/sme-peephole-opts.ll | 4 -
.../test/CodeGen/AArch64/sme-za-exceptions.ll | 124 +---
llvm/test/CodeGen/AArch64/sme-zt0-state.ll| 104 ++-
7 files changed, 321 insertions(+), 105 deletions(-)
diff --git a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
index 34d74d04c4419..60e6a82d41cc8 100644
--- a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
@@ -1717,6 +1717,7 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
}
case AArch64::InOutZAUsePseudo:
case AArch64::RequiresZASavePseudo:
+ case AArch64::RequiresZT0SavePseudo:
case AArch64::SMEStateAllocPseudo:
case AArch64::COALESCER_BARRIER_FPR16:
case AArch64::COALESCER_BARRIER_FPR32:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c4ae8ea7a8a69..6dc01597cf0f5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9524,6 +9524,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
if (CallAttrs.requiresLazySave() ||
CallAttrs.requiresPreservingAllZAState())
ZAMarkerNode = AArch64ISD::REQUIRES_ZA_SAVE;
+else if (CallAttrs.requiresPreservingZT0())
+ ZAMarkerNode = AArch64ISD::REQUIRES_ZT0_SAVE;
else if (CallAttrs.caller().hasZAState() ||
CallAttrs.caller().hasZT0State())
ZAMarkerNode = AArch64ISD::INOUT_ZA_USE;
@@ -9643,7 +9645,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
SDValue ZTFrameIdx;
MachineFrameInfo &MFI = MF.getFrameInfo();
- bool ShouldPreserveZT0 = CallAttrs.requiresPreservingZT0();
+ bool ShouldPreserveZT0 =
+ !UseNewSMEABILowering && CallAttrs.requiresPreservingZT0();
// If the caller has ZT0 state which will not be preserved by the callee,
// spill ZT0 before the call.
@@ -9656,7 +9659,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
// If caller shares ZT0 but the callee is not shared ZA, we need to stop
// PSTATE.ZA before the call if there is no lazy-save active.
- bool DisableZA = CallAttrs.requiresDisablingZABeforeCall();
+ bool DisableZA =
+ !UseNewSMEABILowering && CallAttrs.requiresDisablingZABeforeCall();
assert((!DisableZA || !RequiresLazySave) &&
"Lazy-save should have PSTATE.SM=1 on entry to the function");
@@ -10142,7 +10146,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
getSMToggleCondition(CallAttrs));
}
- if (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall())
+ if (!UseNewSMEABILowering &&
+ (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall()))
// Unconditionally resume ZA.
Result = DAG.getNode(
AArch64ISD::SMSTART, DL, DAG.getVTList(MVT::Other, MVT::Glue), Result,
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index 737169253ddb3..b099f15ecf7e3 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -102,6 +102,7 @@ def : Pat<(i64 (AArch64AllocateSMESaveBuffer GPR64:$size)),
let hasSideEffects = 1, isMeta = 1 in {
def InOutZAUsePseudo : Pseudo<(outs), (ins), []>, Sched<[]>;
def RequiresZASavePseudo : Pseudo<(outs), (ins), []>, Sched<[]>;
+ d
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -549,15 +473,16 @@ createFlowFunction(const
BinaryFunction::BasicBlockOrderType &BlockOrder) {
/// of the basic blocks in the binary, the count is "matched" to the block.
/// Similarly, if both the source and the target of a count in the profile are
/// matched to a jump in the binary, the count is recorded in CFG.
-size_t matchWeights(
-BinaryContext &BC, const BinaryFunction::BasicBlockOrderType &BlockOrder,
-const yaml::bolt::BinaryFunctionProfile &YamlBF, FlowFunction &Func,
-HashFunction HashFunction, YAMLProfileReader::ProfileLookupMap &IdToYamlBF,
-const BinaryFunction &BF,
-const ArrayRef ProbeMatchSpecs);
+size_t matchWeights(BinaryContext &BC,
maksfb wrote:
`BinaryContext` can be obtained from `BinaryFunction`.
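For illustration, a minimal sketch of the suggestion, mirroring the `BF.getBinaryContext()` call already used elsewhere in this patch:

```cpp
#include "bolt/Core/BinaryContext.h"
#include "bolt/Core/BinaryFunction.h"

using namespace llvm::bolt;

// Sketch only: derive the BinaryContext from the BinaryFunction argument
// instead of threading it through as a separate parameter.
void useFunctionContext(const BinaryFunction &BF) {
  const BinaryContext &BC = BF.getBinaryContext();
  (void)BC; // ... use BC exactly as before ...
}
```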
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][PAC] Warn about synchronous unwind tables (PR #165227)
https://github.com/bgergely0 updated
https://github.com/llvm/llvm-project/pull/165227
From 61e03b5abf74bd5a61f2aa4d21219c67cfbfce24 Mon Sep 17 00:00:00 2001
From: Gergely Balint
Date: Mon, 27 Oct 2025 09:29:54 +
Subject: [PATCH 1/4] [BOLT][PAC] Warn about synchronous unwind tables
BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.
See also: #165215
---
bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp| 9 -
.../AArch64/pacret-synchronous-unwind.cpp | 33 +++
2 files changed, 41 insertions(+), 1 deletion(-)
create mode 100644 bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 91030544d2b88..01af88818a21d 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -133,11 +133,18 @@ Error
PointerAuthCFIAnalyzer::runOnFunctions(BinaryContext &BC) {
ParallelUtilities::runOnEachFunction(
BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
SkipPredicate, "PointerAuthCFIAnalyzer");
+
+ float IgnoredPercent = (100.0 * FunctionsIgnored) / Total;
BC.outs() << "BOLT-INFO: PointerAuthCFIAnalyzer ran on " << Total
<< " functions. Ignored " << FunctionsIgnored << " functions "
-<< format("(%.2lf%%)", (100.0 * FunctionsIgnored) / Total)
+<< format("(%.2lf%%)", IgnoredPercent)
<< " because of CFI inconsistencies\n";
+ if (IgnoredPercent >= 10.0)
+BC.outs() << "BOLT-WARNING: PointerAuthCFIAnalyzer only supports "
+ "asynchronous unwind tables. For C compilers, see "
+ "-fasynchronous-unwind-tables.\n";
+
return Error::success();
}
diff --git a/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
new file mode 100644
index 0..1bfeeaed3715a
--- /dev/null
+++ b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
@@ -0,0 +1,33 @@
+// Test to demonstrate that functions compiled with synchronous unwind tables
+// are ignored by the PointerAuthCFIAnalyzer.
+// Exception handling is needed to have _any_ unwind tables, otherwise the
+// PointerAuthCFIAnalyzer does not run on these functions, so it does not ignore
+// any function.
+//
+// REQUIRES: system-linux,bolt-runtime
+//
+// RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+// RUN: -mbranch-protection=pac-ret \
+// RUN: -fno-asynchronous-unwind-tables \
+// RUN: %s -o %t.exe -Wl,-q
+// RUN: llvm-bolt %t.exe -o %t.bolt | FileCheck %s --check-prefix=CHECK
+//
+// CHECK: PointerAuthCFIAnalyzer ran on 3 functions. Ignored
+// CHECK-NOT: 0 functions (0.00%) because of CFI inconsistencies
+// CHECK-SAME: 1 functions (33.33%) because of CFI inconsistencies
+// CHECK-NEXT: BOLT-WARNING: PointerAuthCFIAnalyzer only supports asynchronous
+// CHECK-SAME: unwind tables. For C compilers, see -fasynchronous-unwind-tables.
+
+#include <stdexcept>
+#include <cstdio>
+
+void foo() { throw std::runtime_error("Exception from foo()."); }
+
+int main() {
+ try {
+foo();
+ } catch (const std::exception &e) {
+printf("Exception caught: %s\n", e.what());
+ }
+ return 0;
+}
From 7fc8acdbf4cef2aa7f4f5ca9d136d4cb1bce9fe6 Mon Sep 17 00:00:00 2001
From: Gergely Balint
Date: Tue, 28 Oct 2025 09:23:08 +
Subject: [PATCH 2/4] [BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer
---
bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp | 27 ++
bolt/test/AArch64/pacret-cfi-incorrect.s | 2 +-
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 01af88818a21d..5979d5fb01818 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -28,6 +28,10 @@
using namespace llvm;
+namespace opts {
+extern llvm::cl::opt Verbosity;
+} // namespace opts
+
namespace llvm {
namespace bolt {
@@ -43,9 +47,10 @@ bool PointerAuthCFIAnalyzer::runOnFunction(BinaryFunction
&BF) {
// Not all functions have .cfi_negate_ra_state in them. But if one
does,
// we expect psign/pauth instructions to have the hasNegateRAState
// annotation.
-BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
- << BF.getPrintName()
- << ": ptr sign/auth inst without .cfi_negate_ra_state\n";
+if (opts::Verbosity >= 1)
+ BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+<< BF.getPrintName()
+<< ": ptr sign/auth inst without .cfi_negate_ra_state\n";
std::lock_guard Lock(IgnoreMutex);
BF.setIgnored();
return false;
@@ -65,9 +70,10 @@ bool PointerAuthCFIAnalyzer::runOnF
[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/168380 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
+ auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+ if (Probes.empty())
+return nullptr;
+ const MCDecodedPseudoProbe &Probe = *Probes.begin();
+ const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+ while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+ return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+ using namespace yaml::bolt;
+ BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+ const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+ using ChildMapTy =
+ std::unordered_map;
+ using CallSiteInfoTy =
+ std::unordered_map;
+ // Mapping from a parent node id to a map InlineSite -> Child node.
+ DenseMap ParentToChildren;
+ // Collect calls in the profile: map from a parent node id to a map
+ // InlineSite -> CallSiteInfo ptr.
+ DenseMap ParentToCSI;
+ for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+ ProbeMatchingStats.TotalCallCount += CSI.Count;
+ ++ProbeMatchingStats.TotalCallSites;
+ if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+ if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ // Get callee GUID
+ if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');
maksfb wrote:
```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no inline tree for " << Callee->Name
<< '\n');
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)
@@ -1438,8 +1443,7 @@ bool SIGfx6CacheControl::insertRelease(MachineBasicBlock::iterator &MI, } bool SIGfx10CacheControl::enableLoadCacheBypass( -const MachineBasicBlock::iterator &MI, -SIAtomicScope Scope, +const MachineBasicBlock::iterator &MI, SIAtomicScope Scope, Pierre-vh wrote: I didn't, I always use `git clang-format` so not sure why that changed. Would you like me to remove it? https://github.com/llvm/llvm-project/pull/168058 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
https://github.com/jmmartinez approved this pull request. https://github.com/llvm/llvm-project/pull/161815 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
@@ -5,29 +5,34 @@
// UNSUPPORTED: ios
// RUN: rm -rf %t && mkdir -p %t
-// RUN: cp `%clang_asan
-print-file-name=lib`/darwin/libclang_rt.asan_osx_dynamic.dylib \
+// RUN: %clang_asan -print-file-name=lib | tr -d '\n' > %t.lib_name
+// RUN: cp %{readfile:%t.lib_name}/darwin/libclang_rt.asan_osx_dynamic.dylib \
// RUN: %t/libclang_rt.asan_osx_dynamic.dylib
// RUN: %clangxx_asan %s -o %t/a.out
// RUN: %clangxx -DSHARED_LIB %s \
// RUN: -dynamiclib -o %t/dummy-so.dylib
-// RUN: ( cd %t && \
-// RUN:
DYLD_INSERT_LIBRARIES=@executable_path/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib
\
-// RUN: %run ./a.out 2>&1 ) | FileCheck %s || exit 1
-
-// RUN: ( cd %t && \
-// RUN:
DYLD_INSERT_LIBRARIES=libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN: %run ./a.out 2>&1 ) | FileCheck %s || exit 1
-
-// RUN: ( cd %t && \
-// RUN: %env_asan_opts=strip_env=0 \
-// RUN:
DYLD_INSERT_LIBRARIES=libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN: %run ./a.out 2>&1 ) | FileCheck %s --check-prefix=CHECK-KEEP || exit
1
-
-// RUN: ( cd %t && \
-// RUN:
DYLD_INSERT_LIBRARIES=%t/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN: %run ./a.out 2>&1 ) | FileCheck %s || exit 1
+// RUN: pushd %t
+// RUN: env
DYLD_INSERT_LIBRARIES=@executable_path/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib
\
+// RUN: %run ./a.out 2>&1 | FileCheck %s
+// RUN: popd
DanBlackwell wrote:
NIT: I'm missing context as to why this was done the way it was in the original
code, but it seems the popd-pushd here are redundant.
https://github.com/llvm/llvm-project/pull/168545
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
github-actions[bot] wrote: # :penguin: Linux x64 Test Results * 5820 tests passed * 1319 tests skipped https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
https://github.com/arsenm ready_for_review https://github.com/llvm/llvm-project/pull/168290 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
github-actions[bot] wrote: # :penguin: Linux x64 Test Results * 186276 tests passed * 4848 tests skipped https://github.com/llvm/llvm-project/pull/168411 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)
https://github.com/paschalis-mpeis approved this pull request. https://github.com/llvm/llvm-project/pull/164622 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/168380
Backport 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868
Requested by: @davemgreen
>From 7c585c9c8b7fb78d8107912de47bbd35e8379f7c Mon Sep 17 00:00:00 2001
From: David Green
Date: Wed, 12 Nov 2025 16:26:21 +
Subject: [PATCH] [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter
(#166329)
The subtarget may not be set if no functions are present in the module.
Attempt to use the TargetMachine directly in more cases.
Fixes #165422
Fixes #167577
(cherry picked from commit 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868)
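For context, a minimal sketch of the idea behind the fix, assuming only that `TargetMachine::getTargetTriple()` is available; this is not the actual ARMAsmPrinter code:

```cpp
#include "llvm/Target/TargetMachine.h"
#include "llvm/TargetParser/Triple.h"

// Module-level AsmPrinter hooks may run before any function has been emitted,
// so a per-function cached Subtarget pointer can still be unset there. The
// TargetMachine, by contrast, is always available.
static bool isELFTarget(const llvm::TargetMachine &TM) {
  return TM.getTargetTriple().isOSBinFormatELF();
}
```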
---
llvm/lib/Target/ARM/ARMAsmPrinter.cpp | 21 +++--
llvm/lib/Target/ARM/ARMSubtarget.cpp | 12 +---
llvm/lib/Target/ARM/ARMTargetMachine.h| 14 ++
llvm/test/CodeGen/ARM/xxstructor-nodef.ll | 7 +++
4 files changed, 33 insertions(+), 21 deletions(-)
create mode 100644 llvm/test/CodeGen/ARM/xxstructor-nodef.ll
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 850b00406f09e..aa6ef55dad26c 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -97,7 +97,8 @@ void ARMAsmPrinter::emitXXStructor(const DataLayout &DL,
const Constant *CV) {
const MCExpr *E = MCSymbolRefExpr::create(
GetARMGVSymbol(GV, ARMII::MO_NO_FLAG),
- (Subtarget->isTargetELF() ? ARM::S_TARGET1 : ARM::S_None), OutContext);
+ (TM.getTargetTriple().isOSBinFormatELF() ? ARM::S_TARGET1 : ARM::S_None),
+ OutContext);
OutStreamer->emitValue(E, Size);
}
@@ -595,8 +596,7 @@ void ARMAsmPrinter::emitEndOfAsmFile(Module &M) {
ARMTargetStreamer &ATS = static_cast(TS);
if (OptimizationGoals > 0 &&
- (Subtarget->isTargetAEABI() || Subtarget->isTargetGNUAEABI() ||
- Subtarget->isTargetMuslAEABI()))
+ (TT.isTargetAEABI() || TT.isTargetGNUAEABI() || TT.isTargetMuslAEABI()))
ATS.emitAttribute(ARMBuildAttrs::ABI_optimization_goals,
OptimizationGoals);
OptimizationGoals = -1;
@@ -866,9 +866,10 @@ static uint8_t getModifierSpecifier(ARMCP::ARMCPModifier
Modifier) {
MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue *GV,
unsigned char TargetFlags) {
- if (Subtarget->isTargetMachO()) {
+ const Triple &TT = TM.getTargetTriple();
+ if (TT.isOSBinFormatMachO()) {
bool IsIndirect =
-(TargetFlags & ARMII::MO_NONLAZY) && Subtarget->isGVIndirectSymbol(GV);
+(TargetFlags & ARMII::MO_NONLAZY) && getTM().isGVIndirectSymbol(GV);
if (!IsIndirect)
return getSymbol(GV);
@@ -885,9 +886,8 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue
*GV,
StubSym = MachineModuleInfoImpl::StubValueTy(getSymbol(GV),
!GV->hasInternalLinkage());
return MCSym;
- } else if (Subtarget->isTargetCOFF()) {
-assert(Subtarget->isTargetWindows() &&
- "Windows is the only supported COFF target");
+ } else if (TT.isOSBinFormatCOFF()) {
+assert(TT.isOSWindows() && "Windows is the only supported COFF target");
bool IsIndirect =
(TargetFlags & (ARMII::MO_DLLIMPORT | ARMII::MO_COFFSTUB));
@@ -914,7 +914,7 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue
*GV,
}
return MCSym;
- } else if (Subtarget->isTargetELF()) {
+ } else if (TT.isOSBinFormatELF()) {
return getSymbolPreferLocal(*GV);
}
llvm_unreachable("unexpected target");
@@ -960,7 +960,8 @@ void ARMAsmPrinter::emitMachineConstantPoolValue(
// On Darwin, const-pool entries may get the "FOO$non_lazy_ptr" mangling,
so
// flag the global as MO_NONLAZY.
-unsigned char TF = Subtarget->isTargetMachO() ? ARMII::MO_NONLAZY : 0;
+unsigned char TF =
+TM.getTargetTriple().isOSBinFormatMachO() ? ARMII::MO_NONLAZY : 0;
MCSym = GetARMGVSymbol(GV, TF);
} else if (ACPV->isMachineBasicBlock()) {
const MachineBasicBlock *MBB = cast(ACPV)->getMBB();
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp
b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 13185a7d797a3..63d6e2ea7389b 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -316,17 +316,7 @@ bool ARMSubtarget::isRWPI() const {
}
bool ARMSubtarget::isGVIndirectSymbol(const GlobalValue *GV) const {
- if (!TM.shouldAssumeDSOLocal(GV))
-return true;
-
- // 32 bit macho has no relocation for a-b if a is undefined, even if b is in
- // the section that is being relocated. This means we have to use a load even
- // for GVs that are known to be local to the dso.
- if (isTargetMachO() && TM.isPositionIndependent() &&
- (GV->isDeclarationForLinker() || GV->hasCommonLinkage()))
-return true;
-
- return false;
+ return TM.isGVIndirectSymbol(GV);
}
bool ARMSubtarget::isGVInGOT(const GlobalValue *GV) const {
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.h
b/llvm/lib/Target/ARM/ARMTa
[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)
https://github.com/Pierre-vh updated
https://github.com/llvm/llvm-project/pull/168058
>From 5700ad0a2fb2a859e7c46c6690854c35206155f0 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Mon, 17 Nov 2025 10:05:14 +0100
Subject: [PATCH 1/2] nit
>From e060c5eba50d75216d628e16da72929b71aa9a30 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH 2/2] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
Classes
+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit directly from `SICacheControl`.
With this patch and the previous one, there are no more long inheritance chains in
`SIMemoryLegalizer`: we just have three `SICacheControl` implementations that each
do their own thing, and no code hidden three superclasses up.
All implementations are marked `final` too.
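A minimal sketch of the intended shape after the patch (class names are taken from the diff, but bodies and almost all overrides are omitted, so this is illustrative only):

```cpp
// Sketch of the flattened hierarchy: three final implementations, each
// deriving directly from the common base, reduced to one representative
// virtual function.
class SICacheControl {
public:
  virtual ~SICacheControl() = default;
  virtual bool enableLoadCacheBypass() const = 0;
};

// Pre-GFX10 targets.
class SIGfx6CacheControl final : public SICacheControl {
public:
  bool enableLoadCacheBypass() const override { return true; }
};

// GFX10/11 share one implementation instead of GFX11 deriving from GFX10.
class SIGfx10CacheControl final : public SICacheControl {
public:
  bool enableLoadCacheBypass() const override { return true; }
};

// GFX12 also derives directly from the base class.
class SIGfx12CacheControl final : public SICacheControl {
public:
  bool enableLoadCacheBypass() const override { return true; }
};
```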
---
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
1 file changed, 38 insertions(+), 120 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 49aba39872138..bf04c7fa132c0 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
/// Generates code sequences for the memory model of all GFX targets below
/// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
public:
SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
Position Pos) const override;
};
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
public:
- SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+ SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace) const override;
+ bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
+ bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
- SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
- bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsVolatile, bool IsNonTemporal,
- bool IsLastUse) const override;
+ bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+ IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+ /*AtomicsOnly=*/false);
+ }
};
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
protected:
// Sets TH policy to \p Value if CPol operand is present in instruction \p
MI.
// \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
public:
- SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+ SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
// GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
// the behavior is the same if assuming GFX12.0 in CU mode.
assert(!ST.hasGFX1250Insts() || ST.isCuMode
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
https://github.com/clairechingching created
https://github.com/llvm/llvm-project/pull/168314
I'd like to backport this change, which handles misaligned memory access in the BPF
target and was merged in [this original
PR](https://github.com/llvm/llvm-project/pull/167013). Backporting it so I can
enable this feature in the Rust nightly compiler.
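For readers who don't want to scan the diff, the backported hook reduces to roughly the following logic. This is a stripped-down sketch with a hypothetical signature; the real override lives in BPFISelLowering.cpp and takes the usual EVT/Align/MachineMemOperand parameters:

```cpp
// Sketch of the target hook added by the patch: when the
// "allows-misaligned-mem-access" feature is on, report every simple-typed
// misaligned access as both legal and fast.
static bool allowsMisalignedAccess(bool FeatureEnabled, bool IsSimpleVT,
                                   unsigned *Fast) {
  if (!FeatureEnabled || !IsSimpleVT)
    return false;
  if (Fast)
    *Fast = 1; // Treat misaligned accesses as fast when the feature is set.
  return true;
}
```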
>From 5d2ec95c53bd510a39fd33ab234a961c91b69cd0 Mon Sep 17 00:00:00 2001
From: Claire xyz
Date: Fri, 7 Nov 2025 11:08:47 -0500
Subject: [PATCH] [BPF] add allows-misaligned-mem-access target feature
This enables misaligned memory access when the feature is enabled
---
llvm/lib/Target/BPF/BPF.td| 4 +
llvm/lib/Target/BPF/BPFISelLowering.cpp | 20 ++
llvm/lib/Target/BPF/BPFISelLowering.h | 7 +
llvm/lib/Target/BPF/BPFSubtarget.cpp | 1 +
llvm/lib/Target/BPF/BPFSubtarget.h| 6 +
llvm/test/CodeGen/BPF/unaligned_load_store.ll | 196 ++
6 files changed, 234 insertions(+)
create mode 100644 llvm/test/CodeGen/BPF/unaligned_load_store.ll
diff --git a/llvm/lib/Target/BPF/BPF.td b/llvm/lib/Target/BPF/BPF.td
index dff76ca07af51..a7aa6274f5ac1 100644
--- a/llvm/lib/Target/BPF/BPF.td
+++ b/llvm/lib/Target/BPF/BPF.td
@@ -27,6 +27,10 @@ def ALU32 : SubtargetFeature<"alu32", "HasAlu32", "true",
def DwarfRIS: SubtargetFeature<"dwarfris", "UseDwarfRIS", "true",
"Disable MCAsmInfo
DwarfUsesRelocationsAcrossSections">;
+def MisalignedMemAccess : SubtargetFeature<"allows-misaligned-mem-access",
+ "AllowsMisalignedMemAccess", "true",
+ "Allows misaligned memory access">;
+
def : Proc<"generic", []>;
def : Proc<"v1", []>;
def : Proc<"v2", []>;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp
b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index f4f414d192df0..5ec7f5905fd22 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -196,6 +196,26 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine
&TM,
HasJmp32 = STI.getHasJmp32();
HasJmpExt = STI.getHasJmpExt();
HasMovsx = STI.hasMovsx();
+
+ AllowsMisalignedMemAccess = STI.getAllowsMisalignedMemAccess();
+}
+
+bool BPFTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned, Align,
+
MachineMemOperand::Flags,
+ unsigned *Fast) const {
+ // allows-misaligned-mem-access is disabled
+ if (!AllowsMisalignedMemAccess)
+return false;
+
+ // only allow misalignment for simple value types
+ if (!VT.isSimple())
+return false;
+
+ // always assume fast mode when misalignment is allowed
+ if (Fast)
+*Fast = true;
+
+ return true;
}
bool BPFTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA)
const {
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h
b/llvm/lib/Target/BPF/BPFISelLowering.h
index 8f60261c10e9e..fe01bd5b8cf85 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -46,6 +46,10 @@ class BPFTargetLowering : public TargetLowering {
// with the given GlobalAddress is legal.
bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
+ bool allowsMisalignedMemoryAccesses(EVT VT, unsigned, Align,
+ MachineMemOperand::Flags,
+ unsigned *) const override;
+
BPFTargetLowering::ConstraintType
getConstraintType(StringRef Constraint) const override;
@@ -73,6 +77,9 @@ class BPFTargetLowering : public TargetLowering {
bool HasJmpExt;
bool HasMovsx;
+ // Allows Misalignment
+ bool AllowsMisalignedMemAccess;
+
SDValue LowerSDIVSREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp
b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 4167547680b12..925537710efb0 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -66,6 +66,7 @@ void BPFSubtarget::initializeEnvironment() {
HasGotol = false;
HasStoreImm = false;
HasLoadAcqStoreRel = false;
+ AllowsMisalignedMemAccess = false;
}
void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.h
b/llvm/lib/Target/BPF/BPFSubtarget.h
index aed2211265e23..a9a20008733c9 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.h
+++ b/llvm/lib/Target/BPF/BPFSubtarget.h
@@ -63,6 +63,9 @@ class BPFSubtarget : public BPFGenSubtargetInfo {
// whether we should enable MCAsmInfo DwarfUsesRelocationsAcrossSections
bool UseDwarfRIS;
+ // whether we allows misaligned memory access
+ bool AllowsMisalignedMemAccess;
+
// whether cpu v4 insns are enabled.
bool HasLdsx, HasMo
[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)
https://github.com/bgergely0 edited https://github.com/llvm/llvm-project/pull/164622 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint with fp->int->fp optimizations (PR #164503)
@@ -6075,6 +6075,35 @@ bool SelectionDAG::isKnownNeverZeroFloat(SDValue Op)
const {
Op, [](ConstantFPSDNode *C) { return !C->isZero(); });
}
+bool SelectionDAG::allUsesSignedZeroInsensitive(SDValue Op) const {
+ assert(Op.getValueType().isFloatingPoint());
+ return all_of(Op->uses(), [&](SDUse &Use) {
guy-david wrote:
Sounds good, limiting it to two uses for now. I will look into implementing it
via demanded-bits in the near future.
Moving the SelectionDAG patch to
https://github.com/llvm/llvm-project/pull/165011 because I don't want it to be
tightly coupled to the fp-to-int-to-fp optimization.
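For context, the "limit to two uses" cap the comment refers to amounts to something like the sketch below (a hypothetical standalone helper, not the SelectionDAG code from the PR):

```cpp
#include <cstddef>
#include <vector>

// Rough sketch of the two-use cap discussed above: bail out once more than
// MaxUses uses exist, so the per-node check stays cheap, and otherwise
// require every use to satisfy the predicate.
template <typename UseT, typename PredT>
static bool allUsesSatisfyUpToLimit(const std::vector<UseT> &Uses, PredT Pred,
                                    std::size_t MaxUses = 2) {
  if (Uses.size() > MaxUses)
    return false; // Too many uses: conservatively give up.
  for (const UseT &U : Uses)
    if (!Pred(U))
      return false;
  return true;
}
```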
https://github.com/llvm/llvm-project/pull/164503
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)
dyung wrote: Hi, at this point in the 21.x release branch we are only accepting patches that fix regressions or major issues. Was the problem being fixed here a recent regression? From a quick look at the history, the code being replaced was introduced around the LLVM 18 time frame, so it has been around for a while. What are the implications if we do not accept this change into the 21.x release branch? Would something be broken that cannot be worked around or otherwise fixed without it? At this point, I am leaning towards not including the fix and waiting for LLVM 22 for it, but if you feel strongly that it should be included, please let us know why and I can consult with the other release managers to see how they feel on the issue. https://github.com/llvm/llvm-project/pull/168371 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (PR #166361)
@@ -356,20 +356,13 @@ define void @new_za_zt0_caller(ptr %callee)
"aarch64_new_za" "aarch64_new_zt0" n
; Expect clear ZA on entry
define void @new_za_shared_zt0_caller(ptr %callee) "aarch64_new_za"
"aarch64_in_zt0" nounwind {
-; CHECK-LABEL: new_za_shared_zt0_caller:
-; CHECK: // %bb.0:
-; CHECK-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:zero {za}
-; CHECK-NEXT:blr x0
-; CHECK-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:ret
-;
-; CHECK-NEWLOWERING-LABEL: new_za_shared_zt0_caller:
-; CHECK-NEWLOWERING: // %bb.0:
-; CHECK-NEWLOWERING-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEWLOWERING-NEXT:blr x0
sdesmalen-arm wrote:
Why wasn't ZA zeroed before?
https://github.com/llvm/llvm-project/pull/166361
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
https://github.com/petar-avramovic ready_for_review https://github.com/llvm/llvm-project/pull/168411 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)
https://github.com/Pierre-vh updated
https://github.com/llvm/llvm-project/pull/168058
>From f0a60702ef1dba4a3545848ff4791fceda7abc51 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
Classes
+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit directly from `SICacheControl`.
With this patch and the previous one, there are no more long inheritance chains in
`SIMemoryLegalizer`: we just have three `SICacheControl` implementations that each
do their own thing, and no code hidden three superclasses up.
All implementations are marked `final` too.
---
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
1 file changed, 38 insertions(+), 120 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 8d27084cf72d9..eddd4a3bafe2e 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
/// Generates code sequences for the memory model of all GFX targets below
/// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
public:
SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
Position Pos) const override;
};
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
public:
- SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+ SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace) const override;
+ bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
+ bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
- SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
- bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsVolatile, bool IsNonTemporal,
- bool IsLastUse) const override;
+ bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+ IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+ /*AtomicsOnly=*/false);
+ }
};
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
protected:
// Sets TH policy to \p Value if CPol operand is present in instruction \p
MI.
// \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
public:
- SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+ SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
// GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
// the behavior is the same if assuming GFX12.0 in CU mode.
assert(!ST.hasGFX1250Insts() || ST.isCuModeEnabled());
@@ -915,10 +922,8 @@ std::unique_ptr
SICacheControl::create(const GCNSubtarget &ST) {
GCNSubtarget::Generation Generation = ST.getGeneration(
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
+ auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+ if (Probes.empty())
+return nullptr;
+ const MCDecodedPseudoProbe &Probe = *Probes.begin();
+ const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+ while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+ return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+ using namespace yaml::bolt;
+ BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+ const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+ using ChildMapTy =
+ std::unordered_map;
+ using CallSiteInfoTy =
+ std::unordered_map;
+ // Mapping from a parent node id to a map InlineSite -> Child node.
+ DenseMap ParentToChildren;
+ // Collect calls in the profile: map from a parent node id to a map
+ // InlineSite -> CallSiteInfo ptr.
+ DenseMap ParentToCSI;
+ for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+ ProbeMatchingStats.TotalCallCount += CSI.Count;
+ ++ProbeMatchingStats.TotalCallSites;
+ if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
maksfb wrote:
```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no probe for " << CSI.DestId << " "
<< CSI.Count
```
https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)
https://github.com/bgergely0 edited https://github.com/llvm/llvm-project/pull/164622 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT]Rename Pointer Auth DWARF rewriter passes (PR #164622)
https://github.com/bgergely0 edited https://github.com/llvm/llvm-project/pull/164622 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)
https://github.com/guy-david updated
https://github.com/llvm/llvm-project/pull/165011
>From 01e872d95c1708392ae429879f36f6a32ca4889a Mon Sep 17 00:00:00 2001
From: Guy David
Date: Fri, 24 Oct 2025 19:30:19 +0300
Subject: [PATCH] [DAGCombiner] Relax nsz constraint for FP optimizations
Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and so rely on the nsz flag, which
commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).
The optimizations affected:
- fadd x, +0.0 -> x
- fsub x, -0.0 -> x
- fsub +0.0, x -> fneg x
- fdiv(x, sqrt(x)) -> sqrt(x)
- frem lowering with power-of-2 divisors
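A small self-contained sketch of the idea behind these folds, using hypothetical helper names (the actual patch implements the check on SelectionDAG as canIgnoreSignBitOfZero, visible in the diff below):

```cpp
#include <vector>

// Hypothetical stand-ins for the DAG query: a use "ignores" the sign of zero
// if it either overwrites the sign bit (fabs, copysign destination) or only
// compares the value, where -0.0 == +0.0 anyway.
enum class UseKind { FAbs, CopySignDest, Compare, Other };

static bool useIgnoresSignOfZero(UseKind K) {
  switch (K) {
  case UseKind::FAbs:
  case UseKind::CopySignDest:
  case UseKind::Compare:
    return true;
  case UseKind::Other:
    return false;
  }
  return false;
}

// fadd x, +0.0 -> x is normally only valid with nsz: if x == -0.0, the fold
// turns -0.0 + 0.0 (== +0.0) into -0.0. When every use ignores the sign of
// zero, the fold is still safe even without the flag.
static bool canFoldFAddZero(bool HasNSZ, const std::vector<UseKind> &Uses) {
  if (HasNSZ)
    return true;
  for (UseKind K : Uses)
    if (!useIgnoresSignOfZero(K))
      return false;
  return true;
}
```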
---
llvm/include/llvm/CodeGen/SelectionDAG.h | 6 ++
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 17 +++--
.../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 40 +++
.../CodeGen/AArch64/ignore-signed-zero.ll | 72 +++
.../AMDGPU/fcanonicalize-elimination.ll | 2 +-
llvm/test/CodeGen/AMDGPU/swdev380865.ll | 5 +-
6 files changed, 132 insertions(+), 10 deletions(-)
create mode 100644 llvm/test/CodeGen/AArch64/ignore-signed-zero.ll
diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index b024e8a68bd6e..9dba2ee8692f5 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -2326,6 +2326,12 @@ class SelectionDAG {
/// +nan are considered positive, -0.0, -inf and -nan are not.
LLVM_ABI bool cannotBeOrderedNegativeFP(SDValue Op) const;
+ /// Check if a use of a float value is insensitive to signed zeros.
+ LLVM_ABI bool canIgnoreSignBitOfZero(const SDUse &Use) const;
+
+ /// Check if at most two uses of a value are insensitive to signed zeros.
+ LLVM_ABI bool canIgnoreSignBitOfZero(SDValue Op) const;
+
/// Test whether two SDValues are known to compare equal. This
/// is true if they are the same value, or if one is negative zero and the
/// other positive zero.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c9513611e6dcb..3624748a3b0f0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17869,7 +17869,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
// N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
if (N1C && N1C->isZero())
-if (N1C->isNegative() || Flags.hasNoSignedZeros())
+if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0)))
return N0;
if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -18081,7 +18082,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
// (fsub A, 0) -> A
if (N1CFP && N1CFP->isZero()) {
-if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0))) {
return N0;
}
}
@@ -18094,7 +18096,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
// (fsub -0.0, N1) -> -N1
if (N0CFP && N0CFP->isZero()) {
-if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0))) {
// We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
// flushed to zero, unless all users treat denorms as zero (DAZ).
// FIXME: This transform will change the sign of a NaN and the behavior
@@ -18744,7 +18747,8 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
}
// Fold X/Sqrt(X) -> Sqrt(X)
- if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+ if ((Flags.hasNoSignedZeros() || DAG.canIgnoreSignBitOfZero(SDValue(N, 0)))
&&
+ Flags.hasAllowReassociation())
if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
return N1;
@@ -18795,8 +18799,9 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
TLI.isOperationLegalOrCustom(ISD::FDIV, VT) &&
TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
DAG.isKnownToBeAPowerOfTwoFP(N1)) {
-bool NeedsCopySign =
-!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+bool NeedsCopySign = !Flags.hasNoSignedZeros() &&
+ !DAG.cannotBeOrderedNegativeFP(N0) &&
+ !DAG.canIgnoreSignBitOfZero(SDValue(N, 0));
SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
SDValue MLA;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c2b4c19846316..64fd925684ffa 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
https://github.com/clairechingching edited https://github.com/llvm/llvm-project/pull/168314 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [clang-tools-extra] [flang] [libcxx] [lldb] [llvm] [mlir] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)
https://github.com/guy-david updated https://github.com/llvm/llvm-project/pull/165011 error: too big or took too long to generate ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)
https://github.com/Pierre-vh updated
https://github.com/llvm/llvm-project/pull/168058
>From 5700ad0a2fb2a859e7c46c6690854c35206155f0 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Mon, 17 Nov 2025 10:05:14 +0100
Subject: [PATCH 1/2] nit
>From e060c5eba50d75216d628e16da72929b71aa9a30 Mon Sep 17 00:00:00 2001
From: pvanhout
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH 2/2] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
Classes
+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit directly from `SICacheControl`.
With this patch and the previous one, there are no more long inheritance chains in
`SIMemoryLegalizer`: we just have three `SICacheControl` implementations that each
do their own thing, and no code hidden three superclasses up.
All implementations are marked `final` too.
---
llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
1 file changed, 38 insertions(+), 120 deletions(-)
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 49aba39872138..bf04c7fa132c0 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
/// Generates code sequences for the memory model of all GFX targets below
/// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
public:
SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
Position Pos) const override;
};
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
public:
- SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+ SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace) const override;
+ bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+ SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
+ bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+ }
+
bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
SIAtomicAddrSpace AddrSpace, SIMemOp Op,
bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
- SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
- bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
- bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
- SIAtomicAddrSpace AddrSpace, SIMemOp Op,
- bool IsVolatile, bool IsNonTemporal,
- bool IsLastUse) const override;
+ bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+ IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+ /*AtomicsOnly=*/false);
+ }
};
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
protected:
// Sets TH policy to \p Value if CPol operand is present in instruction \p
MI.
// \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
public:
- SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+ SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
// GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
// the behavior is the same if assuming GFX12.0 in CU mode.
assert(!ST.hasGFX1250Insts() || ST.isCuMode
[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)
@@ -592,72 +633,276 @@ size_t
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
return MatchedWithCallGraph;
}
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
- // Match inline tree nodes by GUID, checksum, parent, and call site.
- for (const auto &[InlineTreeNodeId, InlineTreeNode] :
- llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
- Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
- getInlineTreeNode(ParentId)) {
- for (const MCDecodedPseudoProbeInlineTree &Child :
- Parent->getChildren()) {
-if (Child.Guid == GUID) {
- if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
- break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+ const BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+ assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+ uint64_t Addr = BF.getAddress();
+ uint64_t Size = BF.getSize();
+ auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+ if (Probes.empty())
+return nullptr;
+ const MCDecodedPseudoProbe &Probe = *Probes.begin();
+ const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+ while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+ return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+ using namespace yaml::bolt;
+ BinaryContext &BC = BF.getBinaryContext();
+ const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+ const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+ using ChildMapTy =
+ std::unordered_map;
+ using CallSiteInfoTy =
+ std::unordered_map;
+ // Mapping from a parent node id to a map InlineSite -> Child node.
+ DenseMap ParentToChildren;
+ // Collect calls in the profile: map from a parent node id to a map
+ // InlineSite -> CallSiteInfo ptr.
+ DenseMap ParentToCSI;
+ for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+ ProbeMatchingStats.TotalCallCount += CSI.Count;
+ ++ProbeMatchingStats.TotalCallSites;
+ if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+ if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+ << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ // Get callee GUID
+ if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');
+++ProbeMatchingStats.MissingInlineTree;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+ }
+ uint64_t CalleeGUID = Callee->InlineTree.front().GUID;
+ ParentToCSI[CSI.InlineTreeNode][InlineSite(CalleeGUID, CSI.Probe)] =
&CSI;
+}
+ }
+ LLVM_DEBUG({
+for (auto &[ParentId, InlineSiteCSI] : ParentToCSI) {
+ for (auto &[InlineSite, CSI] : InlineSiteCSI) {
+auto [CalleeGUID, CallSite] = InlineSite;
+errs() << ParentId << "@" << CallSite << "->"
+ << Twine::utohexstr(CalleeGUID) << ": " << CSI->Count << ", "
+ << Twine::utohexstr(CSI->Offset) << '\n';
+ }
+}
+ });
+
+ assert(!Root.isRoot());
+ LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@"
+<< Twine::utohexstr(Root.Guid) << " and " << YamlBF.Name
+<< "@" << Twine::utohexstr(FuncNode.GUID) << '\n');
+ ++ProbeMatchingStats.AttemptedNodes;
+ ++ProbeMatchingStats.AttemptedRoots;
+
+ // Match profile function with a lead node (top-level function or inlinee)
+ if (Root.Guid != FuncNode.GUID) {
+LLVM_DEBUG(dbgs() << "
[llvm-branch-commits] [lld] release/21.x: [LLD][COFF] Align EC code ranges to page boundaries (#168222) (PR #168369)
https://github.com/llvmbot created
https://github.com/llvm/llvm-project/pull/168369
Backport af45b0202cdd443beedb02392f653d8cff5bd931
Requested by: @cjacek
>From fb641d8e566da6cf431398e85faa1254914751ed Mon Sep 17 00:00:00 2001
From: Jacek Caban
Date: Mon, 17 Nov 2025 12:44:22 +0100
Subject: [PATCH] [LLD][COFF] Align EC code ranges to page boundaries (#168222)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We already ensure that code for different architectures is always placed
in different pages in `assignAddresses`. We represent those ranges using
their first and last chunks. However, the RVAs of those chunks may not
be page-aligned, for example, due to extra padding for entry-thunk
offsets. Align the chunk RVAs to the page boundary so that the emitted
ranges correctly include the entire region.
This change affects an existing test that checks corner cases triggered
by merging a data section into a code section. We may now include such
data in the code range. This differs from MSVC’s behavior, but it should
not cause practical issues, and the new behavior is arguably more
correct.
Fixes #168119.
(cherry picked from commit af45b0202cdd443beedb02392f653d8cff5bd931)
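The core of the fix is rounding the range start down to a 4 KiB page boundary. A rough sketch mirroring the masking the one-line change below applies (helper names here are illustrative, not LLD code):

```cpp
#include <cstdint>

// Round an RVA down to the start of its 4 KiB page so that an EC code-map
// range covers the whole page, even when the first chunk starts mid-page
// (e.g. after entry-thunk offset padding).
constexpr uint32_t PageMask = 0xfff;

static uint32_t pageAlignedStart(uint32_t RVA) { return RVA & ~PageMask; }

// Length is measured from the aligned start to the end of the last chunk,
// matching how ECCodeMapChunk::writeTo computes each table entry.
static uint32_t rangeLength(uint32_t FirstRVA, uint32_t LastRVA,
                            uint32_t LastSize) {
  uint32_t Start = pageAlignedStart(FirstRVA);
  return LastRVA + LastSize - Start;
}
```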
---
lld/COFF/Chunks.cpp| 2 +-
lld/test/COFF/arm64ec-codemap.test | 36 +++---
2 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index 01752cdc6a9da..cfb33daa024a7 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -939,7 +939,7 @@ void ECCodeMapChunk::writeTo(uint8_t *buf) const {
auto table = reinterpret_cast(buf);
for (uint32_t i = 0; i < map.size(); i++) {
const ECCodeMapEntry &entry = map[i];
-uint32_t start = entry.first->getRVA();
+uint32_t start = entry.first->getRVA() & ~0xfff;
table[i].StartOffset = start | entry.type;
table[i].Length = entry.last->getRVA() + entry.last->getSize() - start;
}
diff --git a/lld/test/COFF/arm64ec-codemap.test
b/lld/test/COFF/arm64ec-codemap.test
index 050261117be2e..bbc682d19920f 100644
--- a/lld/test/COFF/arm64ec-codemap.test
+++ b/lld/test/COFF/arm64ec-codemap.test
@@ -7,6 +7,7 @@ RUN: llvm-mc -filetype=obj -triple=arm64ec-windows
arm64ec-func-sym2.s -o arm64e
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec.s -o data-sec.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec2.s -o data-sec2.obj
RUN: llvm-mc -filetype=obj -triple=arm64ec-windows empty-sec.s -o
arm64ec-empty-sec.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows entry-thunk.s -o
entry-thunk.obj
RUN: llvm-mc -filetype=obj -triple=x86_64-windows x86_64-func-sym.s -o
x86_64-func-sym.obj
RUN: llvm-mc -filetype=obj -triple=x86_64-windows empty-sec.s -o
x86_64-empty-sec.obj
RUN: llvm-mc -filetype=obj -triple=aarch64-windows
%S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
@@ -162,15 +163,17 @@ RUN: loadconfig-arm64ec.obj -dll -noentry
-merge:test=.testdata -merge:
RUN: llvm-readobj --coff-load-config testcm.dll | FileCheck
-check-prefix=CODEMAPCM %s
CODEMAPCM: CodeMap [
-CODEMAPCM-NEXT: 0x4008 - 0x4016 X64
+CODEMAPCM-NEXT: 0x4000 - 0x4016 X64
CODEMAPCM-NEXT: ]
RUN: llvm-objdump -d testcm.dll | FileCheck -check-prefix=DISASMCM %s
DISASMCM: Disassembly of section .testdat:
DISASMCM-EMPTY:
DISASMCM-NEXT: 000180004000 <.testdat>:
-DISASMCM-NEXT: 180004000: 0001 udf #0x1
-DISASMCM-NEXT: 180004004: udf #0x0
+DISASMCM-NEXT: 180004000: 01 00addl %eax, (%rax)
+DISASMCM-NEXT: 180004002: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004004: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004006: 00 00addb %al, (%rax)
DISASMCM-NEXT: 180004008: b8 03 00 00 00 movl$0x3, %eax
DISASMCM-NEXT: 18000400d: c3 retq
DISASMCM-NEXT: 18000400e: 00 00addb%al, (%rax)
@@ -207,6 +210,14 @@ DISASMMS-NEXT: 000180006000 :
DISASMMS-NEXT: 180006000: 528000a0 mov w0, #0x5// =5
DISASMMS-NEXT: 180006004: d65f03c0 ret
+Test the code map that includes an ARM64EC function padded by its entry-thunk
offset.
+
+RUN: lld-link -out:testpad.dll -machine:arm64ec entry-thunk.obj
loadconfig-arm64ec.obj -dll -noentry -include:func
+RUN: llvm-readobj --coff-load-config testpad.dll | FileCheck
-check-prefix=CODEMAPPAD %s
+CODEMAPPAD: CodeMap [
+CODEMAPPAD:0x1000 - 0x1010 ARM64EC
+CODEMAPPAD-NEXT: ]
+
#--- arm64-func-sym.s
.text
@@ -266,3 +277,22 @@ x86_64_func_sym2:
.section .empty1, "xr"
.section .empty2, "xr"
.section .empty3, "xr"
+
+#--- entry-thunk.s
+.section .text,"xr",discard,func
+.globl func
+.p2align 2, 0x0
+func:
+mov w0, #1
+ret
+
+.section .wowthk$aa,"xr",discard,thunk
+.globl thunk
+.p2align 2
+thunk:
+ret
+
+
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
petar-avramovic wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.com/github/pr/llvm/llvm-project/168411). Learn more: https://graphite.dev/docs/merge-pull-requests

* **#168411** 👈 (View in Graphite: https://app.graphite.com/github/pr/llvm/llvm-project/168411)
* **#168410** (https://app.graphite.com/github/pr/llvm/llvm-project/168410)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/168411
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] 6922f8a - Revert "[clang][SourceManager] Use `getFileLoc` when computing `getPresumedLo…"
Author: Aaron Ballman
Date: 2025-11-17T08:44:37-05:00
New Revision: 6922f8a3b0f75be79ae26b8b8831512d8de43b58
URL:
https://github.com/llvm/llvm-project/commit/6922f8a3b0f75be79ae26b8b8831512d8de43b58
DIFF:
https://github.com/llvm/llvm-project/commit/6922f8a3b0f75be79ae26b8b8831512d8de43b58.diff
LOG: Revert "[clang][SourceManager] Use `getFileLoc` when computing
`getPresumedLo…"
This reverts commit 6b464e4ac0b1ce4638c0fa07abcba329119836cb.
Added:
Modified:
clang/include/clang/Basic/SourceManager.h
clang/lib/Basic/SourceManager.cpp
clang/test/Analysis/plist-macros-with-expansion.cpp
clang/test/C/C23/n2350.c
clang/test/ExtractAPI/macro_undefined.c
clang/test/FixIt/format.cpp
clang/test/Preprocessor/macro_arg_directive.c
clang/test/Preprocessor/print_line_track.c
Removed:
diff --git a/clang/include/clang/Basic/SourceManager.h
b/clang/include/clang/Basic/SourceManager.h
index f15257a760b8c..bc9e97863556d 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -1464,9 +1464,8 @@ class SourceManager : public
RefCountedBase {
/// directives. This provides a view on the data that a user should see
/// in diagnostics, for example.
///
- /// If \p Loc is a macro expansion location, the presumed location
- /// computation uses the spelling location for macro arguments and the
- /// expansion location for other macro expansions.
+ /// Note that a presumed location is always given as the expansion point of
+ /// an expansion location, not at the spelling location.
///
/// \returns The presumed location of the specified SourceLocation. If the
/// presumed location cannot be calculated (e.g., because \p Loc is invalid
diff --git a/clang/lib/Basic/SourceManager.cpp
b/clang/lib/Basic/SourceManager.cpp
index 767a765ae4261..b6cc6ec9365f5 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -1435,7 +1435,7 @@ PresumedLoc SourceManager::getPresumedLoc(SourceLocation
Loc,
if (Loc.isInvalid()) return PresumedLoc();
// Presumed locations are always for expansion points.
- FileIDAndOffset LocInfo = getDecomposedLoc(getFileLoc(Loc));
+ FileIDAndOffset LocInfo = getDecomposedExpansionLoc(Loc);
bool Invalid = false;
const SLocEntry &Entry = getSLocEntry(LocInfo.first, &Invalid);
diff --git a/clang/test/Analysis/plist-macros-with-expansion.cpp
b/clang/test/Analysis/plist-macros-with-expansion.cpp
index d9a2f94055593..d57bb0f2dd265 100644
--- a/clang/test/Analysis/plist-macros-with-expansion.cpp
+++ b/clang/test/Analysis/plist-macros-with-expansion.cpp
@@ -405,14 +405,14 @@ void commaInBracketsTest() {
code
void commaInBracesTest() {
- PASTE_CODE({
+ PASTE_CODE({ // expected-warning{{Dereference of null pointer}}
// NOTE: If we were to add a new variable here after a comma, we'd get a
// compilation error, so this test is mainly here to show that this was
also
// investigated.
//
// int *ptr = nullptr, a;
int *ptr = nullptr;
-*ptr = 5; // expected-warning{{Dereference of null pointer}}
+*ptr = 5;
})
}
@@ -425,14 +425,14 @@ void commaInBracesTest() {
// CHECK-NEXT: col3
// CHECK-NEXT: file0
// CHECK-NEXT:
-// CHECK-NEXT: namePASTE_CODE({
+// CHECK-NEXT: namePASTE_CODE({ // expected-
// CHECK-NEXT:// NOTE: If we were to add a new variable here after a
comma, we'd get a
// CHECK-NEXT:// compilation error, so this test is mainly here to show
that this was also
// CHECK-NEXT:// investigated.
// CHECK-NEXT://
// CHECK-NEXT:// int *ptr = nullptr, a;
// CHECK-NEXT:int *ptr = nullptr;
-// CHECK-NEXT:*ptr = 5; // expected-
+// CHECK-NEXT:*ptr = 5;
// CHECK-NEXT: })
// CHECK-NEXT: expansion{int *ptr =nullptr ;*ptr
=5;}
// CHECK-NEXT:
diff --git a/clang/test/C/C23/n2350.c b/clang/test/C/C23/n2350.c
index 96b8c511d5716..af0ca6d79be5e 100644
--- a/clang/test/C/C23/n2350.c
+++ b/clang/test/C/C23/n2350.c
@@ -47,10 +47,11 @@ int struct_in_second_param(void) {
int macro(void) {
return offsetof(struct A // cpp-error {{'A' cannot be defined in a type
specifier}} \
- expected-warning {{defining a type within
'offsetof' is a C23 extension}}
+ expected-warning 2 {{defining a type within
'offsetof' is a C23 extension}}
{
int a;
-struct B // expected-warning {{defining a type within 'offsetof' is a C23
extension}}
+struct B // verifier seems to think the error is emitted by the macro
+ // In fact the location of the error is "B" on the line above
{
int c;
int d;
diff --git a/clang/test/ExtractAPI/macro_undefined.c
b/clang/test/ExtractAPI/macro_undefined.c
index 1d697db1e1613..7bb50af380c24 100644
--- a/clang/test/ExtractAPI/macro_undefined.c
+
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/boomanaiden154 updated https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)
easyonaadit wrote: Ping. https://github.com/llvm/llvm-project/pull/161815 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/168545
This patch fixes most of the ASan tests that were failing on Darwin when running
under the internal shell. There are still a couple of more interesting cases left
that I'll handle in a follow-up patch. The tests that still need to be done:
```
TestCases/Darwin/duplicate_os_log_reports.cpp
TestCases/Darwin/dyld_insert_libraries_reexec.cpp
TestCases/Darwin/interface_symbols_darwin.cpp
```
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
arsenm wrote:
> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.com/github/pr/llvm/llvm-project/168290). Learn more: https://graphite.dev/docs/merge-pull-requests

* **#168290** 👈 (View in Graphite: https://app.graphite.com/github/pr/llvm/llvm-project/168290)
* **#168176** (https://app.graphite.com/github/pr/llvm/llvm-project/168176)
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking: https://stacking.dev/
https://github.com/llvm/llvm-project/pull/168290
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TableGen] Strip directories from filename prefixes. (PR #168352)
https://github.com/kosarev created
https://github.com/llvm/llvm-project/pull/168352
Fixes https://github.com/llvm/llvm-project/pull/167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
>From af92eaef4e2cc8502d02d104ca44543e169d768e Mon Sep 17 00:00:00 2001
From: Ivan Kosarev
Date: Mon, 17 Nov 2025 11:35:13 +
Subject: [PATCH] [TableGen] Strip directories from filename prefixes.
Fixes https://github.com/llvm/llvm-project/pull/167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
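The behavioural difference between the old and new prefix computation, as a sketch (paths in the comment are illustrative; llvm::sys::path is used as in the diff, and the wrapper function name is hypothetical):

```cpp
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Path.h"

// Old behaviour: only the extension was stripped, so a full output path such
// as "build/lib/Target/FooGenAsmWriter.inc" kept its directories in the
// prefix. New behaviour: sys::path::stem() drops both the directories and the
// extension, leaving just "FooGenAsmWriter".
static llvm::SmallString<128> filenamePrefix(llvm::StringRef OutputFilename) {
  return llvm::SmallString<128>(llvm::sys::path::stem(OutputFilename));
}
```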
---
llvm/lib/TableGen/Main.cpp | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index c3869c3fb9a5a..165c957fc9977 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -167,8 +167,7 @@ int llvm::TableGenMain(const char *argv0,
// Write output to memory.
Timer.startBackendTimer("Backend overall");
- SmallString<128> FilenamePrefix(OutputFilename);
- sys::path::replace_extension(FilenamePrefix, "");
+ SmallString<128> FilenamePrefix(sys::path::stem(OutputFilename));
TableGenOutputFiles OutFiles;
unsigned status = 0;
// ApplyCallback will return true if it did not apply any callback. In that
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)
https://github.com/DanBlackwell requested changes to this pull request. https://github.com/llvm/llvm-project/pull/168545 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
clairechingching wrote: @yonghong-song I'd like to backport this change so that I can enable misalignment in the rust nightly compiler, thanks! https://github.com/llvm/llvm-project/pull/168314 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)
https://github.com/petar-avramovic created
https://github.com/llvm/llvm-project/pull/168411
None
>From 529b6f23ee1acb393880a336c0fdc89c1792bf1b Mon Sep 17 00:00:00 2001
From: Petar Avramovic
Date: Mon, 17 Nov 2025 18:47:58 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and
G_FNEG
---
.../AMDGPU/AMDGPURegBankLegalizeHelper.cpp| 17 +-
.../AMDGPU/AMDGPURegBankLegalizeRules.cpp | 19 ++
llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll | 233 ++
llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll | 216
4 files changed, 483 insertions(+), 2 deletions(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..d719f3d40295d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -629,10 +629,23 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr
&MI) {
void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
Register Dst = MI.getOperand(0).getReg();
assert(MRI.getType(Dst) == V2S16);
- auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
- auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
unsigned Opc = MI.getOpcode();
auto Flags = MI.getFlags();
+
+ if (MI.getNumOperands() == 2) {
+auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
+auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+ }
+
+ assert(MI.getNumOperands() == 3);
+ auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+ auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const
GCNSubtarget &_ST,
.Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
.Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
+ // FNEG and FABS are either folded as source modifiers or can be selected as
+ // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+ // targets without SALU float we still select them as VGPR since there would
+ // be no real sgpr use.
+ addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+ .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+ .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+ .Div(S16, {{Vgpr16}, {Vgpr16}})
+ .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+ .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+ .Div(S32, {{Vgpr32}, {Vgpr32}})
+ .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+ .Div(S64, {{Vgpr64}, {Vgpr64}})
+ .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+ .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+ .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+ .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+ .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32}}});
+
addRulesForGOpcs({G_FPTOUI})
.Any({{UniS32, S32}, {{Sgpr32}, {Sgpr32}}}, hasSALUFloat)
.Any({{UniS32, S32}, {{UniInVgprS32}, {Vgpr32}}}, !hasSALUFloat);
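As a stand-alone illustration of the comment above (this is not backend code, just the underlying IEEE-754 identity): fneg flips the sign bit and fabs clears it, which is why both operations can be selected as a bitwise XOR or AND with a constant mask.
```cpp
// Minimal sketch of the bit-pattern identities; assumes C++20 for std::bit_cast.
#include <bit>
#include <cassert>
#include <cstdint>

int main() {
  float X = 1.5f;
  uint32_t Bits = std::bit_cast<uint32_t>(X);
  // G_FNEG as XOR with the sign-bit mask.
  float Neg = std::bit_cast<float>(Bits ^ 0x80000000u);
  // G_FABS as AND with the complement of the sign-bit mask.
  float Abs = std::bit_cast<float>(std::bit_cast<uint32_t>(Neg) & 0x7fffffffu);
  assert(Neg == -1.5f && Abs == 1.5f);
  return 0;
}
```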
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
new file mode 100644
index 0..093cdf744e3b4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
@@ -0,0 +1,233 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o -
%s | FileCheck -check-prefixes=GCN,GFX11,GFX11-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck
-check-prefixes=GCN,GFX11,GFX11-GISEL %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o -
%s | FileCheck -check-prefixes=GCN,GFX12,GFX12-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal
-mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck
-check-prefixes=GCN,GFX12,GFX12-GISEL %s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN: ; %bb.0:
+; GCN-NEXT:v_an
[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)
llvmbot wrote: @efriedma-quic What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/168371 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] b835c10 - Revert "DAG: Allow select ptr combine for non-0 address spaces (#167909)"
Author: ronlieb
Date: 2025-11-16T16:47:51-05:00
New Revision: b835c10c902a27d1423d8944534d828afbcb4f6c
URL:
https://github.com/llvm/llvm-project/commit/b835c10c902a27d1423d8944534d828afbcb4f6c
DIFF:
https://github.com/llvm/llvm-project/commit/b835c10c902a27d1423d8944534d828afbcb4f6c.diff
LOG: Revert "DAG: Allow select ptr combine for non-0 address spaces (#167909)"
This reverts commit e5f499f48f2d1fddc590982da7232d08a6f8c54c.
Added:
Modified:
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
llvm/test/CodeGen/AMDGPU/select-load-to-load-select-ptr-combine.ll
llvm/test/CodeGen/AMDGPU/select-vectors.ll
llvm/test/CodeGen/AMDGPU/select64.ll
llvm/test/CodeGen/NVPTX/bf16-instructions.ll
llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
llvm/test/CodeGen/NVPTX/bug22246.ll
llvm/test/CodeGen/NVPTX/fast-math.ll
llvm/test/CodeGen/NVPTX/i1-select.ll
llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
llvm/test/CodeGen/NVPTX/lower-byval-args.ll
Removed:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6fbac0f8c8cdf..c9513611e6dcb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -29033,9 +29033,9 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect,
SDValue LHS,
// over-conservative. It would be beneficial to be able to remember
// both potential memory locations. Since we are discarding
// src value info, don't do the transformation if the memory
-// locations are not in the same address space.
-LLD->getPointerInfo().getAddrSpace() !=
-RLD->getPointerInfo().getAddrSpace() ||
+// locations are not in the default address space.
+LLD->getPointerInfo().getAddrSpace() != 0 ||
+RLD->getPointerInfo().getAddrSpace() != 0 ||
// We can't produce a CMOV of a TargetFrameIndex since we won't
// generate the address generation required.
LLD->getBasePtr().getOpcode() == ISD::TargetFrameIndex ||
@@ -29117,9 +29117,6 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect,
SDValue LHS,
// but the new load must be the minimum (most restrictive) alignment of the
// inputs.
Align Alignment = std::min(LLD->getAlign(), RLD->getAlign());
-unsigned AddrSpace = LLD->getAddressSpace();
-assert(AddrSpace == RLD->getAddressSpace());
-
MachineMemOperand::Flags MMOFlags = LLD->getMemOperand()->getFlags();
if (!RLD->isInvariant())
MMOFlags &= ~MachineMemOperand::MOInvariant;
@@ -29128,16 +29125,15 @@ bool DAGCombiner::SimplifySelectOps(SDNode
*TheSelect, SDValue LHS,
if (LLD->getExtensionType() == ISD::NON_EXTLOAD) {
// FIXME: Discards pointer and AA info.
Load = DAG.getLoad(TheSelect->getValueType(0), SDLoc(TheSelect),
- LLD->getChain(), Addr, MachinePointerInfo(AddrSpace),
- Alignment, MMOFlags);
+ LLD->getChain(), Addr, MachinePointerInfo(),
Alignment,
+ MMOFlags);
} else {
// FIXME: Discards pointer and AA info.
Load = DAG.getExtLoad(
LLD->getExtensionType() == ISD::EXTLOAD ? RLD->getExtensionType()
: LLD->getExtensionType(),
SDLoc(TheSelect), TheSelect->getValueType(0), LLD->getChain(), Addr,
- MachinePointerInfo(AddrSpace), LLD->getMemoryVT(), Alignment,
- MMOFlags);
+ MachinePointerInfo(), LLD->getMemoryVT(), Alignment, MMOFlags);
}
// Users of the select now use the result of the load.
diff --git a/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
b/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
index 5aabad682ad30..d9ad9590d9762 100644
--- a/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
+++ b/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
@@ -7,31 +7,27 @@
define amdgpu_kernel void @select_ptr_crash_i64_flat(i32 %tmp, [8 x i32], ptr
%ptr0, [8 x i32], ptr %ptr1, [8 x i32], ptr addrspace(1) %ptr2) {
; GCN-LABEL: select_ptr_crash_i64_flat:
; GCN: ; %bb.0:
+; GCN-NEXT:s_load_dword s6, s[8:9], 0x0
+; GCN-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x28
+; GCN-NEXT:s_load_dwordx2 s[2:3], s[8:9], 0x50
+; GCN-NEXT:s_load_dwordx2 s[4:5], s[8:9], 0x78
; GCN-NEXT:s_add_i32 s12, s12, s17
; GCN-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8
-; GCN-NEXT:s_load_dword s2, s[8:9], 0x0
-; GCN-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x78
-; GCN-NEXT:s_add_u32 s4, s8, 40
-; GCN-NEXT:s_addc_u32 s3, s9, 0
-; GCN-NEXT:s_add_u32 s5, s8, 0x50
-; GCN-NEXT:s_addc_u32 s6, s9, 0
; GCN-NEXT:s_waitcnt lgkmcnt(0)
-; GCN-NEXT:s_cmp_eq_u32 s2, 0
-; GCN-NEXT:s_cselect_b32 s3, s3, s6
-; GCN-NEXT:s_cselect_b32 s2, s4, s5
-; GCN-N
[llvm-branch-commits] [llvm] [TableGen] Strip directories from filename prefixes. (PR #168352)
https://github.com/kosarev closed https://github.com/llvm/llvm-project/pull/168352 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
llvmbot wrote:
@llvm/pr-subscribers-backend-x86
Author: Matt Arsenault (arsenm)
Changes
---
Patch is 76.41 KiB, truncated to 20.00 KiB below, full version:
https://github.com/llvm/llvm-project/pull/168290.diff
6 Files Affected:
- (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+12-12)
- (modified) llvm/test/CodeGen/AArch64/sve-extract-scalable-vector.ll (-7)
- (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
(+133-133)
- (modified) llvm/test/CodeGen/X86/half.ll (+64-69)
- (modified) llvm/test/CodeGen/X86/matrix-multiply.ll (+38-36)
- (modified) llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
(+216-218)
```diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Widen the input and call convert on the widened input vector.
unsigned NumConcat =
WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
- SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+ SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
Ops[0] = InOp;
SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
- SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
// Use the original element count so we don't do more scalar opts than
// necessary.
unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
// Otherwise unroll into some nasty scalar code and rebuild the vector.
EVT EltVT = WidenVT.getVectorElementType();
std::array EltVTs = {{EltVT, MVT::Other}};
- SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+ SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
SmallVector OpChains;
// Use the original element count so we don't do more scalar opts than
// necessary.
@@ -5819,7 +5819,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
}
while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
return DAG.getBuildVector(WidenVT, DL, Ops);
}
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
// input and then widening it. To avoid this, we widen the input only
if
// it results in a legal type.
if (WidenSize % InSize == 0) {
- SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+ SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
Ops[0] = InOp;
NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
SmallVector Ops;
DAG.ExtractVectorElements(InOp, Ops);
Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
}
@@ -6088,7 +6088,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
if (WidenNumElts % NumInElts == 0) {
// Add undef vectors to widen to correct length.
unsigned NumConcat = WidenNumElts / NumInElts;
- SDValue UndefVal = DAG.getUNDEF(InVT);
+ SDValue UndefVal = DAG.getPOISON(InVT);
SmallVector Ops(NumConcat);
for (unsigned i=0; i < NumOperands; ++i)
Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
for (unsigned j = 0; j < NumInElts; ++j)
Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
}
- SDValue UndefVal = DAG.getUNDEF(EltVT);
+ SDValue UndefVal = DAG.getPOISON(EltVT);
for (; Idx < WidenNumElts; ++Idx)
Ops[Idx] = UndefVal;
return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
Parts.push_back(
DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
}
@@ -6229,7 +6229,7 @@ SDValue
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode
[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)
llvmbot wrote:
@llvm/pr-subscribers-compiler-rt-sanitizer
Author: Aiden Grossman (boomanaiden154)
Changes
This test was doing some feature checks within the test itself. This patch
rewrites the feature checks to be done in a fashion more idiomatic to lit,
as the internal shell does not support the features needed for the previous
feature checks.
---
Full diff: https://github.com/llvm/llvm-project/pull/168655.diff
2 Files Affected:
- (modified)
compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp (+2-13)
- (modified) compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py (+25)
```diff
diff --git
a/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
b/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
index 145e162a21c0e..89ee7a178525a 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
@@ -14,23 +14,12 @@
// RUN: %run %t/a.out 2>&1 \
// RUN: | FileCheck %s
-// RUN: MACOS_MAJOR=$(sw_vers -productVersion | cut -d'.' -f1)
-// RUN: MACOS_MINOR=$(sw_vers -productVersion | cut -d'.' -f2)
-
-// RUN: IS_MACOS_10_11_OR_HIGHER=$([ $MACOS_MAJOR -eq 10 ] && [ $MACOS_MINOR
-lt 11 ]; echo $?)
-
// On OS X 10.10 and lower, if the dylib is not DYLD-inserted, ASan will
re-exec.
-// RUN: if [ $IS_MACOS_10_11_OR_HIGHER == 0 ]; then \
-// RUN: %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 \
-// RUN: | FileCheck --check-prefix=CHECK-NOINSERT %s; \
-// RUN: fi
+// RUN: %if mac-os-10-11-or-higher %{ %env_asan_opts=verbosity=1 %run %t/a.out
2>&1 | FileCheck --check-prefix=CHECK-NOINSERT %s %}
// On OS X 10.11 and higher, we don't need to DYLD-insert anymore, and the
interceptors
// still installed correctly. Let's just check that things work and we don't
try to re-exec.
-// RUN: if [ $IS_MACOS_10_11_OR_HIGHER == 1 ]; then \
-// RUN: %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 \
-// RUN: | FileCheck %s; \
-// RUN: fi
+// RUN: %if mac-os-10-10-or-lower %{ %env_asan_opts=verbosity=1 %run %t/a.out
2>&1 | FileCheck %s %}
#include
diff --git a/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
b/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
index af82d30cf4de9..b09c1f7cd3daa 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
+++ b/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
@@ -1,3 +1,6 @@
+import subprocess
+
+
def getRoot(config):
if not config.parent:
return config
@@ -8,3 +11,25 @@ def getRoot(config):
if root.target_os not in ["Darwin"]:
config.unsupported = True
+
+
+def get_product_version():
+    try:
+        version_process = subprocess.run(
+            ["sw_vers", "-productVersion"],
+            check=True,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+        )
+        version_string = version_process.stdout.decode("utf-8").split("\n")[0]
+        version_split = version_string.split(".")
+        return (int(version_split[0]), int(version_split[1]))
+    except:
+        return (0, 0)
+
+
+macos_version_major, macos_version_minor = get_product_version()
+if macos_version_major > 10 and macos_version_minor > 11:
+    config.available_features.add("mac-os-10-11-or-higher")
+else:
+    config.available_features.add("mac-os-10-10-or-lower")
```
https://github.com/llvm/llvm-project/pull/168655
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make duplicate_os_log_reports.cpp work with the internal shell (PR #168656)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/168656 This test used a for loop to implement retries and also did some trickery with PIDs. For this test, just invoke bash to actually run the test, given we need the PID, and move the for loop into a separate shell script file that we can then invoke from within the test. Normally it would make sense to rewrite such a script in Python, but since this test only runs on Darwin it has no portability concerns, so it is fine to use a shell script here given there is no other convenient alternative. ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)
https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/168655 This test was doing some feature checks within the test itself. This patch rewrites the feature checks to be done in a fashion more idiomatic to lit, as the internal shell does not support the features needed for the previous feature checks. ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make duplicate_os_log_reports.cpp work with the internal shell (PR #168656)
llvmbot wrote:
@llvm/pr-subscribers-compiler-rt-sanitizer
Author: Aiden Grossman (boomanaiden154)
Changes
This test used a for loop to implement retries and also did some trickery with
PIDs. For this test, just invoke bash to actually run the test, given we need
the PID, and move the for loop into a separate shell script file that we can
then invoke from within the test. Normally it would make sense to rewrite such
a script in Python, but since this test only runs on Darwin it has no
portability concerns, so it is fine to use a shell script here given there is
no other convenient alternative.
---
Full diff: https://github.com/llvm/llvm-project/pull/168656.diff
2 Files Affected:
- (added) compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh (+6)
- (modified)
compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp (+3-7)
```diff
diff --git a/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh
b/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh
new file mode 100755
index 0..8939ca7ca1564
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+for I in {1..3}; do \
+ log show --debug --last $((SECONDS + 30))s --predicate "processID == $1"
--style syslog > $2; \
+ if grep -q "use-after-poison" $2; then break; fi; \
+ sleep 5; \
+done
diff --git
a/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
b/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
index 5a0353bfb1b31..6adca31745bfd 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
@@ -1,8 +1,8 @@
// UNSUPPORTED: ios
// REQUIRES: darwin_log_cmd
// RUN: %clangxx_asan -fsanitize-recover=address %s -o %t
-// RUN: { %env_asan_opts=halt_on_error=0,log_to_syslog=1 %run %t >
%t.process_output.txt 2>&1 & } \
-// RUN: ; export TEST_PID=$! ; wait ${TEST_PID}
+// RUN: bash -c "{ %env_asan_opts=halt_on_error=0,log_to_syslog=1 %run %t >
%t.process_output.txt 2>&1 & } \
+// RUN: ; export TEST_PID=$! ; wait ${TEST_PID}; echo -n ${TEST_PID} >
%t.test_pid"
// Check process output.
// RUN: FileCheck %s --check-prefixes CHECK,CHECK-PROC
-input-file=%t.process_output.txt
@@ -10,11 +10,7 @@
// Check syslog output. We filter recent system logs based on PID to avoid
// getting the logs of previous test runs. Make some reattempts in case there
// is a delay.
-// RUN: for I in {1..3}; do \
-// RUN: log show --debug --last $((SECONDS + 30))s --predicate "processID ==
${TEST_PID}" --style syslog > %t.process_syslog_output.txt; \
-// RUN: if grep -q "use-after-poison" %t.process_syslog_output.txt; then
break; fi; \
-// RUN: sleep 5; \
-// RUN: done
+// RUN: %S/Inputs/check-syslog.sh %{readfile:%t.test_pid}
%t.process_syslog_output.txt
// RUN: FileCheck %s -input-file=%t.process_syslog_output.txt
#include
#include
```
https://github.com/llvm/llvm-project/pull/168656
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)
https://github.com/ndrewh approved this pull request. https://github.com/llvm/llvm-project/pull/168655 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)
@@ -8,3 +11,25 @@ def getRoot(config):
if root.target_os not in ["Darwin"]:
config.unsupported = True
+
+
+def get_product_version():
+    try:
+        version_process = subprocess.run(
+            ["sw_vers", "-productVersion"],
+            check=True,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+        )
+        version_string = version_process.stdout.decode("utf-8").split("\n")[0]
+        version_split = version_string.split(".")
+        return (int(version_split[0]), int(version_split[1]))
+    except:
+        return (0, 0)
+
+
+macos_version_major, macos_version_minor = get_product_version()
+if macos_version_major > 10 and macos_version_minor > 11:
+    config.available_features.add("mac-os-10-11-or-higher")
ndrewh wrote:
I think we should only add this feature when `config.apple_platform == "osx"`
([ref](https://github.com/llvm/llvm-project/blob/afdc5093bb256180b3bec3ff827f21bf23d0f492/compiler-rt/test/lit.common.cfg.py#L411C39-L411C60)).
This particular test has `// UNSUPPORTED: ios` so it does not break anything
right now, but it's not ideal to have a feature set based on the host OS if we
are running on a simulator/device.
https://github.com/llvm/llvm-project/pull/168655
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
c-rhodes wrote: oops, apologies I didn't mean to trigger the bot, please ignore that. https://github.com/llvm/llvm-project/pull/168314 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
c-rhodes wrote: @clairechingching backports are typically done via the `/cherry-pick ` command left as a comment on the original PR, it's documented here: https://llvm.org/docs/GitHub.html#backporting-fixes-to-the-release-branches although I would say it's unlikely this will get backported so late in the release cycle given it's a feature. The next release is 21.1.7 on Dec 2nd, at this point in the release cycle the criteria is critical bug fixes as documented here https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules. @tru is the release manager for 21.1.7, so ultimately it will be his decision. @tru wdyt? https://github.com/llvm/llvm-project/pull/168314 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)
llvmbot wrote: >@clairechingching backports are typically done via the `/cherry-pick ` >command left as a comment on the original PR, it's documented here: >https://llvm.org/docs/GitHub.html#backporting-fixes-to-the-release-branches > >although I would say it's unlikely this will get backported so late in the >release cycle given it's a feature. The next release is 21.1.7 on Dec 2nd, at >this point in the release cycle the criteria is critical bug fixes as >documented here >https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules. @tru is the >release manager for 21.1.7, so ultimately it will be his decision. @tru wdyt? Error: Command failed due to missing milestone. https://github.com/llvm/llvm-project/pull/168314 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #165818)
https://github.com/RKSimon approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/165818 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)
https://github.com/RKSimon approved this pull request. LGTM - cheers https://github.com/llvm/llvm-project/pull/168290 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Use assertion in VPExpressionRecipe creation (PR #165543)
https://github.com/SamTebbs33 closed https://github.com/llvm/llvm-project/pull/165543 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LV] Use assertion in VPExpressionRecipe creation (PR #165543)
SamTebbs33 wrote: Not needed as we'll be moving towards creating partial reductions during the VPExpressionRecipe creation process. https://github.com/llvm/llvm-project/pull/165543 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [TableGen][NFCI] Change TableGenMain() to take function_ref. (PR #167888)
https://github.com/kosarev updated
https://github.com/llvm/llvm-project/pull/167888
>From 12bf9cd3f96ccdcac6ced92d51a06d425375da42 Mon Sep 17 00:00:00 2001
From: Ivan Kosarev
Date: Thu, 13 Nov 2025 12:10:51 +
Subject: [PATCH] [TableGen][NFCI] Change TableGenMain() to take function_ref.
It was switched from a function pointer to std::function in commit
f675ec6165ab6add5e57cd43a2e9fa1a9bc21d81
("TableGen: Make 2nd arg MainFn of TableGenMain(argv0, MainFn) optional."),
but there's no mention of any particular reason for that.
---
llvm/include/llvm/TableGen/Main.h | 14 ++
llvm/lib/TableGen/Main.cpp | 6 ++
llvm/utils/TableGen/Basic/TableGen.cpp | 2 +-
3 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/llvm/include/llvm/TableGen/Main.h
b/llvm/include/llvm/TableGen/Main.h
index bafce3a463acc..daede9f5a46f0 100644
--- a/llvm/include/llvm/TableGen/Main.h
+++ b/llvm/include/llvm/TableGen/Main.h
@@ -14,7 +14,6 @@
#define LLVM_TABLEGEN_MAIN_H
#include "llvm/Support/CommandLine.h"
-#include <functional>
#include
namespace llvm {
@@ -30,18 +29,17 @@ struct TableGenOutputFiles {
};
/// Returns true on error, false otherwise.
-using TableGenMainFn = bool(raw_ostream &OS, const RecordKeeper &Records);
+using TableGenMainFn =
+    function_ref<bool(raw_ostream &OS, const RecordKeeper &Records)>;
/// Perform the action using Records, and store output in OutFiles.
/// Returns true on error, false otherwise.
-using MultiFileTableGenMainFn = bool(TableGenOutputFiles &OutFiles,
- const RecordKeeper &Records);
+using MultiFileTableGenMainFn =
+    function_ref<bool(TableGenOutputFiles &OutFiles, const RecordKeeper &Records)>;
-int TableGenMain(const char *argv0,
-                 std::function<TableGenMainFn> MainFn = nullptr);
+int TableGenMain(const char *argv0, TableGenMainFn MainFn = nullptr);
-int TableGenMain(const char *argv0,
-                 std::function<MultiFileTableGenMainFn> MainFn = nullptr);
+int TableGenMain(const char *argv0, MultiFileTableGenMainFn MainFn = nullptr);
/// Controls emitting large character arrays as strings or character arrays.
/// Typically set to false when building with MSVC.
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index c3869c3fb9a5a..499723ab2acdc 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -127,8 +127,7 @@ static int WriteOutput(const TGParser &Parser, const char
*argv0,
return 0;
}
-int llvm::TableGenMain(const char *argv0,
-                       std::function<MultiFileTableGenMainFn> MainFn) {
+int llvm::TableGenMain(const char *argv0, MultiFileTableGenMainFn MainFn) {
RecordKeeper Records;
TGTimer &Timer = Records.getTimer();
@@ -210,8 +209,7 @@ int llvm::TableGenMain(const char *argv0,
return 0;
}
-int llvm::TableGenMain(const char *argv0,
-                       std::function<TableGenMainFn> MainFn) {
+int llvm::TableGenMain(const char *argv0, TableGenMainFn MainFn) {
return TableGenMain(argv0, [&MainFn](TableGenOutputFiles &OutFiles,
const RecordKeeper &Records) {
std::string S;
diff --git a/llvm/utils/TableGen/Basic/TableGen.cpp
b/llvm/utils/TableGen/Basic/TableGen.cpp
index b79ae93dab4f7..a655cbbc16096 100644
--- a/llvm/utils/TableGen/Basic/TableGen.cpp
+++ b/llvm/utils/TableGen/Basic/TableGen.cpp
@@ -73,7 +73,7 @@ int tblgen_main(int argc, char **argv) {
InitLLVM X(argc, argv);
cl::ParseCommandLineOptions(argc, argv);
-  std::function<MultiFileTableGenMainFn> MainFn = nullptr;
+  MultiFileTableGenMainFn MainFn = nullptr;
return TableGenMain(argv[0], MainFn);
}
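For readers unfamiliar with the distinction the patch relies on, here is a rough, hypothetical sketch (not the actual TableGen API): `llvm::function_ref` is a non-owning view over a callable, so it avoids the allocation and copy overhead of `std::function` and is appropriate when the callback is only invoked during the call and never stored.
```cpp
// Assumes an LLVM build tree on the include path; function_ref lives in
// llvm/ADT/STLFunctionalExtras.h.
#include "llvm/ADT/STLFunctionalExtras.h"
#include <cstdio>

// The callee only invokes the callback before returning, so a non-owning
// function_ref is sufficient; it must not be retained past this call.
static int runWithCallback(llvm::function_ref<bool(int)> Fn) {
  return Fn(42) ? 0 : 1;
}

int main() {
  int Hits = 0;
  // Capturing lambdas bind to function_ref without any heap allocation.
  return runWithCallback([&Hits](int V) {
    ++Hits;
    std::printf("called %d time(s) with %d\n", Hits, V);
    return V == 42;
  });
}
```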
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)
https://github.com/mtrofin converted_to_draft https://github.com/llvm/llvm-project/pull/167399 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] RuntimeLibcalls: Add memset_pattern* calls to darwin systems (PR #167083)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/167083 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT][NFC] Rename Pointer Auth DWARF rewriter passes (PR #164622)
bgergely0 wrote: One more thing I'd like to sneak in here: adding the --print- flags for these passes. We discussed this before, but I didn't add them to the original patch (#120064). https://github.com/llvm/llvm-project/pull/164622 ___ llvm-branch-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)
@@ -545,8 +743,68 @@ GOFFObjectWriter::GOFFObjectWriter(
GOFFObjectWriter::~GOFFObjectWriter() = default;
+void GOFFObjectWriter::recordRelocation(const MCFragment &F,
+const MCFixup &Fixup, MCValue Target,
+uint64_t &FixedValue) {
+ const MCFixupKindInfo &FKI =
+ Asm->getBackend().getFixupKindInfo(Fixup.getKind());
+ const uint32_t Length = FKI.TargetSize / 8;
+ assert(FKI.TargetSize % 8 == 0 && "Target Size not multiple of 8");
+ const uint64_t FixupOffset = Asm->getFragmentOffset(F) + Fixup.getOffset();
+ bool IsPCRel = Fixup.isPCRel();
+
+ unsigned RelocType = TargetObjectWriter->getRelocType(Target, Fixup,
IsPCRel);
+
+  const MCSectionGOFF *PSection = static_cast<const MCSectionGOFF *>(F.getParent());
+  const auto &A = *static_cast<const MCSymbolGOFF *>(Target.getAddSym());
+  const MCSymbolGOFF *B = static_cast<const MCSymbolGOFF *>(Target.getSubSym());
+ if (RelocType == MCGOFFObjectTargetWriter::Reloc_Type_RelImm) {
+if (A.isUndefined()) {
+ Asm->reportError(
+ Fixup.getLoc(),
+ Twine("symbol ")
+ .concat(A.getName())
+ .concat(" must be defined for a relative immediate relocation"));
+ return;
+}
+if (&A.getSection() != PSection) {
+ Asm->reportError(Fixup.getLoc(),
+ Twine("relative immediate relocation section mismatch:
")
+ .concat(A.getSection().getName())
+ .concat(" of symbol ")
+ .concat(A.getName())
+ .concat(" <-> ")
+ .concat(PSection->getName()));
+ return;
+}
+if (B) {
+ Asm->reportError(
+ Fixup.getLoc(),
+ Twine("subtractive symbol ")
+ .concat(B->getName())
+ .concat(" not supported for a relative immediate relocation"));
+ return;
+}
+FixedValue = Asm->getSymbolOffset(A) - FixupOffset + Target.getConstant();
+return;
+ }
+ FixedValue = Target.getConstant();
+
+ // The symbol only has a section-relative offset if it is a temporary symbol.
+ FixedValue += A.isTemporary() ? Asm->getSymbolOffset(A) : 0;
+ A.setUsedInReloc();
+ if (B) {
+FixedValue -= B->isTemporary() ? Asm->getSymbolOffset(*B) : 0;
+B->setUsedInReloc();
+ }
+
+ // Save relocation data for later writing.
+ SavedRelocs.emplace_back(PSection, &A, B, RelocType, FixupOffset, Length,
+ FixedValue);
+}
+
uint64_t GOFFObjectWriter::writeObject() {
- uint64_t Size = GOFFWriter(OS, *Asm).writeObject();
+ uint64_t Size = GOFFWriter(OS, *Asm, SavedRelocs).writeObject();
redstar wrote:
That is an interesting style question. The ELF solution also requires all
fields to be public, because it is difficult to make a class in an anonymous
namespace a `friend` of a class in a header file. I prefer passing the required
fields instead of making the fields public.
Looking forward, I do not expect to need to pass more fields from the
`GOFFObjectWriter` into the `GOFFWriter`, so the current solution looks
reasonable to me.
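A minimal sketch of the pattern being discussed, using hypothetical names rather than the real GOFF classes: the helper lives in an anonymous namespace in the .cpp file, so the class declared in the header cannot name it in a `friend` declaration; instead the outer class hands it exactly the fields it needs.
```cpp
#include <cstdint>
#include <vector>

// --- header portion --------------------------------------------------------
struct SavedReloc {
  uint64_t Offset = 0;
  uint32_t Type = 0;
};

class OuterWriter {
  std::vector<SavedReloc> SavedRelocs; // stays private, no friend needed
public:
  void recordRelocation(SavedReloc R) { SavedRelocs.push_back(R); }
  uint64_t writeObject();
};

// --- .cpp portion ------------------------------------------------------------
namespace {
// Implementation-only helper; a header cannot befriend this class.
class InnerWriter {
  const std::vector<SavedReloc> &Relocs;
public:
  explicit InnerWriter(const std::vector<SavedReloc> &R) : Relocs(R) {}
  uint64_t write() const { return Relocs.size(); } // stand-in for real emission
};
} // namespace

uint64_t OuterWriter::writeObject() {
  // Pass the required field instead of exposing it or using friendship.
  return InnerWriter(SavedRelocs).write();
}
```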
https://github.com/llvm/llvm-project/pull/167054
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
