[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)

2025-11-19 Thread Vikram Hegde via llvm-branch-commits

https://github.com/vikramRH created 
https://github.com/llvm/llvm-project/pull/168832

None

From 49e4825b231eae88f7aea3184e1c8ca904abb674 Mon Sep 17 00:00:00 2001
From: vikhegde 
Date: Tue, 18 Nov 2025 11:13:37 +0530
Subject: [PATCH] [NPM] Schedule PhysicalRegisterUsageAnalysis before
 RegUsageInfoCollectorPass

---
 llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 +++-
 llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll  | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
 
   derived().addPreEmitPass(addPass);
 
-  if (TM.Options.EnableIPRA)
+  if (TM.Options.EnableIPRA) {
     // Collect register usage information and produce a register mask of
     // clobbered registers, to be used to optimize call sites.
+    addPass(RequireAnalysisPass<PhysicalRegisterUsageAnalysis, Module>());
     addPass(RegUsageInfoCollectorPass());
+  }
 
   addPass(FuncletLayoutPass());
 
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll 
b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
 ; RUN:   | FileCheck -check-prefix=GCN-O3 %s
 
 
-; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
-; GCN-O2: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,am

[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Vikram Hegde (vikramRH)


Changes

RegUsageInfoCollectorPass requires PhysicalRegisterUsageAnalysis to be valid. 
This is required since it's a module analysis.
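
The scheduling constraint described above can be illustrated with a small standalone sketch. This is not LLVM code: `ModuleAnalysisCache`, `RegUsageInfo`, `runRequireStep` and `runCollectorStep` are hypothetical names, and the sketch only assumes that a pass running at a narrower scope can observe a module-level result when that result has already been computed and cached.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a module-level analysis result.
struct RegUsageInfo {
  std::vector<std::string> Functions; // functions whose usage was recorded
};

// Hypothetical module-level cache: a result exists only after it has been
// computed at module scope.
class ModuleAnalysisCache {
  std::unique_ptr<RegUsageInfo> Result;

public:
  // Module-scope request: computes the result if it is not cached yet.
  RegUsageInfo &getResult() {
    if (!Result)
      Result = std::make_unique<RegUsageInfo>();
    return *Result;
  }

  // Narrower-scope request: may only observe an already cached result.
  RegUsageInfo *getCachedResult() { return Result.get(); }
};

// Plays the role of the "require" step: forces the analysis to be computed.
void runRequireStep(ModuleAnalysisCache &Cache) {
  Cache.getResult();
  assert(Cache.getCachedResult() && "analysis should now be cached");
}

// Plays the role of the collector: reports failure if the analysis is missing.
bool runCollectorStep(ModuleAnalysisCache &Cache, const std::string &Fn) {
  RegUsageInfo *Info = Cache.getCachedResult();
  if (!Info)
    return false;
  Info->Functions.push_back(Fn);
  return true;
}
```

In this toy model the collector step succeeds only when the require step ran first, which mirrors the ordering the patch adds to the pipeline.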

---
Full diff: https://github.com/llvm/llvm-project/pull/168832.diff


2 Files Affected:

- (modified) llvm/include/llvm/Passes/CodeGenPassBuilder.h (+3-1) 
- (modified) llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll (+3-3) 


``diff
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 03777c7fcb45f..0e14f2e50ae04 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -1081,10 +1081,12 @@ Error CodeGenPassBuilder::addMachinePasses(
 
   derived().addPreEmitPass(addPass);
 
-  if (TM.Options.EnableIPRA)
+  if (TM.Options.EnableIPRA) {
     // Collect register usage information and produce a register mask of
     // clobbered registers, to be used to optimize call sites.
+    addPass(RequireAnalysisPass<PhysicalRegisterUsageAnalysis, Module>());
     addPass(RegUsageInfoCollectorPass());
+  }
 
   addPass(FuncletLayoutPass());
 
diff --git a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll 
b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
index ba29a5c2a9a9d..667f8aef58459 100644
--- a/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
+++ b/llvm/test/CodeGen/AMDGPU/llc-pipeline-npm.ll
@@ -9,11 +9,11 @@
 ; RUN:   | FileCheck -check-prefix=GCN-O3 %s
 
 
-; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation,reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
+; GCN-O0: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor,function(amdgpu-uniform-intrinsic-combine),expand-variadics,amdgpu-always-inline,always-inline,amdgpu-export-kernel-runtime-handles,amdgpu-lower-exec-sync,amdgpu-sw-lower-lds,amdgpu-lower-module-lds,function(atomic-expand,verify,gc-lowering,lower-constant-intrinsics,unreachableblockelim,ee-instrument,scalarize-masked-mem-intrin,expand-reductions,amdgpu-lower-kernel-arguments),amdgpu-lower-buffer-fat-pointers,amdgpu-lower-intrinsics,cgscc(function(lower-switch,lower-invoke,unreachableblockelim,amdgpu-unify-divergent-exit-nodes,fix-irreducible,unify-loop-exits,StructurizeCFGPass,amdgpu-annotate-uniform,si-annotate-control-flow,amdgpu-rewrite-undef-for-phi,lcssa,require,callbr-prepare,safe-stack,stack-protector,verify)),cgscc(function(machine-function(amdgpu-isel,si-fix-sgpr-copies,si-i1-copies,finalize-isel,localstackalloc))),require,cgscc(function(machine-function(reg-usage-propagation,phi-node-elimination,two-address-instruction,regallocfast,si-fix-vgpr-copies,remove-redundant-debug-values,fixup-statepoint-caller-saved,prolog-epilog,post-ra-pseudos,si-post-ra-bundler,fentry-insert,xray-instrumentation,patchable-function,si-memory-legalizer,si-insert-waitcnts,si-mode-register,si-late-branch-lowering,post-RA-hazard-rec,amdgpu-wait-sgpr-hazards,amdgpu-lower-vgpr-encoding,branch-relaxation))),require,cgscc(function(machine-function(reg-usage-collector,remove-loads-into-fake-uses,live-debug-values,machine-sanmd,stack-frame-layout,verify),free-machine-function))
 
-; GCN-O2: 
require,require,require,require,pre-isel-intrinsic-lowering,function(expand-large-div-rem,expand-fp),amdgpu-remove-incompatible-functions,amdgpu-printf-runtime-binding,amdgpu-lower-ctor-dtor

[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)

2025-11-19 Thread Vikram Hegde via llvm-branch-commits

https://github.com/vikramRH ready_for_review 
https://github.com/llvm/llvm-project/pull/168832
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NPM] Schedule PhysicalRegisterUsageAnalysis before RegUsageInfoCollectorPass (PR #168832)

2025-11-19 Thread Vikram Hegde via llvm-branch-commits

https://github.com/vikramRH edited 
https://github.com/llvm/llvm-project/pull/168832


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();
+  auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+  if (Probes.empty())
+return nullptr;
+  const MCDecodedPseudoProbe &Probe = *Probes.begin();
+  const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+  while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+  return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+  using namespace yaml::bolt;
+  BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+  const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+  using ChildMapTy =
+  std::unordered_map;
+  using CallSiteInfoTy =
+  std::unordered_map;
+  // Mapping from a parent node id to a map InlineSite -> Child node.
+  DenseMap ParentToChildren;
+  // Collect calls in the profile: map from a parent node id to a map
+  // InlineSite -> CallSiteInfo ptr.
+  DenseMap ParentToCSI;
+  for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+  ProbeMatchingStats.TotalCallCount += CSI.Count;
+  ++ProbeMatchingStats.TotalCallSites;
+  if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+  if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count

maksfb wrote:

```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no callee for " << CSI.DestId << " " 
<< CSI.Count
```

https://github.com/llvm/llvm-project/pull/100446


[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread via llvm-branch-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff origin/main HEAD --extensions cpp -- \
  compiler-rt/test/asan/TestCases/Darwin/atos-symbolizer-dyld-root-path.cpp \
  compiler-rt/test/asan/TestCases/Darwin/atos-symbolizer.cpp \
  compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp \
  compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_remove.cpp \
  compiler-rt/test/asan/TestCases/Darwin/init_for_dlopen.cpp \
  compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp \
  compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp \
  --diff_from_common_commit
```

:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:





View the diff from clang-format here.


``diff
diff --git a/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp 
b/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
index 09502e3aa..ac3c5898f 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/malloc_zone-protected.cpp
@@ -5,7 +5,6 @@
 // RUN: %clangxx_asan %s -o %t
 // RUN: env ASAN_OPTIONS="abort_on_error=1" not --crash %run %t 2>&1 | 
FileCheck %s
 
-
 void *pwn(malloc_zone_t *unused_zone, size_t unused_size) {
   printf("PWNED\n");
   return NULL;
diff --git 
a/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp 
b/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
index 66f7e06a3..7cb1dcc51 100644
--- a/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
+++ b/compiler-rt/test/asan_abi/TestCases/Darwin/llvm_interface_symbols.cpp
@@ -24,7 +24,7 @@
 // RUN: diff %t.imports-sorted %t.exports-sorted
 
 // Ensure that there is no dynamic dylib linked.
-// RUN: otool -L %t > %t.libs 
+// RUN: otool -L %t > %t.libs
 // not grep -q "dynamic.dylib" < %t.libs
 
 // UNSUPPORTED: ios

``




https://github.com/llvm/llvm-project/pull/168545


[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)

2025-11-19 Thread Kai Nacke via llvm-branch-commits


@@ -51,6 +51,7 @@ enum {
   // https://www.ibm.com/docs/en/hla-and-tf/1.6?topic=value-address-constants
   S_RCon, // Address of ADA of symbol.
   S_VCon, // Address of external function symbol.
+  S_QCon, // Class-based offset.

redstar wrote:

Yes, there is no user yet. It is required for the ctor/dtor lists. I will think 
about a test until then...

https://github.com/llvm/llvm-project/pull/167054


[llvm-branch-commits] [lld] release/21.x: [LLD][COFF] Align EC code ranges to page boundaries (#168222) (PR #168369)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lld-coff

Author: None (llvmbot)


Changes

Backport af45b0202cdd443beedb02392f653d8cff5bd931

Requested by: @cjacek

---
Full diff: https://github.com/llvm/llvm-project/pull/168369.diff


2 Files Affected:

- (modified) lld/COFF/Chunks.cpp (+1-1) 
- (modified) lld/test/COFF/arm64ec-codemap.test (+33-3) 


``diff
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index 01752cdc6a9da..cfb33daa024a7 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -939,7 +939,7 @@ void ECCodeMapChunk::writeTo(uint8_t *buf) const {
   auto table = reinterpret_cast(buf);
   for (uint32_t i = 0; i < map.size(); i++) {
 const ECCodeMapEntry &entry = map[i];
-uint32_t start = entry.first->getRVA();
+uint32_t start = entry.first->getRVA() & ~0xfff;
 table[i].StartOffset = start | entry.type;
 table[i].Length = entry.last->getRVA() + entry.last->getSize() - start;
   }
diff --git a/lld/test/COFF/arm64ec-codemap.test 
b/lld/test/COFF/arm64ec-codemap.test
index 050261117be2e..bbc682d19920f 100644
--- a/lld/test/COFF/arm64ec-codemap.test
+++ b/lld/test/COFF/arm64ec-codemap.test
@@ -7,6 +7,7 @@ RUN: llvm-mc -filetype=obj -triple=arm64ec-windows 
arm64ec-func-sym2.s -o arm64e
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec.s -o data-sec.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec2.s -o data-sec2.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows empty-sec.s -o 
arm64ec-empty-sec.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows entry-thunk.s -o 
entry-thunk.obj
 RUN: llvm-mc -filetype=obj -triple=x86_64-windows x86_64-func-sym.s -o 
x86_64-func-sym.obj
 RUN: llvm-mc -filetype=obj -triple=x86_64-windows empty-sec.s -o 
x86_64-empty-sec.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows 
%S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
@@ -162,15 +163,17 @@ RUN:  loadconfig-arm64ec.obj -dll -noentry 
-merge:test=.testdata -merge:
 
 RUN: llvm-readobj --coff-load-config testcm.dll | FileCheck 
-check-prefix=CODEMAPCM %s
 CODEMAPCM:  CodeMap [
-CODEMAPCM-NEXT: 0x4008 - 0x4016  X64
+CODEMAPCM-NEXT: 0x4000 - 0x4016  X64
 CODEMAPCM-NEXT: ]
 
 RUN: llvm-objdump -d testcm.dll | FileCheck -check-prefix=DISASMCM %s
 DISASMCM:  Disassembly of section .testdat:
 DISASMCM-EMPTY:
 DISASMCM-NEXT: 000180004000 <.testdat>:
-DISASMCM-NEXT: 180004000: 0001 udf #0x1
-DISASMCM-NEXT: 180004004:  udf #0x0
+DISASMCM-NEXT: 180004000: 01 00addl %eax, (%rax)
+DISASMCM-NEXT: 180004002: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004004: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004006: 00 00addb %al, (%rax)
 DISASMCM-NEXT: 180004008: b8 03 00 00 00   movl$0x3, %eax
 DISASMCM-NEXT: 18000400d: c3   retq
 DISASMCM-NEXT: 18000400e: 00 00addb%al, (%rax)
@@ -207,6 +210,14 @@ DISASMMS-NEXT: 000180006000 :
 DISASMMS-NEXT: 180006000: 528000a0 mov w0, #0x5// =5
 DISASMMS-NEXT: 180006004: d65f03c0 ret
 
+Test the code map that includes an ARM64EC function padded by its entry-thunk 
offset.
+
+RUN: lld-link -out:testpad.dll -machine:arm64ec entry-thunk.obj 
loadconfig-arm64ec.obj -dll -noentry -include:func
+RUN: llvm-readobj --coff-load-config testpad.dll | FileCheck 
-check-prefix=CODEMAPPAD %s
+CODEMAPPAD:  CodeMap [
+CODEMAPPAD:0x1000 - 0x1010  ARM64EC
+CODEMAPPAD-NEXT: ]
+
 
 #--- arm64-func-sym.s
 .text
@@ -266,3 +277,22 @@ x86_64_func_sym2:
 .section .empty1, "xr"
 .section .empty2, "xr"
 .section .empty3, "xr"
+
+#--- entry-thunk.s
+.section .text,"xr",discard,func
+.globl func
+.p2align 2, 0x0
+func:
+mov w0, #1
+ret
+
+.section .wowthk$aa,"xr",discard,thunk
+.globl thunk
+.p2align 2
+thunk:
+ret
+
+.section .hybmp$x,"yi"
+.symidx func
+.symidx thunk
+.word 1  // entry thunk

``
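
The effect of the `& ~0xfff` mask in the `Chunks.cpp` change can be checked with a small standalone sketch. The names `alignDownToPage`, `makeRange` and `CodeRange` are hypothetical and not part of lld. The input values are taken from the updated `CODEMAPCM` check, assuming the second number printed there is the end of the range.

```cpp
#include <cstdint>

// Assumed page size of 4 KiB, matching the 0xfff mask in the patch.
constexpr uint32_t PageMask = 0xfff;

// Hypothetical stand-in for one code map entry.
struct CodeRange {
  uint32_t Start;
  uint32_t Length;
};

// Clear the low 12 bits so the range starts on a page boundary.
constexpr uint32_t alignDownToPage(uint32_t Rva) { return Rva & ~PageMask; }

// Build a range from the first chunk's RVA and the last chunk's RVA and size.
// The length is measured from the aligned start, so the end stays unchanged.
constexpr CodeRange makeRange(uint32_t FirstRva, uint32_t LastRva,
                              uint32_t LastSize) {
  uint32_t Start = alignDownToPage(FirstRva);
  return {Start, LastRva + LastSize - Start};
}
```

With a first chunk at RVA 0x4008 that is 0xe bytes long, the range start moves down to 0x4000 while the end stays at 0x4016, in line with the updated test expectation.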




https://github.com/llvm/llvm-project/pull/168369


[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)

2025-11-19 Thread Paschalis Mpeis via llvm-branch-commits

https://github.com/paschalis-mpeis commented:

Hey Gergely,

Thanks for the updates!
Just a reminder to prepend tests with pauth-, so they appear grouped, and 
`PAuthPacRetDesign.md` too, so the `BTI` doc would appear close to it in the 
future.

Can you also please expand both `CFI` acronyms in a note at the top of the doc, 
for completeness and clarity?

https://github.com/llvm/llvm-project/pull/164622


[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-arm

Author: None (llvmbot)


Changes

Backport 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868

Requested by: @davemgreen

---
Full diff: https://github.com/llvm/llvm-project/pull/168380.diff


4 Files Affected:

- (modified) llvm/lib/Target/ARM/ARMAsmPrinter.cpp (+11-10) 
- (modified) llvm/lib/Target/ARM/ARMSubtarget.cpp (+1-11) 
- (modified) llvm/lib/Target/ARM/ARMTargetMachine.h (+14) 
- (added) llvm/test/CodeGen/ARM/xxstructor-nodef.ll (+7) 


``diff
diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp 
b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 850b00406f09e..aa6ef55dad26c 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -97,7 +97,8 @@ void ARMAsmPrinter::emitXXStructor(const DataLayout &DL, 
const Constant *CV) {
 
   const MCExpr *E = MCSymbolRefExpr::create(
   GetARMGVSymbol(GV, ARMII::MO_NO_FLAG),
-  (Subtarget->isTargetELF() ? ARM::S_TARGET1 : ARM::S_None), OutContext);
+  (TM.getTargetTriple().isOSBinFormatELF() ? ARM::S_TARGET1 : ARM::S_None),
+  OutContext);
 
   OutStreamer->emitValue(E, Size);
 }
@@ -595,8 +596,7 @@ void ARMAsmPrinter::emitEndOfAsmFile(Module &M) {
   ARMTargetStreamer &ATS = static_cast<ARMTargetStreamer &>(TS);
 
   if (OptimizationGoals > 0 &&
-  (Subtarget->isTargetAEABI() || Subtarget->isTargetGNUAEABI() ||
-   Subtarget->isTargetMuslAEABI()))
+  (TT.isTargetAEABI() || TT.isTargetGNUAEABI() || TT.isTargetMuslAEABI()))
 ATS.emitAttribute(ARMBuildAttrs::ABI_optimization_goals, 
OptimizationGoals);
   OptimizationGoals = -1;
 
@@ -866,9 +866,10 @@ static uint8_t getModifierSpecifier(ARMCP::ARMCPModifier 
Modifier) {
 
 MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue *GV,
 unsigned char TargetFlags) {
-  if (Subtarget->isTargetMachO()) {
+  const Triple &TT = TM.getTargetTriple();
+  if (TT.isOSBinFormatMachO()) {
 bool IsIndirect =
-(TargetFlags & ARMII::MO_NONLAZY) && Subtarget->isGVIndirectSymbol(GV);
+(TargetFlags & ARMII::MO_NONLAZY) && getTM().isGVIndirectSymbol(GV);
 
 if (!IsIndirect)
   return getSymbol(GV);
@@ -885,9 +886,8 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue 
*GV,
   StubSym = MachineModuleInfoImpl::StubValueTy(getSymbol(GV),
!GV->hasInternalLinkage());
 return MCSym;
-  } else if (Subtarget->isTargetCOFF()) {
-assert(Subtarget->isTargetWindows() &&
-   "Windows is the only supported COFF target");
+  } else if (TT.isOSBinFormatCOFF()) {
+assert(TT.isOSWindows() && "Windows is the only supported COFF target");
 
 bool IsIndirect =
 (TargetFlags & (ARMII::MO_DLLIMPORT | ARMII::MO_COFFSTUB));
@@ -914,7 +914,7 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue 
*GV,
 }
 
 return MCSym;
-  } else if (Subtarget->isTargetELF()) {
+  } else if (TT.isOSBinFormatELF()) {
 return getSymbolPreferLocal(*GV);
   }
   llvm_unreachable("unexpected target");
@@ -960,7 +960,8 @@ void ARMAsmPrinter::emitMachineConstantPoolValue(
 
 // On Darwin, const-pool entries may get the "FOO$non_lazy_ptr" mangling, 
so
 // flag the global as MO_NONLAZY.
-unsigned char TF = Subtarget->isTargetMachO() ? ARMII::MO_NONLAZY : 0;
+unsigned char TF =
+TM.getTargetTriple().isOSBinFormatMachO() ? ARMII::MO_NONLAZY : 0;
 MCSym = GetARMGVSymbol(GV, TF);
   } else if (ACPV->isMachineBasicBlock()) {
 const MachineBasicBlock *MBB = cast(ACPV)->getMBB();
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp 
b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 13185a7d797a3..63d6e2ea7389b 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -316,17 +316,7 @@ bool ARMSubtarget::isRWPI() const {
 }
 
 bool ARMSubtarget::isGVIndirectSymbol(const GlobalValue *GV) const {
-  if (!TM.shouldAssumeDSOLocal(GV))
-return true;
-
-  // 32 bit macho has no relocation for a-b if a is undefined, even if b is in
-  // the section that is being relocated. This means we have to use o load even
-  // for GVs that are known to be local to the dso.
-  if (isTargetMachO() && TM.isPositionIndependent() &&
-  (GV->isDeclarationForLinker() || GV->hasCommonLinkage()))
-return true;
-
-  return false;
+  return TM.isGVIndirectSymbol(GV);
 }
 
 bool ARMSubtarget::isGVInGOT(const GlobalValue *GV) const {
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.h 
b/llvm/lib/Target/ARM/ARMTargetMachine.h
index 1d73af1da6d02..5f17a13dac40e 100644
--- a/llvm/lib/Target/ARM/ARMTargetMachine.h
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.h
@@ -99,6 +99,20 @@ class ARMBaseTargetMachine : public CodeGenTargetMachineImpl 
{
 return true;
   }
 
+  bool isGVIndirectSymbol(const GlobalValue *GV) const {
+if (!shouldAssumeDSOLocal(GV))
+  return true;
+
+// 32 bit macho has no relocation for a-b if a is undefined, even if b is 
in
+// the s

[llvm-branch-commits] [openmp] 1d80cda - Revert "[OpenMP] Implement omp_get_uid_from_device() / omp_get_device_from_ui…"

2025-11-19 Thread via llvm-branch-commits

Author: Robert Imschweiler
Date: 2025-11-18T16:03:12+01:00
New Revision: 1d80cda87609b6dcb8a84d60df41bc26b535fdf7

URL: 
https://github.com/llvm/llvm-project/commit/1d80cda87609b6dcb8a84d60df41bc26b535fdf7
DIFF: 
https://github.com/llvm/llvm-project/commit/1d80cda87609b6dcb8a84d60df41bc26b535fdf7.diff

LOG: Revert "[OpenMP] Implement omp_get_uid_from_device() / 
omp_get_device_from_ui…"

This reverts commit 65c4a534bd55ed56962fb99c36f464b3f1c9732f.

Added: 


Modified: 
offload/include/OpenMP/omp.h
offload/include/omptarget.h
offload/libomptarget/OpenMP/API.cpp
offload/libomptarget/exports
openmp/device/include/DeviceTypes.h
openmp/device/include/Interface.h
openmp/device/src/State.cpp
openmp/runtime/src/dllexports
openmp/runtime/src/include/omp.h.var
openmp/runtime/src/include/omp_lib.F90.var
openmp/runtime/src/include/omp_lib.h.var
openmp/runtime/src/kmp_ftn_entry.h
openmp/runtime/src/kmp_ftn_os.h

Removed: 
offload/test/api/omp_device_uid.c
openmp/runtime/test/api/omp_device_uid.c



diff  --git a/offload/include/OpenMP/omp.h b/offload/include/OpenMP/omp.h
index d92c7e450c677..768ca46a9bed0 100644
--- a/offload/include/OpenMP/omp.h
+++ b/offload/include/OpenMP/omp.h
@@ -30,13 +30,6 @@
 
 extern "C" {
 
-/// Definitions
-///{
-
-#define omp_invalid_device -2
-
-///}
-
 /// Type declarations
 ///{
 

diff  --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 00910704a979a..fbb4a06accf84 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -270,8 +270,6 @@ extern "C" {
 void ompx_dump_mapping_tables(void);
 int omp_get_num_devices(void);
 int omp_get_device_num(void);
-int omp_get_device_from_uid(const char *DeviceUid);
-const char *omp_get_uid_from_device(int DeviceNum);
 int omp_get_initial_device(void);
 void *omp_target_alloc(size_t Size, int DeviceNum);
 void omp_target_free(void *DevicePtr, int DeviceNum);

diff  --git a/offload/libomptarget/OpenMP/API.cpp 
b/offload/libomptarget/OpenMP/API.cpp
index 6e85e5764449c..dd83a3ccd08e6 100644
--- a/offload/libomptarget/OpenMP/API.cpp
+++ b/offload/libomptarget/OpenMP/API.cpp
@@ -40,8 +40,6 @@ EXTERN void ompx_dump_mapping_tables() {
 using namespace llvm::omp::target::ompt;
 #endif
 
-using GenericDeviceTy = llvm::omp::target::plugin::GenericDeviceTy;
-
 void *targetAllocExplicit(size_t Size, int DeviceNum, int Kind,
   const char *Name);
 void targetFreeExplicit(void *DevicePtr, int DeviceNum, int Kind,
@@ -70,62 +68,6 @@ EXTERN int omp_get_device_num(void) {
   return HostDevice;
 }
 
-static inline bool is_initial_device_uid(const char *DeviceUid) {
-  return strcmp(DeviceUid, GenericPluginTy::getHostDeviceUid()) == 0;
-}
-
-EXTERN int omp_get_device_from_uid(const char *DeviceUid) {
-  TIMESCOPE();
-  OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
-
-  if (!DeviceUid) {
-DP("Call to omp_get_device_from_uid returning omp_invalid_device\n");
-return omp_invalid_device;
-  }
-  if (is_initial_device_uid(DeviceUid)) {
-DP("Call to omp_get_device_from_uid returning initial device number %d\n",
-   omp_get_initial_device());
-return omp_get_initial_device();
-  }
-
-  int DeviceNum = omp_invalid_device;
-
-  auto ExclusiveDevicesAccessor = PM->getExclusiveDevicesAccessor();
-  for (const DeviceTy &Device : PM->devices(ExclusiveDevicesAccessor)) {
-const char *Uid = Device.RTL->getDevice(Device.RTLDeviceID).getDeviceUid();
-if (Uid && strcmp(DeviceUid, Uid) == 0) {
-  DeviceNum = Device.DeviceID;
-  break;
-}
-  }
-
-  DP("Call to omp_get_device_from_uid returning %d\n", DeviceNum);
-  return DeviceNum;
-}
-
-EXTERN const char *omp_get_uid_from_device(int DeviceNum) {
-  TIMESCOPE();
-  OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
-
-  if (DeviceNum == omp_invalid_device) {
-DP("Call to omp_get_uid_from_device returning nullptr\n");
-return nullptr;
-  }
-  if (DeviceNum == omp_get_initial_device()) {
-DP("Call to omp_get_uid_from_device returning initial device UID\n");
-return GenericPluginTy::getHostDeviceUid();
-  }
-
-  auto DeviceOrErr = PM->getDevice(DeviceNum);
-  if (!DeviceOrErr)
-FATAL_MESSAGE(DeviceNum, "%s", toString(DeviceOrErr.takeError()).c_str());
-
-  const char *Uid =
-  DeviceOrErr->RTL->getDevice(DeviceOrErr->RTLDeviceID).getDeviceUid();
-  DP("Call to omp_get_uid_from_device returning %s\n", Uid);
-  return Uid;
-}
-
 EXTERN int omp_get_initial_device(void) {
   TIMESCOPE();
   OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));

diff  --git a/offload/libomptarget/exports b/offload/libomptarget/exports
index 2ebc23e3cf60a..910a5b6c827a7 100644
--- a/offload/libomptarget/exports
+++ b/offload/libomptarget/exports
@@ -40,8 +40,6 @@ VERS1.0 {
 omp_get_mapped_ptr;
 omp_get_num_devices;
 o

[llvm-branch-commits] [llvm] [AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (PR #166361)

2025-11-19 Thread Benjamin Maxwell via llvm-branch-commits


@@ -356,20 +356,13 @@ define void @new_za_zt0_caller(ptr %callee) "aarch64_new_za" "aarch64_new_zt0" n
 
 ; Expect clear ZA on entry
 define void @new_za_shared_zt0_caller(ptr %callee) "aarch64_new_za" "aarch64_in_zt0" nounwind {
-; CHECK-LABEL: new_za_shared_zt0_caller:
-; CHECK:   // %bb.0:
-; CHECK-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:zero {za}
-; CHECK-NEXT:blr x0
-; CHECK-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:ret
-;
-; CHECK-NEWLOWERING-LABEL: new_za_shared_zt0_caller:
-; CHECK-NEWLOWERING:   // %bb.0:
-; CHECK-NEWLOWERING-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEWLOWERING-NEXT:blr x0

MacDue wrote:

This is only relevant for functions with ZT0 (where it's possible to have a new 
ZA with a shared ZA interface due to an in/out ZT0), so I hadn't considered it 
before.

https://github.com/llvm/llvm-project/pull/166361
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Dan Blackwell via llvm-branch-commits


@@ -5,7 +5,7 @@
 // - By default the lit config sets this but we don't want this
 //   test to implicitly depend on this.
 // - It avoids requiring `--crash` to be passed to `not`.
-// RUN: APPLE_ASAN_INIT_FOR_DLOPEN=0 %env_asan_opts=abort_on_error=0 not \
+// RUN: env APPLE_ASAN_INIT_FOR_DLOPEN=0 %env_asan_opts=abort_on_error=0 not \

DanBlackwell wrote:

```suggestion
// RUN: %env_asan_opts=abort_on_error=0 APPLE_ASAN_INIT_FOR_DLOPEN=0 not \
```
`%env_asan_opts` expands to `env ASAN_OPTIONS=`

https://github.com/llvm/llvm-project/pull/168545


[llvm-branch-commits] [llvm] [BOLT][PAC] Warn about synchronous unwind tables (PR #165227)

2025-11-19 Thread Gergely Bálint via llvm-branch-commits

https://github.com/bgergely0 updated 
https://github.com/llvm/llvm-project/pull/165227

From 61e03b5abf74bd5a61f2aa4d21219c67cfbfce24 Mon Sep 17 00:00:00 2001
From: Gergely Balint 
Date: Mon, 27 Oct 2025 09:29:54 +
Subject: [PATCH 1/4] [BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.

See also: #165215
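
A minimal standalone sketch of the threshold check described above. The helper names `ignoredPercent` and `shouldWarnAboutSyncUnwindTables` are hypothetical and do not exist in BOLT, and the `Total == 0` guard is an assumption of this sketch rather than something the patch does. The comparison uses `>= 10.0`, as in the diff below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of BOLT: percentage of ignored functions.
// Returns 0 when nothing was analyzed, so the division is always defined
// (this guard is an assumption of the sketch, not taken from the patch).
double ignoredPercent(uint64_t FunctionsIgnored, uint64_t Total) {
  assert(FunctionsIgnored <= Total && "cannot ignore more functions than ran");
  if (Total == 0)
    return 0.0;
  return (100.0 * FunctionsIgnored) / Total;
}

// Hypothetical helper: true when the ignored share reaches the 10% threshold
// at which the patch prints the BOLT-WARNING line.
bool shouldWarnAboutSyncUnwindTables(uint64_t FunctionsIgnored,
                                     uint64_t Total) {
  return ignoredPercent(FunctionsIgnored, Total) >= 10.0;
}
```

With the numbers from the new test (1 of 3 functions ignored), the share is above the threshold, so the warning would be expected.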
---
 bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp|  9 -
 .../AArch64/pacret-synchronous-unwind.cpp | 33 +++
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp

diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp 
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 91030544d2b88..01af88818a21d 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -133,11 +133,18 @@ Error 
PointerAuthCFIAnalyzer::runOnFunctions(BinaryContext &BC) {
   ParallelUtilities::runOnEachFunction(
   BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
   SkipPredicate, "PointerAuthCFIAnalyzer");
+
+  float IgnoredPercent = (100.0 * FunctionsIgnored) / Total;
   BC.outs() << "BOLT-INFO: PointerAuthCFIAnalyzer ran on " << Total
 << " functions. Ignored " << FunctionsIgnored << " functions "
-<< format("(%.2lf%%)", (100.0 * FunctionsIgnored) / Total)
+<< format("(%.2lf%%)", IgnoredPercent)
 << " because of CFI inconsistencies\n";
 
+  if (IgnoredPercent >= 10.0)
+BC.outs() << "BOLT-WARNING: PointerAuthCFIAnalyzer only supports "
+ "asynchronous unwind tables. For C compilers, see "
+ "-fasynchronous-unwind-tables.\n";
+
   return Error::success();
 }
 
diff --git a/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp 
b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
new file mode 100644
index 0..1bfeeaed3715a
--- /dev/null
+++ b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
@@ -0,0 +1,33 @@
+// Test to demonstrate that functions compiled with synchronous unwind tables
+// are ignored by the PointerAuthCFIAnalyzer.
+// Exception handling is needed to have _any_ unwind tables, otherwise the
+// PointerAuthCFIAnalyzer does not run on these functions, so it does not ignore
+// any function.
+//
+// REQUIRES: system-linux,bolt-runtime
+//
+// RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+// RUN: -mbranch-protection=pac-ret \
+// RUN: -fno-asynchronous-unwind-tables \
+// RUN: %s -o %t.exe -Wl,-q
+// RUN: llvm-bolt %t.exe -o %t.bolt | FileCheck %s --check-prefix=CHECK
+//
+// CHECK: PointerAuthCFIAnalyzer ran on 3 functions. Ignored
+// CHECK-NOT: 0 functions (0.00%) because of CFI inconsistencies
+// CHECK-SAME: 1 functions (33.33%) because of CFI inconsistencies
+// CHECK-NEXT: BOLT-WARNING: PointerAuthCFIAnalyzer only supports asynchronous
+// CHECK-SAME: unwind tables. For C compilers, see -fasynchronous-unwind-tables.
+
+#include <cstdio>
+#include <stdexcept>
+
+void foo() { throw std::runtime_error("Exception from foo()."); }
+
+int main() {
+  try {
+foo();
+  } catch (const std::exception &e) {
+printf("Exception caught: %s\n", e.what());
+  }
+  return 0;
+}

From 7fc8acdbf4cef2aa7f4f5ca9d136d4cb1bce9fe6 Mon Sep 17 00:00:00 2001
From: Gergely Balint 
Date: Tue, 28 Oct 2025 09:23:08 +
Subject: [PATCH 2/4] [BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer

---
 bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp | 27 ++
 bolt/test/AArch64/pacret-cfi-incorrect.s   |  2 +-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp 
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 01af88818a21d..5979d5fb01818 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -28,6 +28,10 @@
 
 using namespace llvm;
 
+namespace opts {
+extern llvm::cl::opt<unsigned> Verbosity;
+} // namespace opts
+
 namespace llvm {
 namespace bolt {
 
@@ -43,9 +47,10 @@ bool PointerAuthCFIAnalyzer::runOnFunction(BinaryFunction 
&BF) {
 // Not all functions have .cfi_negate_ra_state in them. But if one does,
 // we expect psign/pauth instructions to have the hasNegateRAState
 // annotation.
-BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
-  << BF.getPrintName()
-  << ": ptr sign/auth inst without .cfi_negate_ra_state\n";
+if (opts::Verbosity >= 1)
+  BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+<< BF.getPrintName()
+<< ": ptr sign/auth inst without .cfi_negate_ra_state\n";
 std::lock_guard<std::mutex> Lock(IgnoreMutex);
 BF.setIgnored();
 return false;
@@ -65,9 +70,10 @@ bool PointerAuthCFIAnalyzer::runOnF

[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/168545




[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)

2025-11-19 Thread Kai Nacke via llvm-branch-commits


@@ -16,12 +17,35 @@ namespace {
 class SystemZGOFFObjectWriter : public MCGOFFObjectTargetWriter {
 public:
   SystemZGOFFObjectWriter();
+
+  unsigned getRelocType(const MCValue &Target, const MCFixup &Fixup,
+bool IsPCRel) const override;
 };
 } // end anonymous namespace
 
 SystemZGOFFObjectWriter::SystemZGOFFObjectWriter()
 : MCGOFFObjectTargetWriter() {}
 
+unsigned SystemZGOFFObjectWriter::getRelocType(const MCValue &Target,
+   const MCFixup &Fixup,

redstar wrote:

Decided to get the information from `Fixup`.

https://github.com/llvm/llvm-project/pull/167054


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -682,37 +561,22 @@ matchBlocks(BinaryContext &BC, const 
yaml::bolt::BinaryFunctionProfile &YamlBF,
 << "\n");
   continue;
 }
-addMatchedBlock({MatchedBlock, Method}, YamlBF, YamlBB);
-  }
-
-  for (const auto &[YamlBBIdx, FlowBlockProfile] : MatchedBlocks) {
-const auto &[MatchedBlock, YamlBB] = FlowBlockProfile;
-StaleMatcher::MatchMethod Method = MatchedFlowBlocks.lookup(MatchedBlock);
+MatchedBlocks[YamlBB.Index] = {MatchedBlock, 1};
 BlendedBlockHash BinHash = BlendedHashes[MatchedBlock->Index - 1];
-LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBBIdx << ")"
-  << " with hash " << Twine::utohexstr(YamlBB->Hash)
+LLVM_DEBUG(dbgs() << "Matched yaml block (bid = " << YamlBB.Index << ")"

maksfb wrote:

```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: matched yaml block (bid = " << YamlBB.Index << ")"
```

https://github.com/llvm/llvm-project/pull/100446



[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)

2025-11-19 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/167399

From f5a571197ba3ec726353ca4f0550d381a6a8dcba Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Mon, 10 Nov 2025 12:33:12 -0800
Subject: [PATCH] [LTT] Mark as unkown weak function tests.

---
 llvm/lib/Transforms/IPO/LowerTypeTests.cpp   | 3 +++
 llvm/test/Transforms/LowerTypeTests/function-weak.ll | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp 
b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index 31b5487ce6ec6..7b046978802db 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -1493,6 +1493,9 @@ void 
LowerTypeTestsModule::replaceWeakDeclarationWithJumpTablePtr(
  Constant::getNullValue(F->getType()));
 Value *Select = Builder.CreateSelect(ICmp, JT,
  Constant::getNullValue(F->getType()));
+
+if (auto *SI = dyn_cast<SelectInst>(Select))
+  setExplicitlyUnknownBranchWeightsIfProfiled(*SI, DEBUG_TYPE);
 // For phi nodes, we need to update the incoming value for all operands
 // with the same predecessor.
 if (PN)
diff --git a/llvm/test/Transforms/LowerTypeTests/function-weak.ll 
b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
index 4ea03b6c2c1fa..dbbe8fa4a0a9a 100644
--- a/llvm/test/Transforms/LowerTypeTests/function-weak.ll
+++ b/llvm/test/Transforms/LowerTypeTests/function-weak.ll
@@ -32,10 +32,10 @@ target triple = "x86_64-unknown-linux-gnu"
 declare !type !0 extern_weak void @f()
 
 ; CHECK: define zeroext i1 @check_f()
-define zeroext i1 @check_f() {
+define zeroext i1 @check_f() !prof !{!"function_entry_count", i32 10} {
 entry:
 ; CHECK: [[CMP:%.*]] = icmp ne ptr @f, null
-; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null
+; CHECK: [[SEL:%.*]] = select i1 [[CMP]], ptr @[[JT:.*]], ptr null, !prof ![[SELPROF:[0-9]+]]
 ; CHECK: [[PTI:%.*]] = ptrtoint ptr [[SEL]] to i1
 ; CHECK: ret i1 [[PTI]]
   ret i1 ptrtoint (ptr @f to i1)
@@ -165,3 +165,4 @@ define i1 @foo(ptr %p) {
 ; CHECK-NEXT: }
 
 !0 = !{i32 0, !"typeid1"}
+; CHECK: ![[SELPROF]] = !{!"unknown", !"lowertypetests"}
\ No newline at end of file



[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -242,6 +253,18 @@ class YAMLProfileReader : public ProfileReaderBase {
 ProfiledFunctions.emplace(&BF);
   }
 
+  /// Return a top-level binary inline tree node for a given \p BF

maksfb wrote:

```suggestion
  /// Return a top-level binary inline tree node for a given \p BF.
```

https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits



[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread via llvm-branch-commits

yonghong-song wrote:

I have no objection. But in the above, I see
```
Merging is blocked
Cannot update this protected ref.
```
Not sure what the problem is.

https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();

maksfb wrote:

```suggestion
  const uint64_t Addr = BF.getAddress();
  const uint64_t Size = BF.getSize();
```
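
For context, the `lookupTopLevelNode` hunk quoted above finds a probe inside the function's address range and then follows parent links until it reaches a node with no inline site. A minimal standalone sketch of that walk, assuming a simplified tree: `InlineNode` and `findTopLevelNode` are hypothetical stand-ins, not BOLT or MC types.

```cpp
#include <cassert>

// Hypothetical stand-in for MCDecodedPseudoProbeInlineTree: only the parent
// link and the "was inlined at a call site" flag are modelled here.
struct InlineNode {
  const InlineNode *Parent;
  bool HasInlineSite; // true for inlinees, false for a top-level function
};

// Follow parent links until reaching a node that has no inline site,
// i.e. the node of the top-level (non-inlined) function.
const InlineNode *findTopLevelNode(const InlineNode *Node) {
  assert(Node && "expected a valid starting node");
  while (Node->HasInlineSite) {
    assert(Node->Parent && "an inlinee is assumed to have a parent");
    Node = Node->Parent;
  }
  return Node;
}
```

Neither `Addr` nor `Size` is modified on the way to this walk, which is why marking them `const` as suggested is safe.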

https://github.com/llvm/llvm-project/pull/100446


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();
+  auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+  if (Probes.empty())
+return nullptr;
+  const MCDecodedPseudoProbe &Probe = *Probes.begin();
+  const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+  while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+  return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+  using namespace yaml::bolt;
+  BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+  const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+  using ChildMapTy =
+  std::unordered_map;
+  using CallSiteInfoTy =
+  std::unordered_map;
+  // Mapping from a parent node id to a map InlineSite -> Child node.
+  DenseMap ParentToChildren;
+  // Collect calls in the profile: map from a parent node id to a map
+  // InlineSite -> CallSiteInfo ptr.
+  DenseMap ParentToCSI;
+  for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+  ProbeMatchingStats.TotalCallCount += CSI.Count;
+  ++ProbeMatchingStats.TotalCallSites;
+  if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+  if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  // Get callee GUID
+  if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');
+++ProbeMatchingStats.MissingInlineTree;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  uint64_t CalleeGUID = Callee->InlineTree.front().GUID;
+  ParentToCSI[CSI.InlineTreeNode][InlineSite(CalleeGUID, CSI.Probe)] = &CSI;
+}
+  }
+  LLVM_DEBUG({
+for (auto &[ParentId, InlineSiteCSI] : ParentToCSI) {
+  for (auto &[InlineSite, CSI] : InlineSiteCSI) {
+auto [CalleeGUID, CallSite] = InlineSite;
+errs() << ParentId << "@" << CallSite << "->"
+   << Twine::utohexstr(CalleeGUID) << ": " << CSI->Count << ", "
+   << Twine::utohexstr(CSI->Offset) << '\n';
+  }
+}
+  });
+
+  assert(!Root.isRoot());
+  LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@"
+<< Twine::utohexstr(Root.Guid) << " and " << YamlBF.Name
+<< "@" << Twine::utohexstr(FuncNode.GUID) << '\n');
+  ++ProbeMatchingStats.AttemptedNodes;
+  ++ProbeMatchingStats.AttemptedRoots;
+
+  // Match profile function with a lead node (top-level function or inlinee)
+  if (Root.Guid != FuncNode.GUID) {
+LLVM_DEBUG(dbgs() << "

[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)

2025-11-19 Thread via llvm-branch-commits

dyung wrote:

> The same reason as the original bug reporter @iliastsi - trying to fix GHC in 
> Debian.

From the original bug report, it seems that this has been broken since at 
least LLVM 17. If this is the case, how have you been building GHC all this 
time? Were you using LLVM 16?

At this point in the release, we are only accepting patches for recent 
regressions or major issues. Given this issue seems to have been around for 
quite a while (and only seems to be hit in this one case), I am leaning towards 
not merging this change into the 21.x release branch. If you feel strongly that 
it should be merged in, please let us know the rationale and I will discuss it 
with the other release managers to decide whether it should be included at this 
point of the release.

https://github.com/llvm/llvm-project/pull/168380


[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)

2025-11-19 Thread Eli Friedman via llvm-branch-commits

https://github.com/efriedma-quic approved this pull request.

LGTM

This is a simple bugfix, and it obviously has no impact on non-ARM64EC targets.

https://github.com/llvm/llvm-project/pull/168371


[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)

2025-11-19 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin ready_for_review 
https://github.com/llvm/llvm-project/pull/167399


[llvm-branch-commits] [compiler-rt] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/168545




[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)

2025-11-19 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/168371

Backport 615299934489953deaf202cc445ac9f8ad362afc

Requested by: @cjacek

From 3da9c16880cdda218f52cbff964f7e1974d012ae Mon Sep 17 00:00:00 2001
From: Jacek Caban 
Date: Tue, 4 Nov 2025 00:04:36 +0100
Subject: [PATCH] [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect
 calls (#165885)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Guest exit thunks serve as glue for performing direct calls, so they
shouldn’t treat the target as an indirect one.
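
A rough standalone model of the resulting choice of check function, offered only as a sketch. `CallKind` and `guardSymbolFor` are hypothetical names, not part of AArch64Arm64ECCallLowering. The sketch assumes that indirect calls keep the `cfguard == 2` and `guard_nocf` selection that this patch removes from the thunk builder. The symbol names are the ones that appear in the updated test.

```cpp
#include <cassert>
#include <string>

// Hypothetical model; the enum and function are illustrative only.
enum class CallKind { Indirect, GuestExitThunk };

std::string guardSymbolFor(CallKind Kind, int CFGuardFlag, bool HasGuardNoCF) {
  assert(CFGuardFlag >= 0 && "module flag value is assumed non-negative");
  // Guest exit thunks are glue for direct calls, so after this patch they
  // always load the plain check function.
  if (Kind == CallKind::GuestExitThunk)
    return "__os_arm64x_check_icall";
  // Assumed behaviour for real indirect calls: use the CFG-checking variant
  // when cfguard == 2 and the function is not marked guard_nocf.
  if (CFGuardFlag == 2 && !HasGuardNoCF)
    return "__os_arm64x_check_icall_cfg";
  return "__os_arm64x_check_icall";
}
```

This matches what the test below checks with `cfguard` set to 2: the indirect call through `%f` loads `__os_arm64x_check_icall_cfg`, while `#called$exit_thunk` loads `__os_arm64x_check_icall`.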

Spotted by @coneco-cy in #165504.

(cherry picked from commit 615299934489953deaf202cc445ac9f8ad362afc)
---
 .../AArch64/AArch64Arm64ECCallLowering.cpp| 14 ++
 llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll  | 49 +--
 2 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
index 509cbb092705d..b4e1e3517fa64 100644
--- a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
@@ -632,16 +632,10 @@ Function 
*AArch64Arm64ECCallLowering::buildGuestExitThunk(Function *F) {
   BasicBlock *BB = BasicBlock::Create(M->getContext(), "", GuestExit);
   IRBuilder<> B(BB);
 
-  // Load the global symbol as a pointer to the check function.
-  Value *GuardFn;
-  if (cfguard_module_flag == 2 && !F->hasFnAttribute("guard_nocf"))
-GuardFn = GuardFnCFGlobal;
-  else
-GuardFn = GuardFnGlobal;
-  LoadInst *GuardCheckLoad = B.CreateLoad(PtrTy, GuardFn);
-
-  // Create new call instruction. The CFGuard check should always be a call,
-  // even if the original CallBase is an Invoke or CallBr instruction.
+  // Create new call instruction. The call check should always be a call,
+  // even if the original CallBase is an Invoke or CallBr instruction.
+  // This is treated as a direct call, so do not use GuardFnCFGlobal.
+  LoadInst *GuardCheckLoad = B.CreateLoad(PtrTy, GuardFnGlobal);
   Function *Thunk = buildExitThunk(F->getFunctionType(), F->getAttributes());
   CallInst *GuardCheck = B.CreateCall(
   GuardFnType, GuardCheckLoad, {F, Thunk});
diff --git a/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll 
b/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
index bdbc99e2d98b0..75e7ac902274d 100644
--- a/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
+++ b/llvm/test/CodeGen/AArch64/cfguard-arm64ec.ll
@@ -2,15 +2,58 @@
 
 declare void @called()
 declare void @escaped()
-define void @f(ptr %dst) {
+define void @f(ptr %dst, ptr readonly %f) {
   call void @called()
+; CHECK: bl  "#called"
   store ptr @escaped, ptr %dst
-  ret void
+  call void %f()
+; CHECK:   adrpx10, $iexit_thunk$cdecl$v$v
+; CHECK-NEXT:  add x10, x10, :lo12:$iexit_thunk$cdecl$v$v
+; CHECK-NEXT:  str x8, [x20]
+; CHECK-NEXT:  adrpx8, __os_arm64x_check_icall_cfg
+; CHECK-NEXT:  ldr x8, [x8, :lo12:__os_arm64x_check_icall_cfg]
+; CHECK-NEXT:  mov x11,
+; CHECK-NEXT:  blr x8
+; CHECK-NEXT:  blr x11
+ret void
 }
 
+; CHECK-LABEL:.def "#called$exit_thunk";
+; CHECK-NEXT: .scl 2;
+; CHECK-NEXT: .type 32;
+; CHECK-NEXT: .endef
+; CHECK-NEXT: .section .wowthk$aa,"xr",discard,"#called$exit_thunk"
+; CHECK-NEXT: .globl "#called$exit_thunk"// -- Begin function #called$exit_thunk
+; CHECK-NEXT: .p2align 2
+; CHECK-NEXT: "#called$exit_thunk":   // @"#called$exit_thunk"
+; CHECK-NEXT: .weak_anti_dep called
+; CHECK-NEXT: called = "#called"
+; CHECK-NEXT: .weak_anti_dep "#called"
+; CHECK-NEXT: "#called" = "#called$exit_thunk"
+; CHECK-NEXT:.seh_proc "#called$exit_thunk"
+; CHECK-NEXT: // %bb.0:
+; CHECK-NEXT: str x30, [sp, #-16]!// 8-byte Folded Spill
+; CHECK-NEXT: .seh_save_reg_x x30, 16
+; CHECK-NEXT: .seh_endprologue
+; CHECK-NEXT: adrp x8, __os_arm64x_check_icall
+; CHECK-NEXT: adrp x11, called
+; CHECK-NEXT: add x11, x11, :lo12:called
+; CHECK-NEXT: ldr x8, [x8, :lo12:__os_arm64x_check_icall]
+; CHECK-NEXT: adrp x10, $iexit_thunk$cdecl$v$v
+; CHECK-NEXT: add x10, x10, :lo12:$iexit_thunk$cdecl$v$v
+; CHECK-NEXT: blr x8
+; CHECK-NEXT: .seh_startepilogue
+; CHECK-NEXT: ldr x30, [sp], #16  // 8-byte Folded Reload
+; CHECK-NEXT: .seh_save_reg_x x30, 16
+; CHECK-NEXT: .seh_endepilogue
+; CHECK-NEXT: br x11
+; CHECK-NEXT: .seh_endfunclet
+; CHECK-NEXT: .seh_endproc
+
 !llvm.module.flags = !{!0}
-!0 = !{i32 2, !"cfguard", i32 1}
+!0 = !{i32 2, !"cfguard", i32 2}
 
 ; CHECK-LABEL: .section .gfids$y,"dr"
 ; CHECK-NEXT:  .symidx escaped
+; CHECK-NEXT:  .symidx $iexit_thunk$cdecl$v$v
 ; CHECK-NOT:   .symidx


[llvm-branch-commits] [llvm] [AtomicExpand] Add bitcasts when expanding load atomic vector (PR #148900)

2025-11-19 Thread via llvm-branch-commits

jofrn wrote:

Moved the .td records required for this PR back to prior PRs 
https://github.com/llvm/llvm-project/pull/148899 (Cast atomic vec with float -- 
containing the Pats for X86) and 
https://github.com/llvm/llvm-project/pull/165818 (Split -- containing the 
definitions of the PatFrag atomics).

https://github.com/llvm/llvm-project/pull/148900


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -1552,25 +1552,14 @@ Error PrintProgramStats::runOnFunctions(BinaryContext 
&BC) {
 100.0 * BC.Stats.ExactMatchedSampleCount / BC.Stats.StaleSampleCount,
 BC.Stats.ExactMatchedSampleCount, BC.Stats.StaleSampleCount);
 BC.outs() << format(
-"BOLT-INFO: inference found an exact pseudo probe match for %.2f%% of "
+"BOLT-INFO: inference found pseudo probe match for %.2f%% of "

maksfb wrote:

Do we want to print this info if pseudo probes are not present?

https://github.com/llvm/llvm-project/pull/100446


[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)

Changes

---

Patch is 76.41 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/168290.diff


6 Files Affected:

- (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+12-12)
- (modified) llvm/test/CodeGen/AArch64/sve-extract-scalable-vector.ll (-7)
- (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll (+133-133)
- (modified) llvm/test/CodeGen/X86/half.ll (+64-69)
- (modified) llvm/test/CodeGen/X86/matrix-multiply.ll (+38-36)
- (modified) llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll (+216-218)
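
The recurring pattern in this patch is "create the widened operand list with every lane set to a placeholder, then overwrite the leading lanes with real values". The sketch below models that pattern in plain C++ only; `Lane`, `widenWithPoison`, and the `IsPoison` flag are hypothetical names for illustration and are not part of the SelectionDAG API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one vector lane. IsPoison marks a padding lane
// whose value no consumer is allowed to rely on.
struct Lane {
  bool IsPoison;
  int Value;
};

// Widen In to WidenNumElts lanes, padding the tail with poison lanes.
std::vector<Lane> widenWithPoison(const std::vector<int> &In,
                                  std::size_t WidenNumElts) {
  // Start with every lane poisoned, mirroring Ops(WidenNumElts, poison).
  std::vector<Lane> Ops(WidenNumElts, Lane{true, 0});
  // Overwrite the leading lanes with the original elements.
  for (std::size_t I = 0; I < In.size() && I < WidenNumElts; ++I)
    Ops[I] = Lane{false, In[I]};
  return Ops;
}
```

Widening a 3-element input to 4 lanes leaves lanes 0 to 2 defined and lane 3 marked as padding, which is the shape the `BUILD_VECTOR` paths in the diff produce.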


``diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
   // Widen the input and call convert on the widened input vector.
   unsigned NumConcat =
   WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
-  SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+  SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
   Ops[0] = InOp;
   SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
   if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
 
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
-  SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
   // Use the original element count so we don't do more scalar opts than
   // necessary.
   unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
   std::array EltVTs = {{EltVT, MVT::Other}};
-  SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
   SmallVector OpChains;
   // Use the original element count so we don't do more scalar opts than
   // necessary.
@@ -5819,7 +5819,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
   }
 
   while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
 
   return DAG.getBuildVector(WidenVT, DL, Ops);
 }
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
 // input and then widening it. To avoid this, we widen the input only if
 // it results in a legal type.
 if (WidenSize % InSize == 0) {
-  SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+  SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
   Ops[0] = InOp;
 
   NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
   SmallVector Ops;
   DAG.ExtractVectorElements(InOp, Ops);
   Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
 
   NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
 }
@@ -6088,7 +6088,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 if (WidenNumElts % NumInElts == 0) {
   // Add undef vectors to widen to correct length.
   unsigned NumConcat = WidenNumElts / NumInElts;
-  SDValue UndefVal = DAG.getUNDEF(InVT);
+  SDValue UndefVal = DAG.getPOISON(InVT);
   SmallVector Ops(NumConcat);
   for (unsigned i=0; i < NumOperands; ++i)
 Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 for (unsigned j = 0; j < NumInElts; ++j)
   Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
   }
-  SDValue UndefVal = DAG.getUNDEF(EltVT);
+  SDValue UndefVal = DAG.getPOISON(EltVT);
   for (; Idx < WidenNumElts; ++Idx)
 Ops[Idx] = UndefVal;
   return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
 Parts.push_back(
 DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
   for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
 
   return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
 }
@@ -6229,7 +6229,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(S

[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/168290

None

>From 2f389b76f03f8e266e18eaef26bfc96e75a65ba7 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 14 Nov 2025 21:47:44 -0800
Subject: [PATCH] DAG: Use poison for some vector result widening

---
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  24 +-
 .../AArch64/sve-extract-scalable-vector.ll|   7 -
 .../vector-constrained-fp-intrinsics.ll   | 266 +--
 llvm/test/CodeGen/X86/half.ll | 133 +++---
 llvm/test/CodeGen/X86/matrix-multiply.ll  |  74 +--
 .../X86/vector-constrained-fp-intrinsics.ll   | 434 +-
 6 files changed, 463 insertions(+), 475 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
   // Widen the input and call convert on the widened input vector.
   unsigned NumConcat =
   WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
-  SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+  SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
   Ops[0] = InOp;
   SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
   if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
 
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
-  SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
   // Use the original element count so we don't do more scalar opts than
   // necessary.
   unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
   std::array EltVTs = {{EltVT, MVT::Other}};
-  SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
   SmallVector OpChains;
   // Use the original element count so we don't do more scalar opts than
   // necessary.
@@ -5819,7 +5819,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
   }
 
   while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
 
   return DAG.getBuildVector(WidenVT, DL, Ops);
 }
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
 // input and then widening it. To avoid this, we widen the input only if
 // it results in a legal type.
 if (WidenSize % InSize == 0) {
-  SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+  SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
   Ops[0] = InOp;
 
   NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
   SmallVector Ops;
   DAG.ExtractVectorElements(InOp, Ops);
   Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
 
   NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
 }
@@ -6088,7 +6088,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 if (WidenNumElts % NumInElts == 0) {
   // Add undef vectors to widen to correct length.
   unsigned NumConcat = WidenNumElts / NumInElts;
-  SDValue UndefVal = DAG.getUNDEF(InVT);
+  SDValue UndefVal = DAG.getPOISON(InVT);
   SmallVector Ops(NumConcat);
   for (unsigned i=0; i < NumOperands; ++i)
 Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 for (unsigned j = 0; j < NumInElts; ++j)
   Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
   }
-  SDValue UndefVal = DAG.getUNDEF(EltVT);
+  SDValue UndefVal = DAG.getPOISON(EltVT);
   for (; Idx < WidenNumElts; ++Idx)
 Ops[Idx] = UndefVal;
   return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
 Parts.push_back(
 DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
   for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
 
   return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
 }
@@ -6229,7 +6229,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVEC

[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)

2025-11-19 Thread Kai Nacke via llvm-branch-commits

https://github.com/redstar updated 
https://github.com/llvm/llvm-project/pull/167054

>From 1f9bfbbd5e893bcab320dc26c71e49779ef7d04d Mon Sep 17 00:00:00 2001
From: Kai Nacke 
Date: Fri, 7 Nov 2025 11:13:49 -0500
Subject: [PATCH 1/2] [GOFF] Write out relocations in the GOFF writer

Add support for writing relocations. Since the symbol numbering is only
available after the symbols are written, the relocations are collected
in a vector. At write time, the relocations are converted using the
symbols ids, compressed and written out. A relocation data record is
limited to 32K-1 bytes, which requires making sure that larger relocation
data is written into multiple records.
---
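
The commit message says a relocation data record is limited to 32K-1 bytes, so larger relocation data has to be spread over multiple records. A minimal sketch of that splitting step, assuming the data is already a flat byte buffer; `MaxRecordPayload` and `splitIntoRecords` are hypothetical names and do not come from the GOFF writer itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed per-record limit, taken from the commit message (32K-1 bytes).
constexpr std::size_t MaxRecordPayload = 32 * 1024 - 1;

// Split a byte buffer into consecutive chunks of at most MaxLen bytes each.
std::vector<std::vector<std::uint8_t>>
splitIntoRecords(const std::vector<std::uint8_t> &Data,
                 std::size_t MaxLen = MaxRecordPayload) {
  std::vector<std::vector<std::uint8_t>> Records;
  for (std::size_t Pos = 0; Pos < Data.size(); Pos += MaxLen) {
    // The last chunk may be shorter than MaxLen.
    std::size_t Len = std::min(MaxLen, Data.size() - Pos);
    Records.emplace_back(Data.begin() + Pos, Data.begin() + Pos + Len);
  }
  return Records;
}
```

With this scheme a 70000-byte buffer ends up in three records: two full ones of 32767 bytes and a final one holding the remaining 4466 bytes.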
 llvm/include/llvm/BinaryFormat/GOFF.h |  26 ++
 llvm/include/llvm/MC/MCGOFFObjectWriter.h |  37 ++-
 llvm/lib/MC/GOFFObjectWriter.cpp  | 266 +-
 .../MCTargetDesc/SystemZGOFFObjectWriter.cpp  |  24 ++
 .../SystemZ/MCTargetDesc/SystemZMCAsmInfo.h   |   1 +
 llvm/test/CodeGen/SystemZ/zos-section-1.ll|  23 +-
 llvm/test/CodeGen/SystemZ/zos-section-2.ll|  16 +-
 7 files changed, 378 insertions(+), 15 deletions(-)

diff --git a/llvm/include/llvm/BinaryFormat/GOFF.h b/llvm/include/llvm/BinaryFormat/GOFF.h
index 49d2809cb6524..08bdb5d624fca 100644
--- a/llvm/include/llvm/BinaryFormat/GOFF.h
+++ b/llvm/include/llvm/BinaryFormat/GOFF.h
@@ -157,6 +157,32 @@ enum ESDAlignment : uint8_t {
   ESD_ALIGN_4Kpage = 12,
 };
 
+enum RLDReferenceType : uint8_t {
+  RLD_RT_RAddress = 0,
+  RLD_RT_ROffset = 1,
+  RLD_RT_RLength = 2,
+  RLD_RT_RRelativeImmediate = 6,
+  RLD_RT_RTypeConstant = 7,
+  RLD_RT_RLongDisplacement = 9,
+};
+
+enum RLDReferentType : uint8_t {
+  RLD_RO_Label = 0,
+  RLD_RO_Element = 1,
+  RLD_RO_Class = 2,
+  RLD_RO_Part = 3,
+};
+
+enum RLDAction : uint8_t {
+  RLD_ACT_Add = 0,
+  RLD_ACT_Subtract = 1,
+};
+
+enum RLDFetchStore : uint8_t {
+  RLD_FS_Fetch = 0,
+  RLD_FS_Store = 1
+};
+
 enum ENDEntryPointRequest : uint8_t {
   END_EPR_None = 0,
   END_EPR_EsdidOffset = 1,
diff --git a/llvm/include/llvm/MC/MCGOFFObjectWriter.h b/llvm/include/llvm/MC/MCGOFFObjectWriter.h
index ec07637dd2847..408d432a8f54f 100644
--- a/llvm/include/llvm/MC/MCGOFFObjectWriter.h
+++ b/llvm/include/llvm/MC/MCGOFFObjectWriter.h
@@ -11,9 +11,13 @@
 
 #include "llvm/MC/MCObjectWriter.h"
 #include "llvm/MC/MCValue.h"
+#include 
+#include 
 
 namespace llvm {
 class MCObjectWriter;
+class MCSectionGOFF;
+class MCSymbolGOFF;
 class raw_pwrite_stream;
 
 class MCGOFFObjectTargetWriter : public MCObjectTargetWriter {
@@ -21,8 +25,19 @@ class MCGOFFObjectTargetWriter : public MCObjectTargetWriter {
   MCGOFFObjectTargetWriter() = default;
 
 public:
+  enum RLDRelocationType {
+Reloc_Type_ACon = 0x1,   // General address.
+Reloc_Type_RelImm = 0x2, // Relative-immediate address.
+Reloc_Type_QCon = 0x3,   // Offset of symbol in class.
+Reloc_Type_VCon = 0x4,   // Address of external symbol.
+Reloc_Type_RCon = 0x5,   // PSECT of symbol.
+  };
+
   ~MCGOFFObjectTargetWriter() override = default;
 
+  virtual unsigned getRelocType(const MCValue &Target, const MCFixup &Fixup,
+bool IsPCRel) const = 0;
+
   Triple::ObjectFormatType getFormat() const override { return Triple::GOFF; }
 
   static bool classof(const MCObjectTargetWriter *W) {
@@ -30,6 +45,23 @@ class MCGOFFObjectTargetWriter : public MCObjectTargetWriter {
   }
 };
 
+struct GOFFSavedRelocationEntry {
+  const MCSectionGOFF *Section;
+  const MCSymbolGOFF *SymA;
+  const MCSymbolGOFF *SymB;
+  unsigned RelocType;
+  uint64_t FixupOffset;
+  uint32_t Length;
+  uint64_t FixedValue; // Info only.
+
+  GOFFSavedRelocationEntry(const MCSectionGOFF *Section,
+   const MCSymbolGOFF *SymA, const MCSymbolGOFF *SymB,
+   unsigned RelocType, uint64_t FixupOffset,
+   uint32_t Length, uint64_t FixedValue)
+  : Section(Section), SymA(SymA), SymB(SymB), RelocType(RelocType),
+FixupOffset(FixupOffset), Length(Length), FixedValue(FixedValue) {}
+};
+
 class GOFFObjectWriter : public MCObjectWriter {
   // The target specific GOFF writer instance.
   std::unique_ptr TargetObjectWriter;
@@ -37,6 +69,9 @@ class GOFFObjectWriter : public MCObjectWriter {
   // The stream used to write the GOFF records.
   raw_pwrite_stream &OS;
 
+  // Saved relocation data.
+  std::vector SavedRelocs;
+
 public:
   GOFFObjectWriter(std::unique_ptr MOTW,
raw_pwrite_stream &OS);
@@ -44,7 +79,7 @@ class GOFFObjectWriter : public MCObjectWriter {
 
   // Implementation of the MCObjectWriter interface.
   void recordRelocation(const MCFragment &F, const MCFixup &Fixup,
-MCValue Target, uint64_t &FixedValue) override {}
+MCValue Target, uint64_t &FixedValue) override;
 
   uint64_t writeObject() override;
 };
diff --git a/llvm/lib/MC/GOFFObjectWriter.cpp b/llvm/lib/MC/GOFFObjectWriter.cpp
index 07aec

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic updated 
https://github.com/llvm/llvm-project/pull/168411

>From 73f2bf84bb2bcff3cd20aa207116f214cde943f9 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 17 Nov 2025 18:47:58 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and
 G_FNEG

---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp|  26 +-
 .../AMDGPU/AMDGPURegBankLegalizeHelper.h  |   1 +
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  19 +
 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll   | 340 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll   | 303 
 5 files changed, 683 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll
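
The `lowerSplitTo16` change in this patch handles both one-operand and two-operand instructions by unpacking each packed V2S16 source into a low and a high 16-bit half, applying the operation per half, and merging the results. A plain C++ model of that flow on a packed 32-bit integer; all function names here are hypothetical and this is not the MachineIRBuilder code.

```cpp
#include <cstdint>
#include <utility>

// Split a packed 32-bit value into its {low, high} 16-bit halves.
std::pair<std::uint16_t, std::uint16_t> unpackHalves(std::uint32_t Packed) {
  return {static_cast<std::uint16_t>(Packed & 0xffffu),
          static_cast<std::uint16_t>(Packed >> 16)};
}

// Rebuild the packed value from the two halves.
std::uint32_t mergeHalves(std::uint16_t Lo, std::uint16_t Hi) {
  return static_cast<std::uint32_t>(Lo) |
         (static_cast<std::uint32_t>(Hi) << 16);
}

// One-source case: apply Op to each half independently.
template <typename UnaryOp>
std::uint32_t applyUnaryPerHalf(std::uint32_t Src, UnaryOp Op) {
  std::pair<std::uint16_t, std::uint16_t> S = unpackHalves(Src);
  return mergeHalves(Op(S.first), Op(S.second));
}

// Two-source case: pair up the low halves and the high halves.
template <typename BinaryOp>
std::uint32_t applyBinaryPerHalf(std::uint32_t A, std::uint32_t B,
                                 BinaryOp Op) {
  std::pair<std::uint16_t, std::uint16_t> X = unpackHalves(A);
  std::pair<std::uint16_t, std::uint16_t> Y = unpackHalves(B);
  return mergeHalves(Op(X.first, Y.first), Op(X.second, Y.second));
}
```

Passing a lambda that clears bit 15 to `applyUnaryPerHalf` models a packed half-precision fabs: the bit pattern 0xBC00C000 becomes 0x3C004000.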

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..123fc5bf37a19 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -437,6 +437,13 @@ std::pair RegBankLegalizeHelper::unpackAExt(Register Reg) {
   return {Lo.getReg(0), Hi.getReg(0)};
 }
 
+std::pair
+RegBankLegalizeHelper::unpackAExtTruncS16(Register Reg) {
+  auto [Lo32, Hi32] = unpackAExt(Reg);
+  return {B.buildTrunc(SgprRB_S16, Lo32).getReg(0),
+  B.buildTrunc(SgprRB_S16, Hi32).getReg(0)};
+}
+
 void RegBankLegalizeHelper::lowerUnpackBitShift(MachineInstr &MI) {
   Register Lo, Hi;
   switch (MI.getOpcode()) {
@@ -629,14 +636,21 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
 void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
   Register Dst = MI.getOperand(0).getReg();
   assert(MRI.getType(Dst) == V2S16);
-  auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
-  auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
   unsigned Opc = MI.getOpcode();
   auto Flags = MI.getFlags();
-  auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
-  auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
-  auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
-  auto Op2Hi = B.buildTrunc(SgprRB_S16, Op2Hi32);
+
+  if (MI.getNumOperands() == 2) {
+auto [Op1Lo, Op1Hi] = unpackAExtTruncS16(MI.getOperand(1).getReg());
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+  }
+
+  assert(MI.getNumOperands() == 3);
+  auto [Op1Lo, Op1Hi] = unpackAExtTruncS16(MI.getOperand(1).getReg());
+  auto [Op2Lo, Op2Hi] = unpackAExtTruncS16(MI.getOperand(2).getReg());
   auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo, Op2Lo}, Flags);
   auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi, Op2Hi}, Flags);
   B.buildMergeLikeInstr(Dst, {Lo, Hi});
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
index e7598f888e4b5..4f1c3c02fa5d6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.h
@@ -118,6 +118,7 @@ class RegBankLegalizeHelper {
   std::pair unpackZExt(Register Reg);
   std::pair unpackSExt(Register Reg);
   std::pair unpackAExt(Register Reg);
+  std::pair unpackAExtTruncS16(Register Reg);
   void lowerUnpackBitShift(MachineInstr &MI);
   void lowerV_BFE(MachineInstr &MI);
   void lowerS_BFE(MachineInstr &MI);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST,
   .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
   .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
 
+  // FNEG and FABS are either folded as source modifiers or can be selected as
+  // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+  // targets without SALU float we still select them as VGPR since there would
+  // be no real sgpr use.
+  addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+  .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+  .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+  .Div(S16, {{Vgpr16}, {Vgpr16}})
+  .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+  .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+  .Div(S32, {{Vgpr32}, {Vgpr32}})
+  .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+  .Div(S64, {{Vgpr64}, {Vgpr64}})
+  .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+  .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+  .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+  .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+  .Any({{DivV2S32}, {{

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread Petar Avramovic via llvm-branch-commits


@@ -0,0 +1,340 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GCN,GFX11 %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck -check-prefixes=GCN,GFX12 %s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN:   ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fff, v0
+; GCN-NEXT:global_store_b16 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+  %fabs = call half @llvm.fabs.f16(half %in)
+  store half %fabs, ptr addrspace(1) %out
+  ret void
+}
+define amdgpu_ps void @s_fabs_f16(half inreg %in, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f16:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:global_store_b16 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f16:
+; GFX12:   ; %bb.0:
+; GFX12-NEXT:s_and_b32 s0, s0, 0x7fff
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b16 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+  %fabs = call half @llvm.fabs.f16(half %in)
+  store half %fabs, ptr addrspace(1) %out
+  ret void
+}
+define amdgpu_ps void @s_fabs_f16_salu_use(half inreg %in, i32 inreg %val, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f16_salu_use:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fff, s0
+; GFX11-NEXT:s_cmp_eq_u32 s1, 0
+; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
+; GFX11-NEXT:v_readfirstlane_b32 s0, v2
+; GFX11-NEXT:s_cselect_b32 s0, s0, 0
+; GFX11-NEXT:v_mov_b32_e32 v2, s0
+; GFX11-NEXT:global_store_b16 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f16_salu_use:
+; GFX12:   ; %bb.0:
+; GFX12-NEXT:s_and_b32 s0, s0, 0x7fff
+; GFX12-NEXT:s_cmp_eq_u32 s1, 0
+; GFX12-NEXT:s_cselect_b32 s0, s0, 0
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b16 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+  %fabs = call half @llvm.fabs.f16(half %in)
+  %cond = icmp eq i32 %val, 0
+  %sel = select i1 %cond, half %fabs, half 0.0
+  store half %sel, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_ps void @v_fabs_f32(float %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f32:
+; GCN:   ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fffffff, v0
+; GCN-NEXT:global_store_b32 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+  %fabs = call float @llvm.fabs.f32(float %in)
+  store float %fabs, ptr addrspace(1) %out
+  ret void
+}
+define amdgpu_ps void @s_fabs_f32(float inreg %in, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f32:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fffffff, s0
+; GFX11-NEXT:global_store_b32 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f32:
+; GFX12:   ; %bb.0:
+; GFX12-NEXT:s_bitset0_b32 s0, 31
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b32 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+  %fabs = call float @llvm.fabs.f32(float %in)
+  store float %fabs, ptr addrspace(1) %out
+  ret void
+}
+define amdgpu_ps void @s_fabs_f32_salu_use(float inreg %in, i32 inreg %val, ptr addrspace(1) %out) {
+; GFX11-LABEL: s_fabs_f32_salu_use:
+; GFX11:   ; %bb.0:
+; GFX11-NEXT:v_and_b32_e64 v2, 0x7fffffff, s0
+; GFX11-NEXT:s_cmp_eq_u32 s1, 0
+; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
+; GFX11-NEXT:v_readfirstlane_b32 s0, v2
+; GFX11-NEXT:s_cselect_b32 s0, s0, 0
+; GFX11-NEXT:v_mov_b32_e32 v2, s0
+; GFX11-NEXT:global_store_b32 v[0:1], v2, off
+; GFX11-NEXT:s_endpgm
+;
+; GFX12-LABEL: s_fabs_f32_salu_use:
+; GFX12:   ; %bb.0:
+; GFX12-NEXT:s_bitset0_b32 s0, 31
+; GFX12-NEXT:s_cmp_eq_u32 s1, 0
+; GFX12-NEXT:s_cselect_b32 s0, s0, 0
+; GFX12-NEXT:s_delay_alu instid0(SALU_CYCLE_1)
+; GFX12-NEXT:v_mov_b32_e32 v2, s0
+; GFX12-NEXT:global_store_b32 v[0:1], v2, off
+; GFX12-NEXT:s_endpgm
+  %fabs = call float @llvm.fabs.f32(float %in)
+  %cond = icmp eq i32 %val, 0
+  %sel = select i1 %cond, float %fabs, float 0.0
+  store float %sel, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_ps void @v_fabs_f64(double %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f64:
+; GCN:   ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v1, 0x7fffffff, v1
+; GCN-NEXT:global_store_b64 v[2:3], v[0:1], off
+; GCN-NEXT:s_endpgm
+  %fabs = call double @llvm.fabs.f64(double %in)
+  store double %fabs, ptr addrspace(1) %out
+  ret void
+}
+define amdgpu_ps void @s_fabs_f64(double inreg %in, ptr addrspace(1) %out) {
+; G

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Petar Avramovic (petar-avramovic)


Changes



---

Patch is 21.42 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/168411.diff


4 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp (+15-2) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp (+19) 
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll (+233) 
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll (+216) 
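
The rules comment in this patch notes that FNEG and FABS can be selected as bitwise XOR and AND with a mask. A minimal C++ sketch of that bit-level identity for 32-bit floats, assuming IEEE-754 single precision; `fabsViaMask` and `fnegViaMask` are illustrative names, not functions from the patch.

```cpp
#include <cstdint>
#include <cstring>

// fabs as AND with a mask: clear the sign bit (bit 31).
float fabsViaMask(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits)); // reinterpret the float as raw bits
  Bits &= 0x7fffffffu;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}

// fneg as XOR with a mask: flip the sign bit (bit 31).
float fnegViaMask(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= 0x80000000u;
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}
```

The same idea applies to the other widths with a different mask: 0x7fff for a 16-bit half, and the high 32-bit word only for a 64-bit double, which matches the `v_and_b32` on `v1` in the f64 test.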


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..d719f3d40295d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -629,10 +629,23 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr &MI) {
 void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
   Register Dst = MI.getOperand(0).getReg();
   assert(MRI.getType(Dst) == V2S16);
-  auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
-  auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
   unsigned Opc = MI.getOpcode();
   auto Flags = MI.getFlags();
+
+  if (MI.getNumOperands() == 2) {
+auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
+auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+  }
+
+  assert(MI.getNumOperands() == 3);
+  auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+  auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
   auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
   auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
   auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const GCNSubtarget &_ST,
   .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
   .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
 
+  // FNEG and FABS are either folded as source modifiers or can be selected as
+  // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+  // targets without SALU float we still select them as VGPR since there would
+  // be no real sgpr use.
+  addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+  .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+  .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+  .Div(S16, {{Vgpr16}, {Vgpr16}})
+  .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+  .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+  .Div(S32, {{Vgpr32}, {Vgpr32}})
+  .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+  .Div(S64, {{Vgpr64}, {Vgpr64}})
+  .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+  .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+  .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+  .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+  .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32}}});
+
   addRulesForGOpcs({G_FPTOUI})
   .Any({{UniS32, S32}, {{Sgpr32}, {Sgpr32}}}, hasSALUFloat)
   .Any({{UniS32, S32}, {{UniInVgprS32}, {Vgpr32}}}, !hasSALUFloat);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
new file mode 100644
index 0000000000000..093cdf744e3b4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
@@ -0,0 +1,233 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GCN,GFX11,GFX11-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck -check-prefixes=GCN,GFX11,GFX11-GISEL %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck -check-prefixes=GCN,GFX12,GFX12-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck -check-prefixes=GCN,GFX12,GFX12-GISEL %s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN:   ; %bb.0:
+; GCN-NEXT:v_and_b32_e32 v0, 0x7fff, v0
+; GCN-NEXT:global_store_b16 v[1:2], v0, off
+; GCN-NEXT:s_endpgm
+  %fabs = call half @llvm.fabs.f16(half %in)
+  store half %fabs, ptr addrspace(1) %out
+  ret void
+}
+de

[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/168290

>From 6b6155931582b2f8924a76b268f06d9e2696d489 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 14 Nov 2025 21:47:44 -0800
Subject: [PATCH] DAG: Use poison for some vector result widening

---
 .../SelectionDAG/LegalizeVectorTypes.cpp  |  24 +-
 .../AArch64/sve-extract-scalable-vector.ll|   7 -
 .../vector-constrained-fp-intrinsics.ll   | 266 +--
 llvm/test/CodeGen/X86/matrix-multiply.ll  |  74 +--
 .../X86/vector-constrained-fp-intrinsics.ll   | 434 +-
 5 files changed, 399 insertions(+), 406 deletions(-)

diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
   // Widen the input and call convert on the widened input vector.
   unsigned NumConcat =
   WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
-  SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+  SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
   Ops[0] = InOp;
   SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
   if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
 
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
-  SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
   // Use the original element count so we don't do more scalar opts than
   // necessary.
   unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
   std::array EltVTs = {{EltVT, MVT::Other}};
-  SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
   SmallVector OpChains;
   // Use the original element count so we don't do more scalar opts than
   // necessary.
@@ -5819,7 +5819,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
   }
 
   while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
 
   return DAG.getBuildVector(WidenVT, DL, Ops);
 }
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
 // input and then widening it. To avoid this, we widen the input only if
 // it results in a legal type.
 if (WidenSize % InSize == 0) {
-  SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+  SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
   Ops[0] = InOp;
 
   NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
   SmallVector Ops;
   DAG.ExtractVectorElements(InOp, Ops);
   Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
 
   NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
 }
@@ -6088,7 +6088,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 if (WidenNumElts % NumInElts == 0) {
   // Add undef vectors to widen to correct length.
   unsigned NumConcat = WidenNumElts / NumInElts;
-  SDValue UndefVal = DAG.getUNDEF(InVT);
+  SDValue UndefVal = DAG.getPOISON(InVT);
   SmallVector Ops(NumConcat);
   for (unsigned i=0; i < NumOperands; ++i)
 Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 for (unsigned j = 0; j < NumInElts; ++j)
   Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
   }
-  SDValue UndefVal = DAG.getUNDEF(EltVT);
+  SDValue UndefVal = DAG.getPOISON(EltVT);
   for (; Idx < WidenNumElts; ++Idx)
 Ops[Idx] = UndefVal;
   return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
 Parts.push_back(
 DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
   for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
 
   return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
 }
@@ -6229,7 +6229,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
   for (i = 0; i < VTNumElts; ++i)
 Ops[i] = 

[llvm-branch-commits] [llvm] [AArch64][SME] Support saving/restoring ZT0 in the MachineSMEABIPass (PR #166362)

2025-11-19 Thread Benjamin Maxwell via llvm-branch-commits

https://github.com/MacDue updated 
https://github.com/llvm/llvm-project/pull/166362

>From 61a5390345e13e8195ad9b2214133914db560ef2 Mon Sep 17 00:00:00 2001
From: Benjamin Maxwell 
Date: Mon, 3 Nov 2025 15:41:49 +
Subject: [PATCH] [AArch64][SME] Support saving/restoring ZT0 in the
 MachineSMEABIPass
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch extends the MachineSMEABIPass to support ZT0. This is done
with the addition of two new states:

  - `ACTIVE_ZT0_SAVED`
    * This is used when calling a function that shares ZA, but does not
      share ZT0 (i.e., no ZT0 attributes).
    * This state indicates ZT0 must be saved to the save slot, but ZA
      must remain on, with no lazy save setup.
  - `LOCAL_COMMITTED`
    * This is used for saving ZT0 in functions without ZA state.
    * This state indicates ZA is off and ZT0 has been saved.
    * This state is general enough to support ZA, but those
      transitions have not been implemented†

To aid with readability, the state transitions have been reworked to a
switch of `transitionFrom().to()`, rather than
nested ifs, which helps manage more transitions.

† This could be implemented to handle some cases of undefined behavior
  better.
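
As a rough illustration of the "switch over a (from, to) pair" idea described
above, here is a minimal self-contained sketch. It is not the pass's code: the
`ACTIVE` and `OFF` states, the bit-packing in `to()`, and the returned strings
are all assumptions made for the example; only `ACTIVE_ZT0_SAVED` and
`LOCAL_COMMITTED` are named in the message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical state set. ACTIVE and OFF are placeholders for illustration.
enum class ZAState : uint8_t { ACTIVE, ACTIVE_ZT0_SAVED, LOCAL_COMMITTED, OFF };

// Packs a (from, to) pair into one integer so it can serve as a case label.
struct TransitionFrom {
  ZAState From;
  constexpr uint16_t to(ZAState To) const {
    return static_cast<uint16_t>((static_cast<uint16_t>(From) << 8) |
                                 static_cast<uint16_t>(To));
  }
};

constexpr TransitionFrom transitionFrom(ZAState From) { return {From}; }

// Returns a description of the (illustrative) action for one transition.
std::string describeTransition(ZAState From, ZAState To) {
  switch (transitionFrom(From).to(To)) {
  case transitionFrom(ZAState::ACTIVE).to(ZAState::ACTIVE_ZT0_SAVED):
    return "save ZT0, keep ZA on";
  case transitionFrom(ZAState::ACTIVE_ZT0_SAVED).to(ZAState::ACTIVE):
    return "restore ZT0";
  default:
    return "unhandled";
  }
}
```

Each handled transition becomes one flat case label, so adding a state means
adding cases rather than another level of nested conditions.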

Change-Id: I14be4a7f8b998fe667bfaade5088f88039515f91
---
 .../AArch64/AArch64ExpandPseudoInsts.cpp  |   1 +
 .../Target/AArch64/AArch64ISelLowering.cpp|  11 +-
 .../lib/Target/AArch64/AArch64SMEInstrInfo.td |   6 +
 llvm/lib/Target/AArch64/MachineSMEABIPass.cpp | 176 +++---
 .../test/CodeGen/AArch64/sme-peephole-opts.ll |   4 -
 .../test/CodeGen/AArch64/sme-za-exceptions.ll | 124 +---
 llvm/test/CodeGen/AArch64/sme-zt0-state.ll| 104 ++-
 7 files changed, 321 insertions(+), 105 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp 
b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
index 34d74d04c4419..60e6a82d41cc8 100644
--- a/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp
@@ -1717,6 +1717,7 @@ bool AArch64ExpandPseudo::expandMI(MachineBasicBlock &MBB,
}
case AArch64::InOutZAUsePseudo:
case AArch64::RequiresZASavePseudo:
+   case AArch64::RequiresZT0SavePseudo:
case AArch64::SMEStateAllocPseudo:
case AArch64::COALESCER_BARRIER_FPR16:
case AArch64::COALESCER_BARRIER_FPR32:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp 
b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index c4ae8ea7a8a69..6dc01597cf0f5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -9524,6 +9524,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 if (CallAttrs.requiresLazySave() ||
 CallAttrs.requiresPreservingAllZAState())
   ZAMarkerNode = AArch64ISD::REQUIRES_ZA_SAVE;
+else if (CallAttrs.requiresPreservingZT0())
+  ZAMarkerNode = AArch64ISD::REQUIRES_ZT0_SAVE;
 else if (CallAttrs.caller().hasZAState() ||
  CallAttrs.caller().hasZT0State())
   ZAMarkerNode = AArch64ISD::INOUT_ZA_USE;
@@ -9643,7 +9645,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
   SDValue ZTFrameIdx;
   MachineFrameInfo &MFI = MF.getFrameInfo();
-  bool ShouldPreserveZT0 = CallAttrs.requiresPreservingZT0();
+  bool ShouldPreserveZT0 =
+  !UseNewSMEABILowering && CallAttrs.requiresPreservingZT0();
 
   // If the caller has ZT0 state which will not be preserved by the callee,
   // spill ZT0 before the call.
@@ -9656,7 +9659,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 
   // If caller shares ZT0 but the callee is not shared ZA, we need to stop
   // PSTATE.ZA before the call if there is no lazy-save active.
-  bool DisableZA = CallAttrs.requiresDisablingZABeforeCall();
+  bool DisableZA =
+  !UseNewSMEABILowering && CallAttrs.requiresDisablingZABeforeCall();
   assert((!DisableZA || !RequiresLazySave) &&
  "Lazy-save should have PSTATE.SM=1 on entry to the function");
 
@@ -10142,7 +10146,8 @@ AArch64TargetLowering::LowerCall(CallLoweringInfo &CLI,
 getSMToggleCondition(CallAttrs));
   }
 
-  if (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall())
+  if (!UseNewSMEABILowering &&
+  (RequiresLazySave || CallAttrs.requiresEnablingZAAfterCall()))
 // Unconditionally resume ZA.
 Result = DAG.getNode(
 AArch64ISD::SMSTART, DL, DAG.getVTList(MVT::Other, MVT::Glue), Result,
diff --git a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td 
b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
index 737169253ddb3..b099f15ecf7e3 100644
--- a/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
+++ b/llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
@@ -102,6 +102,7 @@ def : Pat<(i64 (AArch64AllocateSMESaveBuffer GPR64:$size)),
 let hasSideEffects = 1, isMeta = 1 in {
   def InOutZAUsePseudo : Pseudo<(outs), (ins), []>, Sched<[]>;
   def RequiresZASavePseudo : Pseudo<(outs), (ins), []>, Sched<[]>;
+  d

[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -549,15 +473,16 @@ createFlowFunction(const 
BinaryFunction::BasicBlockOrderType &BlockOrder) {
 /// of the basic blocks in the binary, the count is "matched" to the block.
 /// Similarly, if both the source and the target of a count in the profile are
 /// matched to a jump in the binary, the count is recorded in CFG.
-size_t matchWeights(
-BinaryContext &BC, const BinaryFunction::BasicBlockOrderType &BlockOrder,
-const yaml::bolt::BinaryFunctionProfile &YamlBF, FlowFunction &Func,
-HashFunction HashFunction, YAMLProfileReader::ProfileLookupMap &IdToYamlBF,
-const BinaryFunction &BF,
-const ArrayRef ProbeMatchSpecs);
+size_t matchWeights(BinaryContext &BC,

maksfb wrote:

`BinaryContext` can be obtained from `BinaryFunction`.
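
A small sketch of the shape of this suggestion, using hypothetical stand-in
types (`Context`, `Function`, `describeBefore`, `describeAfter`) rather than
the real BOLT classes: when the function object already holds a reference to
its context, the separate context parameter can be dropped.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only.
struct Context {
  std::string Name;
};

class Function {
  Context &Ctx;

public:
  explicit Function(Context &C) : Ctx(C) {}
  // The function can always hand back the context it belongs to.
  Context &getContext() const { return Ctx; }
};

// Before: the context is passed alongside a function that already knows it.
std::string describeBefore(Context &C, const Function &F) {
  (void)F;
  return C.Name;
}

// After: one parameter fewer, same information.
std::string describeAfter(const Function &F) { return F.getContext().Name; }
```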

https://github.com/llvm/llvm-project/pull/100446
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [BOLT][PAC] Warn about synchronous unwind tables (PR #165227)

2025-11-19 Thread Gergely Bálint via llvm-branch-commits

https://github.com/bgergely0 updated 
https://github.com/llvm/llvm-project/pull/165227

From 61e03b5abf74bd5a61f2aa4d21219c67cfbfce24 Mon Sep 17 00:00:00 2001
From: Gergely Balint 
Date: Mon, 27 Oct 2025 09:29:54 +
Subject: [PATCH 1/4] [BOLT][PAC] Warn about synchronous unwind tables

BOLT currently ignores functions with synchronous PAuth DWARF info.
When more than 10% of functions get ignored for inconsistencies, we
should emit a warning to only use asynchronous unwind tables.

See also: #165215
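
The threshold check can be sketched in isolation as below. The helper names
are hypothetical, and the guard for `Total == 0` is an addition of this sketch
that the patch itself does not contain. Note that the patch compares with
`>= 10.0`, so exactly 10% also triggers the warning.

```cpp
#include <cassert>
#include <cstddef>

// Percentage of ignored functions; returns 0 when nothing was analyzed
// (this guard is an assumption of the sketch, not part of the patch).
double ignoredPercent(std::size_t Ignored, std::size_t Total) {
  if (Total == 0)
    return 0.0;
  return 100.0 * static_cast<double>(Ignored) / static_cast<double>(Total);
}

// True when the ignored share reaches the threshold (10% by default).
bool shouldWarn(std::size_t Ignored, std::size_t Total,
                double Threshold = 10.0) {
  return ignoredPercent(Ignored, Total) >= Threshold;
}
```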
---
 bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp|  9 -
 .../AArch64/pacret-synchronous-unwind.cpp | 33 +++
 2 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp

diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp 
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 91030544d2b88..01af88818a21d 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -133,11 +133,18 @@ Error 
PointerAuthCFIAnalyzer::runOnFunctions(BinaryContext &BC) {
   ParallelUtilities::runOnEachFunction(
   BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
   SkipPredicate, "PointerAuthCFIAnalyzer");
+
+  float IgnoredPercent = (100.0 * FunctionsIgnored) / Total;
   BC.outs() << "BOLT-INFO: PointerAuthCFIAnalyzer ran on " << Total
 << " functions. Ignored " << FunctionsIgnored << " functions "
-<< format("(%.2lf%%)", (100.0 * FunctionsIgnored) / Total)
+<< format("(%.2lf%%)", IgnoredPercent)
 << " because of CFI inconsistencies\n";
 
+  if (IgnoredPercent >= 10.0)
+BC.outs() << "BOLT-WARNING: PointerAuthCFIAnalyzer only supports "
+ "asynchronous unwind tables. For C compilers, see "
+ "-fasynchronous-unwind-tables.\n";
+
   return Error::success();
 }
 
diff --git a/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp 
b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
new file mode 100644
index 0..1bfeeaed3715a
--- /dev/null
+++ b/bolt/test/runtime/AArch64/pacret-synchronous-unwind.cpp
@@ -0,0 +1,33 @@
+// Test to demonstrate that functions compiled with synchronous unwind tables
+// are ignored by the PointerAuthCFIAnalyzer.
+// Exception handling is needed to have _any_ unwind tables, otherwise the
+// PointerAuthCFIAnalyzer does not run on these functions, so it does not 
ignore
+// any function.
+//
+// REQUIRES: system-linux,bolt-runtime
+//
+// RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+// RUN: -mbranch-protection=pac-ret \
+// RUN: -fno-asynchronous-unwind-tables \
+// RUN: %s -o %t.exe -Wl,-q
+// RUN: llvm-bolt %t.exe -o %t.bolt | FileCheck %s --check-prefix=CHECK
+//
+// CHECK: PointerAuthCFIAnalyzer ran on 3 functions. Ignored
+// CHECK-NOT: 0 functions (0.00%) because of CFI inconsistencies
+// CHECK-SAME: 1 functions (33.33%) because of CFI inconsistencies
+// CHECK-NEXT: BOLT-WARNING: PointerAuthCFIAnalyzer only supports asynchronous
+// CHECK-SAME: unwind tables. For C compilers, see 
-fasynchronous-unwind-tables.
+
+#include <cstdio>
+#include <stdexcept>
+
+void foo() { throw std::runtime_error("Exception from foo()."); }
+
+int main() {
+  try {
+foo();
+  } catch (const std::exception &e) {
+printf("Exception caught: %s\n", e.what());
+  }
+  return 0;
+}

From 7fc8acdbf4cef2aa7f4f5ca9d136d4cb1bce9fe6 Mon Sep 17 00:00:00 2001
From: Gergely Balint 
Date: Tue, 28 Oct 2025 09:23:08 +
Subject: [PATCH 2/4] [BOLT] Use opts::Verbosity in PointerAuthCFIAnalyzer

---
 bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp | 27 ++
 bolt/test/AArch64/pacret-cfi-incorrect.s   |  2 +-
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp 
b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
index 01af88818a21d..5979d5fb01818 100644
--- a/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
+++ b/bolt/lib/Passes/PointerAuthCFIAnalyzer.cpp
@@ -28,6 +28,10 @@
 
 using namespace llvm;
 
+namespace opts {
+extern llvm::cl::opt Verbosity;
+} // namespace opts
+
 namespace llvm {
 namespace bolt {
 
@@ -43,9 +47,10 @@ bool PointerAuthCFIAnalyzer::runOnFunction(BinaryFunction 
&BF) {
 // Not all functions have .cfi_negate_ra_state in them. But if one 
does,
 // we expect psign/pauth instructions to have the hasNegateRAState
 // annotation.
-BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
-  << BF.getPrintName()
-  << ": ptr sign/auth inst without .cfi_negate_ra_state\n";
+if (opts::Verbosity >= 1)
+  BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+<< BF.getPrintName()
+<< ": ptr sign/auth inst without .cfi_negate_ra_state\n";
 std::lock_guard Lock(IgnoreMutex);
 BF.setIgnored();
 return false;
@@ -65,9 +70,10 @@ bool PointerAuthCFIAnalyzer::runOnF

[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)

2025-11-19 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/168380


[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();
+  auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+  if (Probes.empty())
+return nullptr;
+  const MCDecodedPseudoProbe &Probe = *Probes.begin();
+  const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+  while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+  return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+  using namespace yaml::bolt;
+  BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+  const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+  using ChildMapTy =
+  std::unordered_map;
+  using CallSiteInfoTy =
+  std::unordered_map;
+  // Mapping from a parent node id to a map InlineSite -> Child node.
+  DenseMap ParentToChildren;
+  // Collect calls in the profile: map from a parent node id to a map
+  // InlineSite -> CallSiteInfo ptr.
+  DenseMap ParentToCSI;
+  for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+  ProbeMatchingStats.TotalCallCount += CSI.Count;
+  ++ProbeMatchingStats.TotalCallSites;
+  if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+  if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  // Get callee GUID
+  if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');

maksfb wrote:

```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no inline tree for " << Callee->Name
                  << '\n');
```

https://github.com/llvm/llvm-project/pull/100446


[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)

2025-11-19 Thread Pierre van Houtryve via llvm-branch-commits


@@ -1438,8 +1443,7 @@ bool 
SIGfx6CacheControl::insertRelease(MachineBasicBlock::iterator &MI,
 }
 
 bool SIGfx10CacheControl::enableLoadCacheBypass(
-const MachineBasicBlock::iterator &MI,
-SIAtomicScope Scope,
+const MachineBasicBlock::iterator &MI, SIAtomicScope Scope,

Pierre-vh wrote:

I didn't, I always use `git clang-format` so not sure why that changed. Would 
you like me to remove it?

https://github.com/llvm/llvm-project/pull/168058


[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)

2025-11-19 Thread Juan Manuel Martinez Caamaño via llvm-branch-commits

https://github.com/jmmartinez approved this pull request.


https://github.com/llvm/llvm-project/pull/161815


[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Dan Blackwell via llvm-branch-commits


@@ -5,29 +5,34 @@
 // UNSUPPORTED: ios
 
 // RUN: rm -rf %t && mkdir -p %t
-// RUN: cp `%clang_asan 
-print-file-name=lib`/darwin/libclang_rt.asan_osx_dynamic.dylib \
+// RUN: %clang_asan -print-file-name=lib | tr -d '\n' > %t.lib_name
+// RUN: cp %{readfile:%t.lib_name}/darwin/libclang_rt.asan_osx_dynamic.dylib \
 // RUN:   %t/libclang_rt.asan_osx_dynamic.dylib
 
 // RUN: %clangxx_asan %s -o %t/a.out
 // RUN: %clangxx -DSHARED_LIB %s \
 // RUN: -dynamiclib -o %t/dummy-so.dylib
 
-// RUN: ( cd %t && \
-// RUN:   
DYLD_INSERT_LIBRARIES=@executable_path/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib
 \
-// RUN:   %run ./a.out 2>&1 ) | FileCheck %s || exit 1
-
-// RUN: ( cd %t && \
-// RUN:   
DYLD_INSERT_LIBRARIES=libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN:   %run ./a.out 2>&1 ) | FileCheck %s || exit 1
-
-// RUN: ( cd %t && \
-// RUN:   %env_asan_opts=strip_env=0 \
-// RUN:   
DYLD_INSERT_LIBRARIES=libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN:   %run ./a.out 2>&1 ) | FileCheck %s --check-prefix=CHECK-KEEP || exit 
1
-
-// RUN: ( cd %t && \
-// RUN:   
DYLD_INSERT_LIBRARIES=%t/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib \
-// RUN:   %run ./a.out 2>&1 ) | FileCheck %s || exit 1
+// RUN: pushd %t
+// RUN: env 
DYLD_INSERT_LIBRARIES=@executable_path/libclang_rt.asan_osx_dynamic.dylib:dummy-so.dylib
 \
+// RUN: %run ./a.out 2>&1 | FileCheck %s
+// RUN: popd

DanBlackwell wrote:

NIT: I'm missing context on why the original code was written this way, but
the `pushd`/`popd` pair here seems redundant.

https://github.com/llvm/llvm-project/pull/168545


[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread via llvm-branch-commits

github-actions[bot] wrote:


# :penguin: Linux x64 Test Results

* 5820 tests passed
* 1319 tests skipped

https://github.com/llvm/llvm-project/pull/168545


[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/168290


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread via llvm-branch-commits

github-actions[bot] wrote:


# :penguin: Linux x64 Test Results

* 186276 tests passed
* 4848 tests skipped

https://github.com/llvm/llvm-project/pull/168411


[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)

2025-11-19 Thread Paschalis Mpeis via llvm-branch-commits

https://github.com/paschalis-mpeis approved this pull request.


https://github.com/llvm/llvm-project/pull/164622


[llvm-branch-commits] [llvm] release/21.x: [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter (#166329) (PR #168380)

2025-11-19 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/168380

Backport 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868

Requested by: @davemgreen

>From 7c585c9c8b7fb78d8107912de47bbd35e8379f7c Mon Sep 17 00:00:00 2001
From: David Green 
Date: Wed, 12 Nov 2025 16:26:21 +
Subject: [PATCH] [ARM] Use TargetMachine over Subtarget in ARMAsmPrinter
 (#166329)

The subtarget may not be set if no functions are present in the module.
Attempt to use the TargetMachine directly in more cases.
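
To illustrate the reasoning, here is a minimal sketch with hypothetical
stand-in types (`Triple`, `Subtarget`, `TargetMachine`, `Printer`); none of
these are the real LLVM classes. The point is only that a per-function object
may still be null at module level, while the module-wide object is always
available.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only.
struct Triple {
  bool IsELF;
};

struct Subtarget {
  Triple TT;
};

struct TargetMachine {
  Triple TT;
};

struct Printer {
  const TargetMachine &TM;
  // Set only once a function is being emitted; stays null for a module
  // that contains no functions.
  const Subtarget *ST = nullptr;

  // Safe at any time: does not depend on a function having been seen.
  bool isELFViaTargetMachine() const { return TM.TT.IsELF; }
};
```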

Fixes #165422
Fixes #167577

(cherry picked from commit 4d1f2492d26f8c2fad0eee2a141c7e0bbbc4c868)
---
 llvm/lib/Target/ARM/ARMAsmPrinter.cpp | 21 +++--
 llvm/lib/Target/ARM/ARMSubtarget.cpp  | 12 +---
 llvm/lib/Target/ARM/ARMTargetMachine.h| 14 ++
 llvm/test/CodeGen/ARM/xxstructor-nodef.ll |  7 +++
 4 files changed, 33 insertions(+), 21 deletions(-)
 create mode 100644 llvm/test/CodeGen/ARM/xxstructor-nodef.ll

diff --git a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp 
b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
index 850b00406f09e..aa6ef55dad26c 100644
--- a/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
+++ b/llvm/lib/Target/ARM/ARMAsmPrinter.cpp
@@ -97,7 +97,8 @@ void ARMAsmPrinter::emitXXStructor(const DataLayout &DL, 
const Constant *CV) {
 
   const MCExpr *E = MCSymbolRefExpr::create(
   GetARMGVSymbol(GV, ARMII::MO_NO_FLAG),
-  (Subtarget->isTargetELF() ? ARM::S_TARGET1 : ARM::S_None), OutContext);
+  (TM.getTargetTriple().isOSBinFormatELF() ? ARM::S_TARGET1 : ARM::S_None),
+  OutContext);
 
   OutStreamer->emitValue(E, Size);
 }
@@ -595,8 +596,7 @@ void ARMAsmPrinter::emitEndOfAsmFile(Module &M) {
   ARMTargetStreamer &ATS = static_cast(TS);
 
   if (OptimizationGoals > 0 &&
-  (Subtarget->isTargetAEABI() || Subtarget->isTargetGNUAEABI() ||
-   Subtarget->isTargetMuslAEABI()))
+  (TT.isTargetAEABI() || TT.isTargetGNUAEABI() || TT.isTargetMuslAEABI()))
 ATS.emitAttribute(ARMBuildAttrs::ABI_optimization_goals, 
OptimizationGoals);
   OptimizationGoals = -1;
 
@@ -866,9 +866,10 @@ static uint8_t getModifierSpecifier(ARMCP::ARMCPModifier 
Modifier) {
 
 MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue *GV,
 unsigned char TargetFlags) {
-  if (Subtarget->isTargetMachO()) {
+  const Triple &TT = TM.getTargetTriple();
+  if (TT.isOSBinFormatMachO()) {
 bool IsIndirect =
-(TargetFlags & ARMII::MO_NONLAZY) && Subtarget->isGVIndirectSymbol(GV);
+(TargetFlags & ARMII::MO_NONLAZY) && getTM().isGVIndirectSymbol(GV);
 
 if (!IsIndirect)
   return getSymbol(GV);
@@ -885,9 +886,8 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue 
*GV,
   StubSym = MachineModuleInfoImpl::StubValueTy(getSymbol(GV),
!GV->hasInternalLinkage());
 return MCSym;
-  } else if (Subtarget->isTargetCOFF()) {
-assert(Subtarget->isTargetWindows() &&
-   "Windows is the only supported COFF target");
+  } else if (TT.isOSBinFormatCOFF()) {
+assert(TT.isOSWindows() && "Windows is the only supported COFF target");
 
 bool IsIndirect =
 (TargetFlags & (ARMII::MO_DLLIMPORT | ARMII::MO_COFFSTUB));
@@ -914,7 +914,7 @@ MCSymbol *ARMAsmPrinter::GetARMGVSymbol(const GlobalValue 
*GV,
 }
 
 return MCSym;
-  } else if (Subtarget->isTargetELF()) {
+  } else if (TT.isOSBinFormatELF()) {
 return getSymbolPreferLocal(*GV);
   }
   llvm_unreachable("unexpected target");
@@ -960,7 +960,8 @@ void ARMAsmPrinter::emitMachineConstantPoolValue(
 
 // On Darwin, const-pool entries may get the "FOO$non_lazy_ptr" mangling, 
so
 // flag the global as MO_NONLAZY.
-unsigned char TF = Subtarget->isTargetMachO() ? ARMII::MO_NONLAZY : 0;
+unsigned char TF =
+TM.getTargetTriple().isOSBinFormatMachO() ? ARMII::MO_NONLAZY : 0;
 MCSym = GetARMGVSymbol(GV, TF);
   } else if (ACPV->isMachineBasicBlock()) {
 const MachineBasicBlock *MBB = cast(ACPV)->getMBB();
diff --git a/llvm/lib/Target/ARM/ARMSubtarget.cpp 
b/llvm/lib/Target/ARM/ARMSubtarget.cpp
index 13185a7d797a3..63d6e2ea7389b 100644
--- a/llvm/lib/Target/ARM/ARMSubtarget.cpp
+++ b/llvm/lib/Target/ARM/ARMSubtarget.cpp
@@ -316,17 +316,7 @@ bool ARMSubtarget::isRWPI() const {
 }
 
 bool ARMSubtarget::isGVIndirectSymbol(const GlobalValue *GV) const {
-  if (!TM.shouldAssumeDSOLocal(GV))
-return true;
-
-  // 32 bit macho has no relocation for a-b if a is undefined, even if b is in
-  // the section that is being relocated. This means we have to use o load even
-  // for GVs that are known to be local to the dso.
-  if (isTargetMachO() && TM.isPositionIndependent() &&
-  (GV->isDeclarationForLinker() || GV->hasCommonLinkage()))
-return true;
-
-  return false;
+  return TM.isGVIndirectSymbol(GV);
 }
 
 bool ARMSubtarget::isGVInGOT(const GlobalValue *GV) const {
diff --git a/llvm/lib/Target/ARM/ARMTargetMachine.h 
b/llvm/lib/Target/ARM/ARMTa

[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)

2025-11-19 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/168058

>From 5700ad0a2fb2a859e7c46c6690854c35206155f0 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Nov 2025 10:05:14 +0100
Subject: [PATCH 1/2] nit


>From e060c5eba50d75216d628e16da72929b71aa9a30 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH 2/2] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
 Classes

+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit from `SICacheControl`.

With this patch and the previous one, there is no more long inheritance chain in
`SIMemoryLegalizer`. We just have 3 `SICacheControl` implementations that each
do their own thing, and there is no more code hidden 3 superclasses above.
All implementations are marked `final` too.
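
The resulting shape can be sketched as follows. This is a heavily simplified
illustration, not the real interface: the class names, the `name()` method,
the parameterless `enableStoreCacheBypass()`, and the `createCacheControl`
factory are assumptions, and the `true` return values for the GFX6 and GFX12
stand-ins are placeholders. Only the GFX10 stand-in returning `false` mirrors
what the patch shows.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified base interface.
class CacheControl {
public:
  virtual ~CacheControl() = default;
  virtual std::string name() const = 0;
  virtual bool enableStoreCacheBypass() const = 0;
};

// Each implementation derives directly from the base and is final, so no
// behavior is inherited from an intermediate class.
class Gfx6CacheControl final : public CacheControl {
public:
  std::string name() const override { return "gfx6"; }
  bool enableStoreCacheBypass() const override { return true; } // placeholder
};

class Gfx10CacheControl final : public CacheControl {
public:
  std::string name() const override { return "gfx10"; }
  bool enableStoreCacheBypass() const override { return false; }
};

class Gfx12CacheControl final : public CacheControl {
public:
  std::string name() const override { return "gfx12"; }
  bool enableStoreCacheBypass() const override { return true; } // placeholder
};

// Picks one of the three flat implementations by hardware generation.
std::unique_ptr<CacheControl> createCacheControl(unsigned Generation) {
  if (Generation >= 12)
    return std::make_unique<Gfx12CacheControl>();
  if (Generation >= 10)
    return std::make_unique<Gfx10CacheControl>();
  return std::make_unique<Gfx6CacheControl>();
}
```

With a flat hierarchy, each class has to spell out every override itself,
which costs a few trivial methods but keeps the behavior of each target
readable in one place.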
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
 1 file changed, 38 insertions(+), 120 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp 
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 49aba39872138..bf04c7fa132c0 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
 
 /// Generates code sequences for the memory model of all GFX targets below
 /// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
 public:
 
   SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
  Position Pos) const override;
 };
 
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
 public:
-  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+  SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
 
   bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
  SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace) const override;
 
+  bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+  SIAtomicScope Scope,
+  SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
+  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
   bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
   SIAtomicAddrSpace AddrSpace, SIMemOp Op,
   bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
 
   bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
-  SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
 
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-  SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-  bool IsVolatile, bool IsNonTemporal,
-  bool IsLastUse) const override;
+  bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool 
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+  IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+  /*AtomicsOnly=*/false);
+  }
 };
 
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
 protected:
   // Sets TH policy to \p Value if CPol operand is present in instruction \p 
MI.
   // \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
 
 public:
-  SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+  SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
 // GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
 // the behavior is the same if assuming GFX12.0 in CU mode.
 assert(!ST.hasGFX1250Insts() || ST.isCuMode

[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread Claire Fan via llvm-branch-commits

https://github.com/clairechingching created 
https://github.com/llvm/llvm-project/pull/168314

I'd like to backport this change, which handles misaligned memory access in
the BPF target and was merged in
[this original PR](https://github.com/llvm/llvm-project/pull/167013), so that
I can enable the feature in the Rust nightly compiler.

>From 5d2ec95c53bd510a39fd33ab234a961c91b69cd0 Mon Sep 17 00:00:00 2001
From: Claire xyz 
Date: Fri, 7 Nov 2025 11:08:47 -0500
Subject: [PATCH] [BPF] add allows-misaligned-mem-access target feature

This enables misaligned memory access when the feature is enabled
---
 llvm/lib/Target/BPF/BPF.td|   4 +
 llvm/lib/Target/BPF/BPFISelLowering.cpp   |  20 ++
 llvm/lib/Target/BPF/BPFISelLowering.h |   7 +
 llvm/lib/Target/BPF/BPFSubtarget.cpp  |   1 +
 llvm/lib/Target/BPF/BPFSubtarget.h|   6 +
 llvm/test/CodeGen/BPF/unaligned_load_store.ll | 196 ++
 6 files changed, 234 insertions(+)
 create mode 100644 llvm/test/CodeGen/BPF/unaligned_load_store.ll

diff --git a/llvm/lib/Target/BPF/BPF.td b/llvm/lib/Target/BPF/BPF.td
index dff76ca07af51..a7aa6274f5ac1 100644
--- a/llvm/lib/Target/BPF/BPF.td
+++ b/llvm/lib/Target/BPF/BPF.td
@@ -27,6 +27,10 @@ def ALU32 : SubtargetFeature<"alu32", "HasAlu32", "true",
 def DwarfRIS: SubtargetFeature<"dwarfris", "UseDwarfRIS", "true",
"Disable MCAsmInfo 
DwarfUsesRelocationsAcrossSections">;
 
+def MisalignedMemAccess : SubtargetFeature<"allows-misaligned-mem-access",
+   "AllowsMisalignedMemAccess", "true",
+   "Allows misaligned memory access">;
+
 def : Proc<"generic", []>;
 def : Proc<"v1", []>;
 def : Proc<"v2", []>;
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp 
b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index f4f414d192df0..5ec7f5905fd22 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -196,6 +196,26 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine 
&TM,
   HasJmp32 = STI.getHasJmp32();
   HasJmpExt = STI.getHasJmpExt();
   HasMovsx = STI.hasMovsx();
+
+  AllowsMisalignedMemAccess = STI.getAllowsMisalignedMemAccess();
+}
+
+bool BPFTargetLowering::allowsMisalignedMemoryAccesses(EVT VT, unsigned, Align,
+   
MachineMemOperand::Flags,
+   unsigned *Fast) const {
+  // allows-misaligned-mem-access is disabled
+  if (!AllowsMisalignedMemAccess)
+return false;
+
+  // only allow misalignment for simple value types
+  if (!VT.isSimple())
+return false;
+
+  // always assume fast mode when misalignment is allowed
+  if (Fast)
+*Fast = true;
+
+  return true;
 }
 
 bool BPFTargetLowering::isOffsetFoldingLegal(const GlobalAddressSDNode *GA) 
const {
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.h 
b/llvm/lib/Target/BPF/BPFISelLowering.h
index 8f60261c10e9e..fe01bd5b8cf85 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -46,6 +46,10 @@ class BPFTargetLowering : public TargetLowering {
   // with the given GlobalAddress is legal.
   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
 
+  bool allowsMisalignedMemoryAccesses(EVT VT, unsigned, Align,
+  MachineMemOperand::Flags,
+  unsigned *) const override;
+
   BPFTargetLowering::ConstraintType
   getConstraintType(StringRef Constraint) const override;
 
@@ -73,6 +77,9 @@ class BPFTargetLowering : public TargetLowering {
   bool HasJmpExt;
   bool HasMovsx;
 
+  // Allows Misalignment
+  bool AllowsMisalignedMemAccess;
+
   SDValue LowerSDIVSREM(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp 
b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 4167547680b12..925537710efb0 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -66,6 +66,7 @@ void BPFSubtarget::initializeEnvironment() {
   HasGotol = false;
   HasStoreImm = false;
   HasLoadAcqStoreRel = false;
+  AllowsMisalignedMemAccess = false;
 }
 
 void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
diff --git a/llvm/lib/Target/BPF/BPFSubtarget.h 
b/llvm/lib/Target/BPF/BPFSubtarget.h
index aed2211265e23..a9a20008733c9 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.h
+++ b/llvm/lib/Target/BPF/BPFSubtarget.h
@@ -63,6 +63,9 @@ class BPFSubtarget : public BPFGenSubtargetInfo {
   // whether we should enable MCAsmInfo DwarfUsesRelocationsAcrossSections
   bool UseDwarfRIS;
 
+  // whether we allows misaligned memory access
+  bool AllowsMisalignedMemAccess;
+
   // whether cpu v4 insns are enabled.
   bool HasLdsx, HasMo

[llvm-branch-commits] [llvm] [BOLT] Rename Pointer Auth DWARF rewriter passes (PR #164622)

2025-11-19 Thread Gergely Bálint via llvm-branch-commits

https://github.com/bgergely0 edited 
https://github.com/llvm/llvm-project/pull/164622
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint with fp->int->fp optimizations (PR #164503)

2025-11-19 Thread Guy David via llvm-branch-commits


@@ -6075,6 +6075,35 @@ bool SelectionDAG::isKnownNeverZeroFloat(SDValue Op) 
const {
   Op, [](ConstantFPSDNode *C) { return !C->isZero(); });
 }
 
+bool SelectionDAG::allUsesSignedZeroInsensitive(SDValue Op) const {
+  assert(Op.getValueType().isFloatingPoint());
+  return all_of(Op->uses(), [&](SDUse &Use) {

guy-david wrote:

Sounds good, limiting it to two uses for now. I will look into implementing it 
via demanded-bits in the near future.
Moving the SelectionDAG patch to 
https://github.com/llvm/llvm-project/pull/165011 because I don't want it to be 
tightly coupled to the fp-to-int-to-fp optimization.

https://github.com/llvm/llvm-project/pull/164503


[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)

2025-11-19 Thread via llvm-branch-commits

dyung wrote:

Hi, at this point in the 21.x release branch we are only accepting patches that 
fix regressions or major issues. Was the problem being fixed here a recent 
regression? From a quick look at the history, the code being replaced was 
introduced around the LLVM 18 time frame, so it has been around for a while.

What are the implications if we do not accept this change into the 21.x release 
branch? Would something be broken that cannot be worked around or otherwise 
fixed without it? At this point, I am leaning towards not including the fix and 
waiting for LLVM 22 for it, but if you feel strongly that it should be 
included, please let us know why and I can consult with the other release 
managers to see how they feel on the issue.

https://github.com/llvm/llvm-project/pull/168371


[llvm-branch-commits] [llvm] [AArch64][SME] Handle zeroing ZA and ZT0 in functions with ZT0 state (PR #166361)

2025-11-19 Thread Sander de Smalen via llvm-branch-commits


@@ -356,20 +356,13 @@ define void @new_za_zt0_caller(ptr %callee) 
"aarch64_new_za" "aarch64_new_zt0" n
 
 ; Expect clear ZA on entry
 define void @new_za_shared_zt0_caller(ptr %callee) "aarch64_new_za" 
"aarch64_in_zt0" nounwind {
-; CHECK-LABEL: new_za_shared_zt0_caller:
-; CHECK:   // %bb.0:
-; CHECK-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEXT:zero {za}
-; CHECK-NEXT:blr x0
-; CHECK-NEXT:ldr x30, [sp], #16 // 8-byte Folded Reload
-; CHECK-NEXT:ret
-;
-; CHECK-NEWLOWERING-LABEL: new_za_shared_zt0_caller:
-; CHECK-NEWLOWERING:   // %bb.0:
-; CHECK-NEWLOWERING-NEXT:str x30, [sp, #-16]! // 8-byte Folded Spill
-; CHECK-NEWLOWERING-NEXT:blr x0

sdesmalen-arm wrote:

Why wasn't ZA zeroed before?

https://github.com/llvm/llvm-project/pull/166361


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic ready_for_review 
https://github.com/llvm/llvm-project/pull/168411


[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)

2025-11-19 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/168058

>From f0a60702ef1dba4a3545848ff4791fceda7abc51 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
 Classes

+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit from `SICacheControl`.

With this patch and the previous one, there is no more long inheritance chain in
`SIMemoryLegalizer`. We just have 3 `SICacheControl` implementations that each
do their own thing, and there is no more code hidden 3 superclasses above.
All implementations are marked `final` too.
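
The resulting shape can be sketched in a few lines. This is a simplified illustration and not the code from the patch: the class names, the `targets()` hook, and the major-version thresholds in the factory are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Hypothetical, simplified base class. The real SICacheControl takes a
// GCNSubtarget and declares many more virtual hooks.
class CacheControl {
public:
  virtual ~CacheControl() = default;
  virtual const char *targets() const = 0;
};

// Each implementation derives directly from the base and is final, so no
// behaviour is inherited from a sibling generation.
class Gfx6CacheControl final : public CacheControl {
public:
  const char *targets() const override { return "pre-GFX10"; }
};

class Gfx10CacheControl final : public CacheControl {
public:
  const char *targets() const override { return "GFX10/11"; }
};

class Gfx12CacheControl final : public CacheControl {
public:
  const char *targets() const override { return "GFX12"; }
};

// Illustrative factory; the thresholds are an assumption for this sketch and
// do not mirror GCNSubtarget::Generation.
std::unique_ptr<CacheControl> createCacheControl(unsigned MajorVersion) {
  if (MajorVersion < 10)
    return std::make_unique<Gfx6CacheControl>();
  if (MajorVersion < 12)
    return std::make_unique<Gfx10CacheControl>();
  return std::make_unique<Gfx12CacheControl>();
}
```

The trade-off is that hooks which used to be inherited now have to be spelled out in each class, as the patch does for `enableStoreCacheBypass` and `enableRMWCacheBypass` in `SIGfx10CacheControl`.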
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
 1 file changed, 38 insertions(+), 120 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp 
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 8d27084cf72d9..eddd4a3bafe2e 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
 
 /// Generates code sequences for the memory model of all GFX targets below
 /// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
 public:
 
   SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
  Position Pos) const override;
 };
 
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
 public:
-  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+  SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
 
   bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
  SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace) const override;
 
+  bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+  SIAtomicScope Scope,
+  SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
+  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
   bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
   SIAtomicAddrSpace AddrSpace, SIMemOp Op,
   bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
 
   bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
-  SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
 
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-  SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-  bool IsVolatile, bool IsNonTemporal,
-  bool IsLastUse) const override;
+  bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool 
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+  IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+  /*AtomicsOnly=*/false);
+  }
 };
 
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
 protected:
   // Sets TH policy to \p Value if CPol operand is present in instruction \p 
MI.
   // \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
 
 public:
-  SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+  SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
 // GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
 // the behavior is the same if assuming GFX12.0 in CU mode.
 assert(!ST.hasGFX1250Insts() || ST.isCuModeEnabled());
@@ -915,10 +922,8 @@ std::unique_ptr 
SICacheControl::create(const GCNSubtarget &ST) {
   GCNSubtarget::Generation Generation = ST.getGeneration(

[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();
+  auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+  if (Probes.empty())
+return nullptr;
+  const MCDecodedPseudoProbe &Probe = *Probes.begin();
+  const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+  while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+  return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+  using namespace yaml::bolt;
+  BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+  const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+  using ChildMapTy =
+  std::unordered_map;
+  using CallSiteInfoTy =
+  std::unordered_map;
+  // Mapping from a parent node id to a map InlineSite -> Child node.
+  DenseMap ParentToChildren;
+  // Collect calls in the profile: map from a parent node id to a map
+  // InlineSite -> CallSiteInfo ptr.
+  DenseMap ParentToCSI;
+  for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+  ProbeMatchingStats.TotalCallCount += CSI.Count;
+  ++ProbeMatchingStats.TotalCallSites;
+  if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count

maksfb wrote:

```suggestion
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: no probe for " << CSI.DestId << " " 
<< CSI.Count
```

https://github.com/llvm/llvm-project/pull/100446


[llvm-branch-commits] [llvm] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)

2025-11-19 Thread Guy David via llvm-branch-commits

https://github.com/guy-david updated 
https://github.com/llvm/llvm-project/pull/165011

>From 01e872d95c1708392ae429879f36f6a32ca4889a Mon Sep 17 00:00:00 2001
From: Guy David 
Date: Fri, 24 Oct 2025 19:30:19 +0300
Subject: [PATCH] [DAGCombiner] Relax nsz constraint for FP optimizations

Some floating-point optimizations don't trigger because they can produce
incorrect results around signed zeros, and rely on the existence of the
nsz flag which commonly appears when fast-math is enabled.
However, this flag is not a hard requirement when all of the users of
the combined value are either guaranteed to overwrite the sign-bit or
simply ignore it (comparisons, etc.).

The optimizations affected:
- fadd x, +0.0 -> x
- fsub x, -0.0 -> x
- fsub +0.0, x -> fneg x
- fdiv(x, sqrt(x)) -> sqrt(x)
- frem lowering with power-of-2 divisors
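
As a standalone illustration of why the first three folds need care, the sketch below evaluates the unsimplified expressions in plain C++ under default IEEE-754 semantics (no fast-math). The helper names are made up for this example and do not appear in the patch.

```cpp
#include <cassert>
#include <cmath>

// Each helper computes the unsimplified form of one fold from the list.
inline double faddPosZero(double X) { return X + 0.0; }     // fadd x, +0.0
inline double fsubNegZero(double X) { return X - (-0.0); }  // fsub x, -0.0
inline double fsubFromPosZero(double X) { return 0.0 - X; } // fsub +0.0, x

// True when both values carry the same sign bit.
inline bool sameSign(double A, double B) {
  return std::signbit(A) == std::signbit(B);
}
```

For `x = -0.0` the first two helpers return `+0.0`, so folding them to `x` would flip the sign bit; for `x = +0.0` the third returns `+0.0` while `fneg x` is `-0.0`. A user such as an equality comparison or `fabs` cannot observe that difference, which is the situation the commit message describes.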
---
 llvm/include/llvm/CodeGen/SelectionDAG.h  |  6 ++
 llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 17 +++--
 .../lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 40 +++
 .../CodeGen/AArch64/ignore-signed-zero.ll | 72 +++
 .../AMDGPU/fcanonicalize-elimination.ll   |  2 +-
 llvm/test/CodeGen/AMDGPU/swdev380865.ll   |  5 +-
 6 files changed, 132 insertions(+), 10 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/ignore-signed-zero.ll

diff --git a/llvm/include/llvm/CodeGen/SelectionDAG.h 
b/llvm/include/llvm/CodeGen/SelectionDAG.h
index b024e8a68bd6e..9dba2ee8692f5 100644
--- a/llvm/include/llvm/CodeGen/SelectionDAG.h
+++ b/llvm/include/llvm/CodeGen/SelectionDAG.h
@@ -2326,6 +2326,12 @@ class SelectionDAG {
   /// +nan are considered positive, -0.0, -inf and -nan are not.
   LLVM_ABI bool cannotBeOrderedNegativeFP(SDValue Op) const;
 
+  /// Check if a use of a float value is insensitive to signed zeros.
+  LLVM_ABI bool canIgnoreSignBitOfZero(const SDUse &Use) const;
+
+  /// Check if at most two uses of a value are insensitive to signed zeros.
+  LLVM_ABI bool canIgnoreSignBitOfZero(SDValue Op) const;
+
   /// Test whether two SDValues are known to compare equal. This
   /// is true if they are the same value, or if one is negative zero and the
   /// other positive zero.
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index c9513611e6dcb..3624748a3b0f0 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -17869,7 +17869,8 @@ SDValue DAGCombiner::visitFADD(SDNode *N) {
   // N0 + -0.0 --> N0 (also allowed with +0.0 and fast-math)
   ConstantFPSDNode *N1C = isConstOrConstSplatFP(N1, true);
   if (N1C && N1C->isZero())
-if (N1C->isNegative() || Flags.hasNoSignedZeros())
+if (N1C->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0)))
   return N0;
 
   if (SDValue NewSel = foldBinOpIntoSelect(N))
@@ -18081,7 +18082,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub A, 0) -> A
   if (N1CFP && N1CFP->isZero()) {
-if (!N1CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (!N1CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0))) {
   return N0;
 }
   }
@@ -18094,7 +18096,8 @@ SDValue DAGCombiner::visitFSUB(SDNode *N) {
 
   // (fsub -0.0, N1) -> -N1
   if (N0CFP && N0CFP->isZero()) {
-if (N0CFP->isNegative() || Flags.hasNoSignedZeros()) {
+if (N0CFP->isNegative() || Flags.hasNoSignedZeros() ||
+DAG.canIgnoreSignBitOfZero(SDValue(N, 0))) {
   // We cannot replace an FSUB(+-0.0,X) with FNEG(X) when denormals are
   // flushed to zero, unless all users treat denorms as zero (DAZ).
   // FIXME: This transform will change the sign of a NaN and the behavior
@@ -18744,7 +18747,8 @@ SDValue DAGCombiner::visitFDIV(SDNode *N) {
   }
 
   // Fold X/Sqrt(X) -> Sqrt(X)
-  if (Flags.hasNoSignedZeros() && Flags.hasAllowReassociation())
+  if ((Flags.hasNoSignedZeros() || DAG.canIgnoreSignBitOfZero(SDValue(N, 0))) 
&&
+  Flags.hasAllowReassociation())
 if (N1.getOpcode() == ISD::FSQRT && N0 == N1.getOperand(0))
   return N1;
 
@@ -18795,8 +18799,9 @@ SDValue DAGCombiner::visitFREM(SDNode *N) {
   TLI.isOperationLegalOrCustom(ISD::FDIV, VT) &&
   TLI.isOperationLegalOrCustom(ISD::FTRUNC, VT) &&
   DAG.isKnownToBeAPowerOfTwoFP(N1)) {
-bool NeedsCopySign =
-!Flags.hasNoSignedZeros() && !DAG.cannotBeOrderedNegativeFP(N0);
+bool NeedsCopySign = !Flags.hasNoSignedZeros() &&
+ !DAG.cannotBeOrderedNegativeFP(N0) &&
+ !DAG.canIgnoreSignBitOfZero(SDValue(N, 0));
 SDValue Div = DAG.getNode(ISD::FDIV, DL, VT, N0, N1);
 SDValue Rnd = DAG.getNode(ISD::FTRUNC, DL, VT, Div);
 SDValue MLA;
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp 
b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c2b4c19846316..64fd925684ffa 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/

[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread Claire Fan via llvm-branch-commits

https://github.com/clairechingching edited 
https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [clang] [clang-tools-extra] [flang] [libcxx] [lldb] [llvm] [mlir] [DAGCombiner] Relax nsz constraint for more FP optimizations (PR #165011)

2025-11-19 Thread Guy David via llvm-branch-commits

https://github.com/guy-david updated 
https://github.com/llvm/llvm-project/pull/165011

error: too big or took too long to generate


[llvm-branch-commits] [llvm] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl Classes (PR #168058)

2025-11-19 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/168058

>From 5700ad0a2fb2a859e7c46c6690854c35206155f0 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Mon, 17 Nov 2025 10:05:14 +0100
Subject: [PATCH 1/2] nit


>From e060c5eba50d75216d628e16da72929b71aa9a30 Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Nov 2025 14:29:11 +0100
Subject: [PATCH 2/2] [AMDGPU][SIMemoryLegalizer] Combine GFX10-11 CacheControl
 Classes

+ Break the long inheritance chains by making both `SIGfx10CacheControl` and
`SIGfx12CacheControl` inherit from `SICacheControl`.

With this patch and the previous one, there is no more long inheritance chain in
`SIMemoryLegalizer`. We just have 3 `SICacheControl` implementations that each
do their own thing, and there is no more code hidden 3 superclasses above.
All implementations are marked `final` too.
---
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp | 158 +--
 1 file changed, 38 insertions(+), 120 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp 
b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 49aba39872138..bf04c7fa132c0 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -404,7 +404,7 @@ class SICacheControl {
 
 /// Generates code sequences for the memory model of all GFX targets below
 /// GFX10.
-class SIGfx6CacheControl : public SICacheControl {
+class SIGfx6CacheControl final : public SICacheControl {
 public:
 
   SIGfx6CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
@@ -443,14 +443,27 @@ class SIGfx6CacheControl : public SICacheControl {
  Position Pos) const override;
 };
 
-class SIGfx10CacheControl : public SIGfx6CacheControl {
+/// Generates code sequences for the memory model of GFX10/11.
+class SIGfx10CacheControl final : public SICacheControl {
 public:
-  SIGfx10CacheControl(const GCNSubtarget &ST) : SIGfx6CacheControl(ST) {}
+  SIGfx10CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {}
 
   bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
  SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace) const override;
 
+  bool enableStoreCacheBypass(const MachineBasicBlock::iterator &MI,
+  SIAtomicScope Scope,
+  SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
+  bool enableRMWCacheBypass(const MachineBasicBlock::iterator &MI,
+SIAtomicScope Scope,
+SIAtomicAddrSpace AddrSpace) const override {
+return false;
+  }
+
   bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
   SIAtomicAddrSpace AddrSpace, SIMemOp Op,
   bool IsVolatile, bool IsNonTemporal,
@@ -463,23 +476,17 @@ class SIGfx10CacheControl : public SIGfx6CacheControl {
 
   bool insertAcquire(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
  SIAtomicAddrSpace AddrSpace, Position Pos) const override;
-};
-
-class SIGfx11CacheControl : public SIGfx10CacheControl {
-public:
-  SIGfx11CacheControl(const GCNSubtarget &ST) : SIGfx10CacheControl(ST) {}
 
-  bool enableLoadCacheBypass(const MachineBasicBlock::iterator &MI,
- SIAtomicScope Scope,
- SIAtomicAddrSpace AddrSpace) const override;
-
-  bool enableVolatileAndOrNonTemporal(MachineBasicBlock::iterator &MI,
-  SIAtomicAddrSpace AddrSpace, SIMemOp Op,
-  bool IsVolatile, bool IsNonTemporal,
-  bool IsLastUse) const override;
+  bool insertRelease(MachineBasicBlock::iterator &MI, SIAtomicScope Scope,
+ SIAtomicAddrSpace AddrSpace, bool 
IsCrossAddrSpaceOrdering,
+ Position Pos) const override {
+return insertWait(MI, Scope, AddrSpace, SIMemOp::LOAD | SIMemOp::STORE,
+  IsCrossAddrSpaceOrdering, Pos, AtomicOrdering::Release,
+  /*AtomicsOnly=*/false);
+  }
 };
 
-class SIGfx12CacheControl : public SIGfx11CacheControl {
+class SIGfx12CacheControl final : public SICacheControl {
 protected:
   // Sets TH policy to \p Value if CPol operand is present in instruction \p 
MI.
   // \returns Returns true if \p MI is modified, false otherwise.
@@ -504,7 +511,7 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   SIAtomicScope Scope, SIAtomicAddrSpace AddrSpace) const;
 
 public:
-  SIGfx12CacheControl(const GCNSubtarget &ST) : SIGfx11CacheControl(ST) {
+  SIGfx12CacheControl(const GCNSubtarget &ST) : SICacheControl(ST) {
 // GFX12.0 and GFX12.5 memory models greatly overlap, and in some cases
 // the behavior is the same if assuming GFX12.0 in CU mode.
 assert(!ST.hasGFX1250Insts() || ST.isCuMode

[llvm-branch-commits] [llvm] [BOLT] Match functions with pseudo probes (PR #100446)

2025-11-19 Thread Maksim Panchenko via llvm-branch-commits


@@ -592,72 +633,276 @@ size_t 
YAMLProfileReader::matchWithCallGraph(BinaryContext &BC) {
   return MatchedWithCallGraph;
 }
 
-size_t YAMLProfileReader::InlineTreeNodeMapTy::matchInlineTrees(
-const MCPseudoProbeDecoder &Decoder,
-const std::vector &DecodedInlineTree,
-const MCDecodedPseudoProbeInlineTree *Root) {
-  // Match inline tree nodes by GUID, checksum, parent, and call site.
-  for (const auto &[InlineTreeNodeId, InlineTreeNode] :
-   llvm::enumerate(DecodedInlineTree)) {
-uint64_t GUID = InlineTreeNode.GUID;
-uint64_t Hash = InlineTreeNode.Hash;
-uint32_t ParentId = InlineTreeNode.ParentIndexDelta;
-uint32_t CallSiteProbe = InlineTreeNode.CallSiteProbe;
-const MCDecodedPseudoProbeInlineTree *Cur = nullptr;
-if (!InlineTreeNodeId) {
-  Cur = Root;
-} else if (const MCDecodedPseudoProbeInlineTree *Parent =
-   getInlineTreeNode(ParentId)) {
-  for (const MCDecodedPseudoProbeInlineTree &Child :
-   Parent->getChildren()) {
-if (Child.Guid == GUID) {
-  if (std::get<1>(Child.getInlineSite()) == CallSiteProbe)
-Cur = &Child;
-  break;
-}
+const MCDecodedPseudoProbeInlineTree *
+YAMLProfileReader::lookupTopLevelNode(const BinaryFunction &BF) {
+  const BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder *Decoder = BC.getPseudoProbeDecoder();
+  assert(Decoder &&
+ "If pseudo probes are in use, pseudo probe decoder should exist");
+  uint64_t Addr = BF.getAddress();
+  uint64_t Size = BF.getSize();
+  auto Probes = Decoder->getAddress2ProbesMap().find(Addr, Addr + Size);
+  if (Probes.empty())
+return nullptr;
+  const MCDecodedPseudoProbe &Probe = *Probes.begin();
+  const MCDecodedPseudoProbeInlineTree *Root = Probe.getInlineTreeNode();
+  while (Root->hasInlineSite())
+Root = (const MCDecodedPseudoProbeInlineTree *)Root->Parent;
+  return Root;
+}
+
+size_t YAMLProfileReader::matchInlineTreesImpl(
+BinaryFunction &BF, yaml::bolt::BinaryFunctionProfile &YamlBF,
+const MCDecodedPseudoProbeInlineTree &Root, uint32_t RootIdx,
+ArrayRef ProfileInlineTree,
+MutableArrayRef Map, float Scale) {
+  using namespace yaml::bolt;
+  BinaryContext &BC = BF.getBinaryContext();
+  const MCPseudoProbeDecoder &Decoder = *BC.getPseudoProbeDecoder();
+  const InlineTreeNode &FuncNode = ProfileInlineTree[RootIdx];
+
+  using ChildMapTy =
+  std::unordered_map;
+  using CallSiteInfoTy =
+  std::unordered_map;
+  // Mapping from a parent node id to a map InlineSite -> Child node.
+  DenseMap ParentToChildren;
+  // Collect calls in the profile: map from a parent node id to a map
+  // InlineSite -> CallSiteInfo ptr.
+  DenseMap ParentToCSI;
+  for (const BinaryBasicBlockProfile &YamlBB : YamlBF.Blocks) {
+// Collect callees for inlined profile matching, indexed by InlineSite.
+for (const CallSiteInfo &CSI : YamlBB.CallSites) {
+  ProbeMatchingStats.TotalCallCount += CSI.Count;
+  ++ProbeMatchingStats.TotalCallSites;
+  if (CSI.Probe == 0) {
+LLVM_DEBUG(dbgs() << "no probe for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallProbe;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  const BinaryFunctionProfile *Callee = IdToYamLBF.lookup(CSI.DestId);
+  if (!Callee) {
+LLVM_DEBUG(dbgs() << "no callee for " << CSI.DestId << " " << CSI.Count
+  << '\n');
+++ProbeMatchingStats.MissingCallee;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  // Get callee GUID
+  if (Callee->InlineTree.empty()) {
+LLVM_DEBUG(dbgs() << "no inline tree for " << Callee->Name << '\n');
+++ProbeMatchingStats.MissingInlineTree;
+ProbeMatchingStats.MissingCallCount += CSI.Count;
+continue;
+  }
+  uint64_t CalleeGUID = Callee->InlineTree.front().GUID;
+  ParentToCSI[CSI.InlineTreeNode][InlineSite(CalleeGUID, CSI.Probe)] = 
&CSI;
+}
+  }
+  LLVM_DEBUG({
+for (auto &[ParentId, InlineSiteCSI] : ParentToCSI) {
+  for (auto &[InlineSite, CSI] : InlineSiteCSI) {
+auto [CalleeGUID, CallSite] = InlineSite;
+errs() << ParentId << "@" << CallSite << "->"
+   << Twine::utohexstr(CalleeGUID) << ": " << CSI->Count << ", "
+   << Twine::utohexstr(CSI->Offset) << '\n';
+  }
+}
+  });
+
+  assert(!Root.isRoot());
+  LLVM_DEBUG(dbgs() << "matchInlineTreesImpl for " << BF << "@"
+<< Twine::utohexstr(Root.Guid) << " and " << YamlBF.Name
+<< "@" << Twine::utohexstr(FuncNode.GUID) << '\n');
+  ++ProbeMatchingStats.AttemptedNodes;
+  ++ProbeMatchingStats.AttemptedRoots;
+
+  // Match profile function with a lead node (top-level function or inlinee)
+  if (Root.Guid != FuncNode.GUID) {
+LLVM_DEBUG(dbgs() << "

[llvm-branch-commits] [lld] release/21.x: [LLD][COFF] Align EC code ranges to page boundaries (#168222) (PR #168369)

2025-11-19 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/168369

Backport af45b0202cdd443beedb02392f653d8cff5bd931

Requested by: @cjacek

>From fb641d8e566da6cf431398e85faa1254914751ed Mon Sep 17 00:00:00 2001
From: Jacek Caban 
Date: Mon, 17 Nov 2025 12:44:22 +0100
Subject: [PATCH] [LLD][COFF] Align EC code ranges to page boundaries (#168222)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

We already ensure that code for different architectures is always placed
in different pages in `assignAddresses`. We represent those ranges using
their first and last chunks. However, the RVAs of those chunks may not
be page-aligned, for example, due to extra padding for entry-thunk
offsets. Align the chunk RVAs to the page boundary so that the emitted
ranges correctly include the entire region.

This change affects an existing test that checks corner cases triggered
by merging a data section into a code section. We may now include such
data in the code range. This differs from MSVC’s behavior, but it should
not cause practical issues, and the new behavior is arguably more
correct.
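
As a rough illustration of the alignment described above, the following is a minimal sketch and not the LLD implementation. The `Chunk` and `CodeRange` types and the helper names are hypothetical; the only detail taken from the patch is the `& ~0xfff` mask, which assumes 4 KiB pages.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a chunk: only the RVA and size matter here.
struct Chunk {
  uint32_t rva;
  uint32_t size;
};

// Hypothetical result type: start offset and length of one code range.
struct CodeRange {
  uint32_t start;
  uint32_t length;
};

// Low 12 bits of an address within a 4 KiB page (assumed page size).
constexpr uint32_t kPageMask = 0xfff;

// Round an RVA down to the start of the page that contains it.
constexpr uint32_t alignDownToPage(uint32_t rva) { return rva & ~kPageMask; }

// Build the range covering everything from the page start of `first`
// up to the end of `last`.
constexpr CodeRange makeRange(const Chunk &first, const Chunk &last) {
  uint32_t start = alignDownToPage(first.rva);
  return {start, last.rva + last.size - start};
}
```

With a single chunk at RVA 0x4008 that ends at 0x4016, the sketch yields a range starting at 0x4000, which matches the shape of the updated `CODEMAPCM` expectation in the test.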

Fixes #168119.

(cherry picked from commit af45b0202cdd443beedb02392f653d8cff5bd931)
---
 lld/COFF/Chunks.cpp|  2 +-
 lld/test/COFF/arm64ec-codemap.test | 36 +++---
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index 01752cdc6a9da..cfb33daa024a7 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -939,7 +939,7 @@ void ECCodeMapChunk::writeTo(uint8_t *buf) const {
   auto table = reinterpret_cast(buf);
   for (uint32_t i = 0; i < map.size(); i++) {
 const ECCodeMapEntry &entry = map[i];
-uint32_t start = entry.first->getRVA();
+uint32_t start = entry.first->getRVA() & ~0xfff;
 table[i].StartOffset = start | entry.type;
 table[i].Length = entry.last->getRVA() + entry.last->getSize() - start;
   }
diff --git a/lld/test/COFF/arm64ec-codemap.test 
b/lld/test/COFF/arm64ec-codemap.test
index 050261117be2e..bbc682d19920f 100644
--- a/lld/test/COFF/arm64ec-codemap.test
+++ b/lld/test/COFF/arm64ec-codemap.test
@@ -7,6 +7,7 @@ RUN: llvm-mc -filetype=obj -triple=arm64ec-windows 
arm64ec-func-sym2.s -o arm64e
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec.s -o data-sec.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows data-sec2.s -o data-sec2.obj
 RUN: llvm-mc -filetype=obj -triple=arm64ec-windows empty-sec.s -o 
arm64ec-empty-sec.obj
+RUN: llvm-mc -filetype=obj -triple=arm64ec-windows entry-thunk.s -o 
entry-thunk.obj
 RUN: llvm-mc -filetype=obj -triple=x86_64-windows x86_64-func-sym.s -o 
x86_64-func-sym.obj
 RUN: llvm-mc -filetype=obj -triple=x86_64-windows empty-sec.s -o 
x86_64-empty-sec.obj
 RUN: llvm-mc -filetype=obj -triple=aarch64-windows 
%S/Inputs/loadconfig-arm64.s -o loadconfig-arm64.obj
@@ -162,15 +163,17 @@ RUN:  loadconfig-arm64ec.obj -dll -noentry 
-merge:test=.testdata -merge:
 
 RUN: llvm-readobj --coff-load-config testcm.dll | FileCheck 
-check-prefix=CODEMAPCM %s
 CODEMAPCM:  CodeMap [
-CODEMAPCM-NEXT: 0x4008 - 0x4016  X64
+CODEMAPCM-NEXT: 0x4000 - 0x4016  X64
 CODEMAPCM-NEXT: ]
 
 RUN: llvm-objdump -d testcm.dll | FileCheck -check-prefix=DISASMCM %s
 DISASMCM:  Disassembly of section .testdat:
 DISASMCM-EMPTY:
 DISASMCM-NEXT: 000180004000 <.testdat>:
-DISASMCM-NEXT: 180004000: 0001 udf #0x1
-DISASMCM-NEXT: 180004004:  udf #0x0
+DISASMCM-NEXT: 180004000: 01 00addl %eax, (%rax)
+DISASMCM-NEXT: 180004002: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004004: 00 00addb %al, (%rax)
+DISASMCM-NEXT: 180004006: 00 00addb %al, (%rax)
 DISASMCM-NEXT: 180004008: b8 03 00 00 00   movl$0x3, %eax
 DISASMCM-NEXT: 18000400d: c3   retq
 DISASMCM-NEXT: 18000400e: 00 00addb%al, (%rax)
@@ -207,6 +210,14 @@ DISASMMS-NEXT: 000180006000 :
 DISASMMS-NEXT: 180006000: 528000a0 mov w0, #0x5// =5
 DISASMMS-NEXT: 180006004: d65f03c0 ret
 
+Test the code map that includes an ARM64EC function padded by its entry-thunk 
offset.
+
+RUN: lld-link -out:testpad.dll -machine:arm64ec entry-thunk.obj 
loadconfig-arm64ec.obj -dll -noentry -include:func
+RUN: llvm-readobj --coff-load-config testpad.dll | FileCheck 
-check-prefix=CODEMAPPAD %s
+CODEMAPPAD:  CodeMap [
+CODEMAPPAD:0x1000 - 0x1010  ARM64EC
+CODEMAPPAD-NEXT: ]
+
 
 #--- arm64-func-sym.s
 .text
@@ -266,3 +277,22 @@ x86_64_func_sym2:
 .section .empty1, "xr"
 .section .empty2, "xr"
 .section .empty3, "xr"
+
+#--- entry-thunk.s
+.section .text,"xr",discard,func
+.globl func
+.p2align 2, 0x0
+func:
+mov w0, #1
+ret
+
+.section .wowthk$aa,"xr",discard,thunk
+.globl thunk
+.p2align 2
+thunk:
+ret
+
+

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.com/github/pr/llvm/llvm-project/168411?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#168411** 👈 (View in Graphite: https://app.graphite.com/github/pr/llvm/llvm-project/168411?utm_source=stack-comment-view-in-graphite)
* **#168410**
* `main`

This stack of pull requests is managed by Graphite
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking:
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/168411
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] 6922f8a - Revert "[clang][SourceManager] Use `getFileLoc` when computing `getPresumedLo…"

2025-11-19 Thread via llvm-branch-commits

Author: Aaron Ballman
Date: 2025-11-17T08:44:37-05:00
New Revision: 6922f8a3b0f75be79ae26b8b8831512d8de43b58

URL: 
https://github.com/llvm/llvm-project/commit/6922f8a3b0f75be79ae26b8b8831512d8de43b58
DIFF: 
https://github.com/llvm/llvm-project/commit/6922f8a3b0f75be79ae26b8b8831512d8de43b58.diff

LOG: Revert "[clang][SourceManager] Use `getFileLoc` when computing 
`getPresumedLo…"

This reverts commit 6b464e4ac0b1ce4638c0fa07abcba329119836cb.

Added: 


Modified: 
clang/include/clang/Basic/SourceManager.h
clang/lib/Basic/SourceManager.cpp
clang/test/Analysis/plist-macros-with-expansion.cpp
clang/test/C/C23/n2350.c
clang/test/ExtractAPI/macro_undefined.c
clang/test/FixIt/format.cpp
clang/test/Preprocessor/macro_arg_directive.c
clang/test/Preprocessor/print_line_track.c

Removed: 




diff  --git a/clang/include/clang/Basic/SourceManager.h 
b/clang/include/clang/Basic/SourceManager.h
index f15257a760b8c..bc9e97863556d 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -1464,9 +1464,8 @@ class SourceManager : public 
RefCountedBase {
   /// directives.  This provides a view on the data that a user should see
   /// in diagnostics, for example.
   ///
-  /// If \p Loc is a macro expansion location, the presumed location
-  /// computation uses the spelling location for macro arguments and the
-  /// expansion location for other macro expansions.
+  /// Note that a presumed location is always given as the expansion point of
+  /// an expansion location, not at the spelling location.
   ///
   /// \returns The presumed location of the specified SourceLocation. If the
   /// presumed location cannot be calculated (e.g., because \p Loc is invalid

diff  --git a/clang/lib/Basic/SourceManager.cpp 
b/clang/lib/Basic/SourceManager.cpp
index 767a765ae4261..b6cc6ec9365f5 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -1435,7 +1435,7 @@ PresumedLoc SourceManager::getPresumedLoc(SourceLocation 
Loc,
   if (Loc.isInvalid()) return PresumedLoc();
 
   // Presumed locations are always for expansion points.
-  FileIDAndOffset LocInfo = getDecomposedLoc(getFileLoc(Loc));
+  FileIDAndOffset LocInfo = getDecomposedExpansionLoc(Loc);
 
   bool Invalid = false;
   const SLocEntry &Entry = getSLocEntry(LocInfo.first, &Invalid);

diff  --git a/clang/test/Analysis/plist-macros-with-expansion.cpp 
b/clang/test/Analysis/plist-macros-with-expansion.cpp
index d9a2f94055593..d57bb0f2dd265 100644
--- a/clang/test/Analysis/plist-macros-with-expansion.cpp
+++ b/clang/test/Analysis/plist-macros-with-expansion.cpp
@@ -405,14 +405,14 @@ void commaInBracketsTest() {
   code
 
 void commaInBracesTest() {
-  PASTE_CODE({
+  PASTE_CODE({ // expected-warning{{Dereference of null pointer}}
 // NOTE: If we were to add a new variable here after a comma, we'd get a
 // compilation error, so this test is mainly here to show that this was 
also
 // investigated.
 //
 // int *ptr = nullptr, a;
 int *ptr = nullptr;
-*ptr = 5; // expected-warning{{Dereference of null pointer}}
+*ptr = 5;
   })
 }
 
@@ -425,14 +425,14 @@ void commaInBracesTest() {
 // CHECK-NEXT:  col3
 // CHECK-NEXT:  file0
 // CHECK-NEXT: 
-// CHECK-NEXT: namePASTE_CODE({
+// CHECK-NEXT: namePASTE_CODE({ // expected-
 // CHECK-NEXT:// NOTE: If we were to add a new variable here after a 
comma, we'd get a
 // CHECK-NEXT:// compilation error, so this test is mainly here to show 
that this was also
 // CHECK-NEXT:// investigated.
 // CHECK-NEXT://
 // CHECK-NEXT:// int *ptr = nullptr, a;
 // CHECK-NEXT:int *ptr = nullptr;
-// CHECK-NEXT:*ptr = 5; // expected-
+// CHECK-NEXT:*ptr = 5;
 // CHECK-NEXT:  })
 // CHECK-NEXT: expansion{int *ptr =nullptr ;*ptr 
=5;}
 // CHECK-NEXT:

diff  --git a/clang/test/C/C23/n2350.c b/clang/test/C/C23/n2350.c
index 96b8c511d5716..af0ca6d79be5e 100644
--- a/clang/test/C/C23/n2350.c
+++ b/clang/test/C/C23/n2350.c
@@ -47,10 +47,11 @@ int struct_in_second_param(void) {
 
 int macro(void) {
   return offsetof(struct A // cpp-error {{'A' cannot be defined in a type 
specifier}} \
-  expected-warning {{defining a type within 
'offsetof' is a C23 extension}}
+  expected-warning 2 {{defining a type within 
'offsetof' is a C23 extension}}
   {
 int a;
-struct B // expected-warning {{defining a type within 'offsetof' is a C23 
extension}}
+struct B // verifier seems to think the error is emitted by the macro
+ // In fact the location of the error is "B" on the line above
 {
   int c;
   int d;

diff  --git a/clang/test/ExtractAPI/macro_undefined.c 
b/clang/test/ExtractAPI/macro_undefined.c
index 1d697db1e1613..7bb50af380c24 100644
--- a/clang/test/ExtractAPI/macro_undefined.c
+

[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 updated 
https://github.com/llvm/llvm-project/pull/168545




[llvm-branch-commits] [llvm] [AMDGPU] Add wave reduce intrinsics for float types - 2 (PR #161815)

2025-11-19 Thread via llvm-branch-commits

easyonaadit wrote:

Ping.

https://github.com/llvm/llvm-project/pull/161815


[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 created 
https://github.com/llvm/llvm-project/pull/168545

This patch fixes most of the ASan tests that were failing on Darwin when
running under the internal shell. A few more involved cases remain, which
I'll handle in a follow-up patch. The tests that still need to be done:
```
TestCases/Darwin/duplicate_os_log_reports.cpp
TestCases/Darwin/dyld_insert_libraries_reexec.cpp
TestCases/Darwin/interface_symbols_darwin.cpp
```





[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite (https://app.graphite.com/github/pr/llvm/llvm-project/168290?utm_source=stack-comment-downstack-mergeability-warning).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#168290** 👈 (View in Graphite: https://app.graphite.com/github/pr/llvm/llvm-project/168290?utm_source=stack-comment-view-in-graphite)
* **#168176**
* `main`

This stack of pull requests is managed by Graphite
(https://graphite.dev?utm-source=stack-comment). Learn more about stacking:
https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/168290


[llvm-branch-commits] [llvm] [TableGen] Strip directories from filename prefixes. (PR #168352)

2025-11-19 Thread Ivan Kosarev via llvm-branch-commits

https://github.com/kosarev created 
https://github.com/llvm/llvm-project/pull/168352

Fixes https://github.com/llvm/llvm-project/pull/167700 to support
builds where TableGen's output file is specified as a full path
rather than just a filename.
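
To make the difference concrete, here is a small sketch that uses `std::filesystem` as a stand-in for `llvm::sys::path`; the helper names are hypothetical, and the LLVM functions may differ from `std::filesystem` in edge cases. It contrasts stripping only the extension with taking the stem of the final path component.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Sketch of the old behaviour: drop only the extension, so any leading
// directories remain part of the prefix.
std::string prefixKeepingDirs(const std::string &outputFilename) {
  fs::path p(outputFilename);
  p.replace_extension(); // an empty replacement removes the extension
  return p.generic_string();
}

// Sketch of the new behaviour: keep only the last path component,
// without its extension.
std::string prefixStemOnly(const std::string &outputFilename) {
  return fs::path(outputFilename).stem().generic_string();
}
```

For an output file given as `build/lib/Foo.inc`, the first helper produces `build/lib/Foo` while the second produces `Foo`; for a bare `Foo.inc`, both produce `Foo`.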

>From af92eaef4e2cc8502d02d104ca44543e169d768e Mon Sep 17 00:00:00 2001
From: Ivan Kosarev 
Date: Mon, 17 Nov 2025 11:35:13 +
Subject: [PATCH] [TableGen] Strip directories from filename prefixes.

Fixes https://github.com/llvm/llvm-project/pull/167700 to support
builds where TableGen's output file is specified as a full path
rather than just a filename.
---
 llvm/lib/TableGen/Main.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index c3869c3fb9a5a..165c957fc9977 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -167,8 +167,7 @@ int llvm::TableGenMain(const char *argv0,
 
   // Write output to memory.
   Timer.startBackendTimer("Backend overall");
-  SmallString<128> FilenamePrefix(OutputFilename);
-  sys::path::replace_extension(FilenamePrefix, "");
+  SmallString<128> FilenamePrefix(sys::path::stem(OutputFilename));
   TableGenOutputFiles OutFiles;
   unsigned status = 0;
   // ApplyCallback will return true if it did not apply any callback. In that



[llvm-branch-commits] [ASan] Make most tests run under internal shell on Darwin (PR #168545)

2025-11-19 Thread Dan Blackwell via llvm-branch-commits

https://github.com/DanBlackwell requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/168545


[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread Claire Fan via llvm-branch-commits

clairechingching wrote:

@yonghong-song I'd like to backport this change so that I can enable
misaligned memory access in the Rust nightly compiler, thanks!

https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and G_FNEG (PR #168411)

2025-11-19 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic created 
https://github.com/llvm/llvm-project/pull/168411

None
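
The rules comment in this patch notes that FNEG and FABS can be selected as a bitwise XOR and AND with a mask. The sketch below shows that idea on the host for a 32-bit float, assuming IEEE-754 binary32 layout; the helper names are hypothetical and nothing here is AMDGPU or GlobalISel code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sign bit of an IEEE-754 binary32 value (assumed float layout).
constexpr uint32_t kSignMask = 0x80000000u;

// Reinterpret a float as its raw 32-bit pattern.
uint32_t toBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Reinterpret a raw 32-bit pattern as a float.
float fromBits(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Negation flips the sign bit: XOR with the sign mask.
float fnegViaXor(float x) { return fromBits(toBits(x) ^ kSignMask); }

// Absolute value clears the sign bit: AND with the inverted sign mask.
float fabsViaAnd(float x) { return fromBits(toBits(x) & ~kSignMask); }
```

Because both operations touch only the sign bit, they need no floating-point unit, which fits the comment's point that XOR and AND are available on the scalar ALU.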

>From 529b6f23ee1acb393880a336c0fdc89c1792bf1b Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Mon, 17 Nov 2025 18:47:58 +0100
Subject: [PATCH] AMDGPU/GlobalISel: RegBankLegalize rules for G_FABS and
 G_FNEG

---
 .../AMDGPU/AMDGPURegBankLegalizeHelper.cpp|  17 +-
 .../AMDGPU/AMDGPURegBankLegalizeRules.cpp |  19 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll   | 233 ++
 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll   | 216 
 4 files changed, 483 insertions(+), 2 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/GlobalISel/fneg.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
index 1765d054a3c0d..d719f3d40295d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeHelper.cpp
@@ -629,10 +629,23 @@ void RegBankLegalizeHelper::lowerSplitTo32(MachineInstr 
&MI) {
 void RegBankLegalizeHelper::lowerSplitTo16(MachineInstr &MI) {
   Register Dst = MI.getOperand(0).getReg();
   assert(MRI.getType(Dst) == V2S16);
-  auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
-  auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
   unsigned Opc = MI.getOpcode();
   auto Flags = MI.getFlags();
+
+  if (MI.getNumOperands() == 2) {
+auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
+auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
+auto Lo = B.buildInstr(Opc, {SgprRB_S16}, {Op1Lo}, Flags);
+auto Hi = B.buildInstr(Opc, {SgprRB_S16}, {Op1Hi}, Flags);
+B.buildMergeLikeInstr(Dst, {Lo, Hi});
+MI.eraseFromParent();
+return;
+  }
+
+  assert(MI.getNumOperands() == 3);
+  auto [Op1Lo32, Op1Hi32] = unpackAExt(MI.getOperand(1).getReg());
+  auto [Op2Lo32, Op2Hi32] = unpackAExt(MI.getOperand(2).getReg());
   auto Op1Lo = B.buildTrunc(SgprRB_S16, Op1Lo32);
   auto Op1Hi = B.buildTrunc(SgprRB_S16, Op1Hi32);
   auto Op2Lo = B.buildTrunc(SgprRB_S16, Op2Lo32);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
index b81a08de383d9..4051dc8495f6f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegBankLegalizeRules.cpp
@@ -951,6 +951,25 @@ RegBankLegalizeRules::RegBankLegalizeRules(const 
GCNSubtarget &_ST,
   .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32, VgprV2S32}}})
   .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32, VgprV2S32}}});
 
+  // FNEG and FABS are either folded as source modifiers or can be selected as
+  // bitwise XOR and AND with Mask. XOR and AND are available on SALU but for
+  // targets without SALU float we still select them as VGPR since there would
+  // be no real sgpr use.
+  addRulesForGOpcs({G_FNEG, G_FABS}, Standard)
+  .Uni(S16, {{UniInVgprS16}, {Vgpr16}}, !hasSALUFloat)
+  .Uni(S16, {{Sgpr16}, {Sgpr16}}, hasSALUFloat)
+  .Div(S16, {{Vgpr16}, {Vgpr16}})
+  .Uni(S32, {{UniInVgprS32}, {Vgpr32}}, !hasSALUFloat)
+  .Uni(S32, {{Sgpr32}, {Sgpr32}}, hasSALUFloat)
+  .Div(S32, {{Vgpr32}, {Vgpr32}})
+  .Uni(S64, {{UniInVgprS64}, {Vgpr64}})
+  .Div(S64, {{Vgpr64}, {Vgpr64}})
+  .Uni(V2S16, {{UniInVgprV2S16}, {VgprV2S16}}, !hasSALUFloat)
+  .Uni(V2S16, {{SgprV2S16}, {SgprV2S16}, ScalarizeToS16}, hasSALUFloat)
+  .Div(V2S16, {{VgprV2S16}, {VgprV2S16}})
+  .Any({{UniV2S32}, {{UniInVgprV2S32}, {VgprV2S32}}})
+  .Any({{DivV2S32}, {{VgprV2S32}, {VgprV2S32}}});
+
   addRulesForGOpcs({G_FPTOUI})
   .Any({{UniS32, S32}, {{Sgpr32}, {Sgpr32}}}, hasSALUFloat)
   .Any({{UniS32, S32}, {{UniInVgprS32}, {Vgpr32}}}, !hasSALUFloat);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll 
b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
new file mode 100644
index 0..093cdf744e3b4
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fabs.ll
@@ -0,0 +1,233 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1100 -o - 
%s | FileCheck -check-prefixes=GCN,GFX11,GFX11-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal 
-mattr=-real-true16 -mcpu=gfx1100 -o - %s | FileCheck 
-check-prefixes=GCN,GFX11,GFX11-GISEL %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mattr=-real-true16 -mcpu=gfx1200 -o - 
%s | FileCheck -check-prefixes=GCN,GFX12,GFX12-SDAG %s
+; RUN: llc -global-isel -new-reg-bank-select -mtriple=amdgcn-amd-amdpal 
-mattr=-real-true16 -mcpu=gfx1200 -o - %s | FileCheck 
-check-prefixes=GCN,GFX12,GFX12-GISEL %s
+
+define amdgpu_ps void @v_fabs_f16(half %in, ptr addrspace(1) %out) {
+; GCN-LABEL: v_fabs_f16:
+; GCN:   ; %bb.0:
+; GCN-NEXT:v_an

[llvm-branch-commits] [llvm] release/21.x: [CodeGen][ARM64EC] Don't treat guest exit thunks as indirect calls (#165885) (PR #168371)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:

@efriedma-quic What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/168371


[llvm-branch-commits] [llvm] b835c10 - Revert "DAG: Allow select ptr combine for non-0 address spaces (#167909)"

2025-11-19 Thread via llvm-branch-commits

Author: ronlieb
Date: 2025-11-16T16:47:51-05:00
New Revision: b835c10c902a27d1423d8944534d828afbcb4f6c

URL: 
https://github.com/llvm/llvm-project/commit/b835c10c902a27d1423d8944534d828afbcb4f6c
DIFF: 
https://github.com/llvm/llvm-project/commit/b835c10c902a27d1423d8944534d828afbcb4f6c.diff

LOG: Revert "DAG: Allow select ptr combine for non-0 address spaces (#167909)"

This reverts commit e5f499f48f2d1fddc590982da7232d08a6f8c54c.
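
The guard that this revert changes in `SimplifySelectOps` can be summarised with two hypothetical predicates. This is only a sketch of the two conditions visible in the diff in this message, not DAGCombiner code; address spaces are modelled as plain integers.

```cpp
#include <cassert>

// Condition removed by the revert: the combine is blocked only when the
// two loads are in different address spaces.
bool blockedUnlessSameSpace(unsigned lhsAS, unsigned rhsAS) {
  return lhsAS != rhsAS;
}

// Condition restored by the revert: the combine is blocked unless both
// loads are in the default address space 0.
bool blockedUnlessDefaultSpace(unsigned lhsAS, unsigned rhsAS) {
  return lhsAS != 0 || rhsAS != 0;
}
```

Two loads that are both in address space 1 pass the first check but are rejected by the second, which is the case the reverted change had enabled.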

Added: 


Modified: 
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
llvm/test/CodeGen/AMDGPU/select-load-to-load-select-ptr-combine.ll
llvm/test/CodeGen/AMDGPU/select-vectors.ll
llvm/test/CodeGen/AMDGPU/select64.ll
llvm/test/CodeGen/NVPTX/bf16-instructions.ll
llvm/test/CodeGen/NVPTX/bf16x2-instructions.ll
llvm/test/CodeGen/NVPTX/bug22246.ll
llvm/test/CodeGen/NVPTX/fast-math.ll
llvm/test/CodeGen/NVPTX/i1-select.ll
llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
llvm/test/CodeGen/NVPTX/lower-byval-args.ll

Removed: 




diff  --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp 
b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 6fbac0f8c8cdf..c9513611e6dcb 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -29033,9 +29033,9 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect, 
SDValue LHS,
 // over-conservative. It would be beneficial to be able to remember
 // both potential memory locations.  Since we are discarding
 // src value info, don't do the transformation if the memory
-// locations are not in the same address space.
-LLD->getPointerInfo().getAddrSpace() !=
-RLD->getPointerInfo().getAddrSpace() ||
+// locations are not in the default address space.
+LLD->getPointerInfo().getAddrSpace() != 0 ||
+RLD->getPointerInfo().getAddrSpace() != 0 ||
 // We can't produce a CMOV of a TargetFrameIndex since we won't
 // generate the address generation required.
 LLD->getBasePtr().getOpcode() == ISD::TargetFrameIndex ||
@@ -29117,9 +29117,6 @@ bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect, 
SDValue LHS,
 // but the new load must be the minimum (most restrictive) alignment of the
 // inputs.
 Align Alignment = std::min(LLD->getAlign(), RLD->getAlign());
-unsigned AddrSpace = LLD->getAddressSpace();
-assert(AddrSpace == RLD->getAddressSpace());
-
 MachineMemOperand::Flags MMOFlags = LLD->getMemOperand()->getFlags();
 if (!RLD->isInvariant())
   MMOFlags &= ~MachineMemOperand::MOInvariant;
@@ -29128,16 +29125,15 @@ bool DAGCombiner::SimplifySelectOps(SDNode 
*TheSelect, SDValue LHS,
 if (LLD->getExtensionType() == ISD::NON_EXTLOAD) {
   // FIXME: Discards pointer and AA info.
   Load = DAG.getLoad(TheSelect->getValueType(0), SDLoc(TheSelect),
- LLD->getChain(), Addr, MachinePointerInfo(AddrSpace),
- Alignment, MMOFlags);
+ LLD->getChain(), Addr, MachinePointerInfo(), 
Alignment,
+ MMOFlags);
 } else {
   // FIXME: Discards pointer and AA info.
   Load = DAG.getExtLoad(
   LLD->getExtensionType() == ISD::EXTLOAD ? RLD->getExtensionType()
   : LLD->getExtensionType(),
   SDLoc(TheSelect), TheSelect->getValueType(0), LLD->getChain(), Addr,
-  MachinePointerInfo(AddrSpace), LLD->getMemoryVT(), Alignment,
-  MMOFlags);
+  MachinePointerInfo(), LLD->getMemoryVT(), Alignment, MMOFlags);
 }
 
 // Users of the select now use the result of the load.

diff  --git a/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll 
b/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
index 5aabad682ad30..d9ad9590d9762 100644
--- a/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
+++ b/llvm/test/CodeGen/AMDGPU/load-select-ptr.ll
@@ -7,31 +7,27 @@
 define amdgpu_kernel void @select_ptr_crash_i64_flat(i32 %tmp, [8 x i32], ptr 
%ptr0, [8 x i32], ptr %ptr1, [8 x i32], ptr addrspace(1) %ptr2) {
 ; GCN-LABEL: select_ptr_crash_i64_flat:
 ; GCN:   ; %bb.0:
+; GCN-NEXT:s_load_dword s6, s[8:9], 0x0
+; GCN-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x28
+; GCN-NEXT:s_load_dwordx2 s[2:3], s[8:9], 0x50
+; GCN-NEXT:s_load_dwordx2 s[4:5], s[8:9], 0x78
 ; GCN-NEXT:s_add_i32 s12, s12, s17
 ; GCN-NEXT:s_lshr_b32 flat_scratch_hi, s12, 8
-; GCN-NEXT:s_load_dword s2, s[8:9], 0x0
-; GCN-NEXT:s_load_dwordx2 s[0:1], s[8:9], 0x78
-; GCN-NEXT:s_add_u32 s4, s8, 40
-; GCN-NEXT:s_addc_u32 s3, s9, 0
-; GCN-NEXT:s_add_u32 s5, s8, 0x50
-; GCN-NEXT:s_addc_u32 s6, s9, 0
 ; GCN-NEXT:s_waitcnt lgkmcnt(0)
-; GCN-NEXT:s_cmp_eq_u32 s2, 0
-; GCN-NEXT:s_cselect_b32 s3, s3, s6
-; GCN-NEXT:s_cselect_b32 s2, s4, s5
-; GCN-N

[llvm-branch-commits] [llvm] [TableGen] Strip directories from filename prefixes. (PR #168352)

2025-11-19 Thread Ivan Kosarev via llvm-branch-commits

https://github.com/kosarev closed 
https://github.com/llvm/llvm-project/pull/168352


[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-x86

Author: Matt Arsenault (arsenm)


Changes



---

Patch is 76.41 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/168290.diff


6 Files Affected:

- (modified) llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp (+12-12) 
- (modified) llvm/test/CodeGen/AArch64/sve-extract-scalable-vector.ll (-7) 
- (modified) llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll 
(+133-133) 
- (modified) llvm/test/CodeGen/X86/half.ll (+64-69) 
- (modified) llvm/test/CodeGen/X86/matrix-multiply.ll (+38-36) 
- (modified) llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll 
(+216-218) 


``diff
diff --git a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp 
b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
index ef53ee6df9f06..10d5f7a9b4f65 100644
--- a/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
@@ -5654,7 +5654,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
   // Widen the input and call convert on the widened input vector.
   unsigned NumConcat =
   WidenEC.getKnownMinValue() / InVTEC.getKnownMinValue();
-  SmallVector Ops(NumConcat, DAG.getUNDEF(InVT));
+  SmallVector Ops(NumConcat, DAG.getPOISON(InVT));
   Ops[0] = InOp;
   SDValue InVec = DAG.getNode(ISD::CONCAT_VECTORS, DL, InWidenVT, Ops);
   if (N->getNumOperands() == 1)
@@ -5673,7 +5673,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_Convert(SDNode *N) {
 
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
-  SmallVector Ops(WidenEC.getFixedValue(), DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenEC.getFixedValue(), DAG.getPOISON(EltVT));
   // Use the original element count so we don't do more scalar opts than
   // necessary.
   unsigned MinElts = N->getValueType(0).getVectorNumElements();
@@ -5756,7 +5756,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_Convert_StrictFP(SDNode *N) {
   // Otherwise unroll into some nasty scalar code and rebuild the vector.
   EVT EltVT = WidenVT.getVectorElementType();
   std::array EltVTs = {{EltVT, MVT::Other}};
-  SmallVector Ops(WidenNumElts, DAG.getUNDEF(EltVT));
+  SmallVector Ops(WidenNumElts, DAG.getPOISON(EltVT));
   SmallVector OpChains;
   // Use the original element count so we don't do more scalar opts than
   // necessary.
@@ -5819,7 +5819,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTEND_VECTOR_INREG(SDNode *N) {
   }
 
   while (Ops.size() != WidenNumElts)
-Ops.push_back(DAG.getUNDEF(WidenSVT));
+Ops.push_back(DAG.getPOISON(WidenSVT));
 
   return DAG.getBuildVector(WidenVT, DL, Ops);
 }
@@ -6026,7 +6026,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
 // input and then widening it. To avoid this, we widen the input only 
if
 // it results in a legal type.
 if (WidenSize % InSize == 0) {
-  SmallVector Ops(NewNumParts, DAG.getUNDEF(InVT));
+  SmallVector Ops(NewNumParts, DAG.getPOISON(InVT));
   Ops[0] = InOp;
 
   NewVec = DAG.getNode(ISD::CONCAT_VECTORS, dl, NewInVT, Ops);
@@ -6034,7 +6034,7 @@ SDValue DAGTypeLegalizer::WidenVecRes_BITCAST(SDNode *N) {
   SmallVector Ops;
   DAG.ExtractVectorElements(InOp, Ops);
   Ops.append(WidenSize / InScalarSize - Ops.size(),
- DAG.getUNDEF(InVT.getVectorElementType()));
+ DAG.getPOISON(InVT.getVectorElementType()));
 
   NewVec = DAG.getNode(ISD::BUILD_VECTOR, dl, NewInVT, Ops);
 }
@@ -6088,7 +6088,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 if (WidenNumElts % NumInElts == 0) {
   // Add undef vectors to widen to correct length.
   unsigned NumConcat = WidenNumElts / NumInElts;
-  SDValue UndefVal = DAG.getUNDEF(InVT);
+  SDValue UndefVal = DAG.getPOISON(InVT);
   SmallVector Ops(NumConcat);
   for (unsigned i=0; i < NumOperands; ++i)
 Ops[i] = N->getOperand(i);
@@ -6146,7 +6146,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS(SDNode *N) {
 for (unsigned j = 0; j < NumInElts; ++j)
   Ops[Idx++] = DAG.getExtractVectorElt(dl, EltVT, InOp, j);
   }
-  SDValue UndefVal = DAG.getUNDEF(EltVT);
+  SDValue UndefVal = DAG.getPOISON(EltVT);
   for (; Idx < WidenNumElts; ++Idx)
 Ops[Idx] = UndefVal;
   return DAG.getBuildVector(WidenVT, dl, Ops);
@@ -6213,7 +6213,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode *N) {
 Parts.push_back(
 DAG.getExtractSubvector(dl, PartVT, InOp, IdxVal + I * GCD));
   for (; I < WidenNumElts / GCD; ++I)
-Parts.push_back(DAG.getUNDEF(PartVT));
+Parts.push_back(DAG.getPOISON(PartVT));
 
   return DAG.getNode(ISD::CONCAT_VECTORS, dl, WidenVT, Parts);
 }
@@ -6229,7 +6229,7 @@ SDValue 
DAGTypeLegalizer::WidenVecRes_EXTRACT_SUBVECTOR(SDNode 

[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Aiden Grossman (boomanaiden154)


Changes

This test was doing some feature checks within the test itself. This patch
rewrites the feature checks to be done in a fashion more idiomatic to lit,
as the internal shell does not support the features needed for the previous
feature checks.


---
Full diff: https://github.com/llvm/llvm-project/pull/168655.diff


2 Files Affected:

- (modified) 
compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp (+2-13) 
- (modified) compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py (+25) 


``diff
diff --git a/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp b/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
index 145e162a21c0e..89ee7a178525a 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/dyld_insert_libraries_reexec.cpp
@@ -14,23 +14,12 @@
 // RUN:   %run %t/a.out 2>&1 \
 // RUN:   | FileCheck %s
 
-// RUN: MACOS_MAJOR=$(sw_vers -productVersion | cut -d'.' -f1)
-// RUN: MACOS_MINOR=$(sw_vers -productVersion | cut -d'.' -f2)
-
-// RUN: IS_MACOS_10_11_OR_HIGHER=$([ $MACOS_MAJOR -eq 10 ] && [ $MACOS_MINOR -lt 11 ]; echo $?)
-
 // On OS X 10.10 and lower, if the dylib is not DYLD-inserted, ASan will re-exec.
-// RUN: if [ $IS_MACOS_10_11_OR_HIGHER == 0 ]; then \
-// RUN:   %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 \
-// RUN:   | FileCheck --check-prefix=CHECK-NOINSERT %s; \
-// RUN:   fi
+// RUN: %if mac-os-10-11-or-higher %{ %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 | FileCheck --check-prefix=CHECK-NOINSERT %s %}
 
 // On OS X 10.11 and higher, we don't need to DYLD-insert anymore, and the interceptors
 // still installed correctly. Let's just check that things work and we don't try to re-exec.
-// RUN: if [ $IS_MACOS_10_11_OR_HIGHER == 1 ]; then \
-// RUN:   %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 \
-// RUN:   | FileCheck %s; \
-// RUN:   fi
+// RUN: %if mac-os-10-10-or-lower %{ %env_asan_opts=verbosity=1 %run %t/a.out 2>&1 | FileCheck %s %}
 
 #include 
 
diff --git a/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py b/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
index af82d30cf4de9..b09c1f7cd3daa 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
+++ b/compiler-rt/test/asan/TestCases/Darwin/lit.local.cfg.py
@@ -1,3 +1,6 @@
+import subprocess
+
+
 def getRoot(config):
     if not config.parent:
         return config
@@ -8,3 +11,25 @@ def getRoot(config):
 
 if root.target_os not in ["Darwin"]:
     config.unsupported = True
+
+
+def get_product_version():
+    try:
+        version_process = subprocess.run(
+            ["sw_vers", "-productVersion"],
+            check=True,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+        )
+        version_string = version_process.stdout.decode("utf-8").split("\n")[0]
+        version_split = version_string.split(".")
+        return (int(version_split[0]), int(version_split[1]))
+    except:
+        return (0, 0)
+
+
+macos_version_major, macos_version_minor = get_product_version()
+if macos_version_major > 10 and macos_version_minor > 11:
+    config.available_features.add("mac-os-10-11-or-higher")
+else:
+    config.available_features.add("mac-os-10-10-or-lower")

``
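The version gate in the `lit.local.cfg.py` hunk above tests the major and minor numbers independently. As a minimal sketch, assuming the intent is "macOS 10.11 or newer", the same decision can be expressed by comparing `(major, minor)` tuples, which Python orders lexicographically. The helper names below (`parse_product_version`, `is_10_11_or_higher`, `independent_check`) are hypothetical and not part of the patch.

```python
def parse_product_version(version_string):
    """Parse a dotted version such as '10.15.7' into a (major, minor) tuple."""
    parts = version_string.strip().split(".")
    major = int(parts[0])
    # Treat a missing minor component as 0 (an assumption for this sketch).
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def is_10_11_or_higher(version):
    """Tuple comparison: major is compared first, minor only breaks ties."""
    return version >= (10, 11)


def independent_check(version):
    """Mirrors the comparison in the hunk above, for side-by-side comparison."""
    major, minor = version
    return major > 10 and minor > 11
```

With the tuple comparison, `(10, 11)`, `(11, 0)` and `(14, 2)` all count as "10.11 or higher", whereas the independent check is false for each of them because it needs both `major > 10` and `minor > 11` to hold at once.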




https://github.com/llvm/llvm-project/pull/168655
___
llvm-branch-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [ASan] Make duplicate_os_log_reports.cpp work with the internal shell (PR #168656)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 created 
https://github.com/llvm/llvm-project/pull/168656

This test used a for loop to implement retries and also did some trickery with PIDs.
For this test, just invoke bash to actually run the test, given we need the PID,
and move the for loop into a separate shell script file that we can then invoke
from within the test. Normally it would make sense to rewrite such a script in
Python, but given this test has no portability concerns (it only runs on Darwin),
it is fine to use a shell script here, as there is no other convenient alternative.





[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)

2025-11-19 Thread Aiden Grossman via llvm-branch-commits

https://github.com/boomanaiden154 created 
https://github.com/llvm/llvm-project/pull/168655

This test was doing some feature checks within the test itself. This patch
rewrites the feature checks to be done in a fashion more idiomatic to lit,
as the internal shell does not support the features needed for the previous
feature checks.





[llvm-branch-commits] [ASan] Make duplicate_os_log_reports.cpp work with the internal shell (PR #168656)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Aiden Grossman (boomanaiden154)


Changes

This test used a for loop to implement retries and also did some trickery with PIDs.
For this test, just invoke bash to actually run the test, given we need the PID,
and move the for loop into a separate shell script file that we can then invoke
from within the test. Normally it would make sense to rewrite such a script in
Python, but given this test has no portability concerns (it only runs on Darwin),
it is fine to use a shell script here, as there is no other convenient alternative.


---
Full diff: https://github.com/llvm/llvm-project/pull/168656.diff


2 Files Affected:

- (added) compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh (+6) 
- (modified) compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp (+3-7)


``diff
diff --git a/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh b/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh
new file mode 100755
index 0..8939ca7ca1564
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/Darwin/Inputs/check-syslog.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+for I in {1..3}; do \
+  log show --debug --last $((SECONDS + 30))s --predicate "processID == $1" --style syslog > $2; \
+  if grep -q "use-after-poison" $2; then break; fi; \
+  sleep 5; \
+done
diff --git a/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp b/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
index 5a0353bfb1b31..6adca31745bfd 100644
--- a/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
+++ b/compiler-rt/test/asan/TestCases/Darwin/duplicate_os_log_reports.cpp
@@ -1,8 +1,8 @@
 // UNSUPPORTED: ios
 // REQUIRES: darwin_log_cmd
 // RUN: %clangxx_asan -fsanitize-recover=address %s -o %t
-// RUN: { %env_asan_opts=halt_on_error=0,log_to_syslog=1 %run %t > %t.process_output.txt 2>&1 & } \
-// RUN: ; export TEST_PID=$! ; wait ${TEST_PID}
+// RUN: bash -c "{ %env_asan_opts=halt_on_error=0,log_to_syslog=1 %run %t > %t.process_output.txt 2>&1 & } \
+// RUN: ; export TEST_PID=$! ; wait ${TEST_PID}; echo -n ${TEST_PID} > %t.test_pid"
 
 // Check process output.
 // RUN: FileCheck %s --check-prefixes CHECK,CHECK-PROC -input-file=%t.process_output.txt
@@ -10,11 +10,7 @@
 // Check syslog output. We filter recent system logs based on PID to avoid
 // getting the logs of previous test runs. Make some reattempts in case there
 // is a delay.
-// RUN: for I in {1..3}; do \
-// RUN:   log show --debug --last $((SECONDS + 30))s --predicate "processID == ${TEST_PID}" --style syslog > %t.process_syslog_output.txt; \
-// RUN:   if grep -q "use-after-poison" %t.process_syslog_output.txt; then break; fi; \
-// RUN:   sleep 5; \
-// RUN: done
+// RUN: %S/Inputs/check-syslog.sh %{readfile:%t.test_pid} %t.process_syslog_output.txt
 // RUN: FileCheck %s -input-file=%t.process_syslog_output.txt
 #include 
 #include 

``
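The description above mentions that a script like `check-syslog.sh` would normally be written in Python. As a rough sketch only, the retry pattern (fetch the log, look for a marker, sleep, try again) could look like the following. The function name `wait_for_marker` and the `fetch_log` callable are hypothetical; `fetch_log` stands in for whatever produces the log text, for example the `log show` invocation in the shell script, and nothing here is part of the patch.

```python
import time


def wait_for_marker(fetch_log, marker, attempts=3, delay_seconds=5.0,
                    sleep=time.sleep):
    """Call fetch_log() up to `attempts` times until its output contains `marker`.

    Returns a (last_output, found) pair. `sleep` is injectable so the loop
    can be exercised without real delays.
    """
    output = ""
    for attempt in range(attempts):
        output = fetch_log()
        if marker in output:
            return output, True
        # Unlike the shell loop, skip the pause after the final attempt.
        if attempt + 1 < attempts:
            sleep(delay_seconds)
    return output, False
```

The defaults of three attempts and a five-second pause mirror the values in the shell loop; the last fetched output is returned either way so that a later check can still inspect it.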




https://github.com/llvm/llvm-project/pull/168656


[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)

2025-11-19 Thread Andrew Haberlandt via llvm-branch-commits

https://github.com/ndrewh approved this pull request.


https://github.com/llvm/llvm-project/pull/168655


[llvm-branch-commits] [ASan] Make dyld_insert_libraries_reexec work with internal shell (PR #168655)

2025-11-19 Thread Andrew Haberlandt via llvm-branch-commits


@@ -8,3 +11,25 @@ def getRoot(config):
 
 if root.target_os not in ["Darwin"]:
     config.unsupported = True
+
+
+def get_product_version():
+    try:
+        version_process = subprocess.run(
+            ["sw_vers", "-productVersion"],
+            check=True,
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+        )
+        version_string = version_process.stdout.decode("utf-8").split("\n")[0]
+        version_split = version_string.split(".")
+        return (int(version_split[0]), int(version_split[1]))
+    except:
+        return (0, 0)
+
+
+macos_version_major, macos_version_minor = get_product_version()
+if macos_version_major > 10 and macos_version_minor > 11:
+    config.available_features.add("mac-os-10-11-or-higher")

ndrewh wrote:

I think we should only add this feature when `config.apple_platform == "osx"` 
([ref](https://github.com/llvm/llvm-project/blob/afdc5093bb256180b3bec3ff827f21bf23d0f492/compiler-rt/test/lit.common.cfg.py#L411C39-L411C60)).
 This particular test has `// UNSUPPORTED: ios` so it does not break anything 
right now, but it's not ideal to have a feature set based on the host OS if we 
are running on a simulator/device.
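A minimal sketch of what this suggestion might look like, assuming the lit `config` object exposes `apple_platform` as in the referenced `lit.common.cfg.py` line. The function name `add_macos_version_features` is hypothetical, the `(major, minor)` tuple comparison is an assumption about the intended check rather than the comparison in the patch, and `SimpleNamespace` only stands in for a real lit config so the sketch can run on its own.

```python
from types import SimpleNamespace


def add_macos_version_features(config, version):
    """Add host-version features only when the test target is macOS itself."""
    if getattr(config, "apple_platform", None) != "osx":
        # Simulator/device runs: the host OS version says nothing about the target.
        return
    if version >= (10, 11):
        config.available_features.add("mac-os-10-11-or-higher")
    else:
        config.available_features.add("mac-os-10-10-or-lower")


# Stand-in configs; "iossim" is only an example of a non-"osx" value.
host_config = SimpleNamespace(apple_platform="osx", available_features=set())
sim_config = SimpleNamespace(apple_platform="iossim", available_features=set())
add_macos_version_features(host_config, (14, 0))
add_macos_version_features(sim_config, (14, 0))
```

With this gating, the host config gains `mac-os-10-11-or-higher` while the simulator config gets neither feature, so any `%if` on these features evaluates to false there.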


https://github.com/llvm/llvm-project/pull/168655


[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread Cullen Rhodes via llvm-branch-commits

c-rhodes wrote:

oops, apologies I didn't mean to trigger the bot, please ignore that.

https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread Cullen Rhodes via llvm-branch-commits

c-rhodes wrote:

@clairechingching backports are typically done via the `/cherry-pick ` 
command left as a comment on the original PR, it's documented here: 
https://llvm.org/docs/GitHub.html#backporting-fixes-to-the-release-branches

although I would say it's unlikely this will get backported so late in the 
release cycle given it's a feature. The next release is 21.1.7 on Dec 2nd, at 
this point in the release cycle the criteria is critical bug fixes as 
documented here 
https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules. @tru is the 
release manager for 21.1.7, so ultimately it will be his decision. @tru wdyt?

https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [llvm] [BPF] add allows-misaligned-mem-access target feature (PR #168314)

2025-11-19 Thread via llvm-branch-commits

llvmbot wrote:


>@clairechingching backports are typically done via the `/cherry-pick ` 
>command left as a comment on the original PR, it's documented here: 
>https://llvm.org/docs/GitHub.html#backporting-fixes-to-the-release-branches
>
>although I would say it's unlikely this will get backported so late in the 
>release cycle given it's a feature. The next release is 21.1.7 on Dec 2nd, at 
>this point in the release cycle the criteria is critical bug fixes as 
>documented here 
>https://llvm.org/docs/HowToReleaseLLVM.html#release-patch-rules. @tru is the 
>release manager for 21.1.7, so ultimately it will be his decision. @tru wdyt?

Error: Command failed due to missing milestone.

https://github.com/llvm/llvm-project/pull/168314


[llvm-branch-commits] [llvm] [SelectionDAG] Split vector types for atomic load (PR #165818)

2025-11-19 Thread Simon Pilgrim via llvm-branch-commits

https://github.com/RKSimon approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/165818


[llvm-branch-commits] [llvm] DAG: Use poison for some vector result widening (PR #168290)

2025-11-19 Thread Simon Pilgrim via llvm-branch-commits

https://github.com/RKSimon approved this pull request.

LGTM - cheers

https://github.com/llvm/llvm-project/pull/168290


[llvm-branch-commits] [llvm] [LV] Use assertion in VPExpressionRecipe creation (PR #165543)

2025-11-19 Thread Sam Tebbs via llvm-branch-commits

https://github.com/SamTebbs33 closed 
https://github.com/llvm/llvm-project/pull/165543


[llvm-branch-commits] [llvm] [LV] Use assertion in VPExpressionRecipe creation (PR #165543)

2025-11-19 Thread Sam Tebbs via llvm-branch-commits

SamTebbs33 wrote:

Not needed as we'll be moving towards creating partial reductions during the 
VPExpressionRecipe creation process.

https://github.com/llvm/llvm-project/pull/165543


[llvm-branch-commits] [llvm] [TableGen][NFCI] Change TableGenMain() to take function_ref. (PR #167888)

2025-11-19 Thread Ivan Kosarev via llvm-branch-commits

https://github.com/kosarev updated 
https://github.com/llvm/llvm-project/pull/167888

>From 12bf9cd3f96ccdcac6ced92d51a06d425375da42 Mon Sep 17 00:00:00 2001
From: Ivan Kosarev 
Date: Thu, 13 Nov 2025 12:10:51 +
Subject: [PATCH] [TableGen][NFCI] Change TableGenMain() to take function_ref.

It was switched from a function pointer to std::function in

TableGen: Make 2nd arg MainFn of TableGenMain(argv0, MainFn) optional.
f675ec6165ab6add5e57cd43a2e9fa1a9bc21d81

but there's no mention of any particular reason for that.
---
 llvm/include/llvm/TableGen/Main.h  | 14 ++
 llvm/lib/TableGen/Main.cpp |  6 ++
 llvm/utils/TableGen/Basic/TableGen.cpp |  2 +-
 3 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/llvm/include/llvm/TableGen/Main.h b/llvm/include/llvm/TableGen/Main.h
index bafce3a463acc..daede9f5a46f0 100644
--- a/llvm/include/llvm/TableGen/Main.h
+++ b/llvm/include/llvm/TableGen/Main.h
@@ -14,7 +14,6 @@
 #define LLVM_TABLEGEN_MAIN_H
 
 #include "llvm/Support/CommandLine.h"
-#include <functional>
 #include 
 
 namespace llvm {
@@ -30,18 +29,17 @@ struct TableGenOutputFiles {
 };
 
 /// Returns true on error, false otherwise.
-using TableGenMainFn = bool(raw_ostream &OS, const RecordKeeper &Records);
+using TableGenMainFn =
+    function_ref<bool(raw_ostream &OS, const RecordKeeper &Records)>;
 
 /// Perform the action using Records, and store output in OutFiles.
 /// Returns true on error, false otherwise.
-using MultiFileTableGenMainFn = bool(TableGenOutputFiles &OutFiles,
-                                     const RecordKeeper &Records);
+using MultiFileTableGenMainFn = function_ref<bool(
+    TableGenOutputFiles &OutFiles, const RecordKeeper &Records)>;
 
-int TableGenMain(const char *argv0,
-                 std::function<TableGenMainFn> MainFn = nullptr);
+int TableGenMain(const char *argv0, TableGenMainFn MainFn = nullptr);
 
-int TableGenMain(const char *argv0,
-                 std::function<MultiFileTableGenMainFn> MainFn = nullptr);
+int TableGenMain(const char *argv0, MultiFileTableGenMainFn MainFn = nullptr);
 
 /// Controls emitting large character arrays as strings or character arrays.
 /// Typically set to false when building with MSVC.
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index c3869c3fb9a5a..499723ab2acdc 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -127,8 +127,7 @@ static int WriteOutput(const TGParser &Parser, const char *argv0,
   return 0;
 }
 
-int llvm::TableGenMain(const char *argv0,
-                       std::function<MultiFileTableGenMainFn> MainFn) {
+int llvm::TableGenMain(const char *argv0, MultiFileTableGenMainFn MainFn) {
   RecordKeeper Records;
   TGTimer &Timer = Records.getTimer();
 
@@ -210,8 +209,7 @@ int llvm::TableGenMain(const char *argv0,
   return 0;
 }
 
-int llvm::TableGenMain(const char *argv0,
-                       std::function<TableGenMainFn> MainFn) {
+int llvm::TableGenMain(const char *argv0, TableGenMainFn MainFn) {
   return TableGenMain(argv0, [&MainFn](TableGenOutputFiles &OutFiles,
const RecordKeeper &Records) {
 std::string S;
diff --git a/llvm/utils/TableGen/Basic/TableGen.cpp b/llvm/utils/TableGen/Basic/TableGen.cpp
index b79ae93dab4f7..a655cbbc16096 100644
--- a/llvm/utils/TableGen/Basic/TableGen.cpp
+++ b/llvm/utils/TableGen/Basic/TableGen.cpp
@@ -73,7 +73,7 @@ int tblgen_main(int argc, char **argv) {
   InitLLVM X(argc, argv);
   cl::ParseCommandLineOptions(argc, argv);
 
-  std::function<MultiFileTableGenMainFn> MainFn = nullptr;
+  MultiFileTableGenMainFn MainFn = nullptr;
   return TableGenMain(argv[0], MainFn);
 }
 



[llvm-branch-commits] [llvm] [LTT] Mark as unkown weak function tests. (PR #167399)

2025-11-19 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin converted_to_draft 
https://github.com/llvm/llvm-project/pull/167399


[llvm-branch-commits] [llvm] RuntimeLibcalls: Add memset_pattern* calls to darwin systems (PR #167083)

2025-11-19 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.


https://github.com/llvm/llvm-project/pull/167083


[llvm-branch-commits] [llvm] [BOLT][NFC] Rename Pointer Auth DWARF rewriter passes (PR #164622)

2025-11-19 Thread Gergely Bálint via llvm-branch-commits

bgergely0 wrote:

One more thing I'd like to sneak in here: adding the --print- flags 
for these passes. We discussed this before, but I didn't add them to the 
original patch (#120064). 

https://github.com/llvm/llvm-project/pull/164622


[llvm-branch-commits] [llvm] [GOFF] Write out relocations in the GOFF writer (PR #167054)

2025-11-19 Thread Kai Nacke via llvm-branch-commits


@@ -545,8 +743,68 @@ GOFFObjectWriter::GOFFObjectWriter(
 
 GOFFObjectWriter::~GOFFObjectWriter() = default;
 
+void GOFFObjectWriter::recordRelocation(const MCFragment &F,
+const MCFixup &Fixup, MCValue Target,
+uint64_t &FixedValue) {
+  const MCFixupKindInfo &FKI =
+  Asm->getBackend().getFixupKindInfo(Fixup.getKind());
+  const uint32_t Length = FKI.TargetSize / 8;
+  assert(FKI.TargetSize % 8 == 0 && "Target Size not multiple of 8");
+  const uint64_t FixupOffset = Asm->getFragmentOffset(F) + Fixup.getOffset();
+  bool IsPCRel = Fixup.isPCRel();
+
+  unsigned RelocType = TargetObjectWriter->getRelocType(Target, Fixup, IsPCRel);
+
+  const MCSectionGOFF *PSection = static_cast<const MCSectionGOFF *>(F.getParent());
+  const auto &A = *static_cast<const MCSymbolGOFF *>(Target.getAddSym());
+  const MCSymbolGOFF *B = static_cast<const MCSymbolGOFF *>(Target.getSubSym());
+  if (RelocType == MCGOFFObjectTargetWriter::Reloc_Type_RelImm) {
+if (A.isUndefined()) {
+  Asm->reportError(
+  Fixup.getLoc(),
+  Twine("symbol ")
+  .concat(A.getName())
+  .concat(" must be defined for a relative immediate relocation"));
+  return;
+}
+if (&A.getSection() != PSection) {
+  Asm->reportError(Fixup.getLoc(),
+   Twine("relative immediate relocation section mismatch: ")
+   .concat(A.getSection().getName())
+   .concat(" of symbol ")
+   .concat(A.getName())
+   .concat(" <-> ")
+   .concat(PSection->getName()));
+  return;
+}
+if (B) {
+  Asm->reportError(
+  Fixup.getLoc(),
+  Twine("subtractive symbol ")
+  .concat(B->getName())
+  .concat(" not supported for a relative immediate relocation"));
+  return;
+}
+FixedValue = Asm->getSymbolOffset(A) - FixupOffset + Target.getConstant();
+return;
+  }
+  FixedValue = Target.getConstant();
+
+  // The symbol only has a section-relative offset if it is a temporary symbol.
+  FixedValue += A.isTemporary() ? Asm->getSymbolOffset(A) : 0;
+  A.setUsedInReloc();
+  if (B) {
+FixedValue -= B->isTemporary() ? Asm->getSymbolOffset(*B) : 0;
+B->setUsedInReloc();
+  }
+
+  // Save relocation data for later writing.
+  SavedRelocs.emplace_back(PSection, &A, B, RelocType, FixupOffset, Length,
+   FixedValue);
+}
+
 uint64_t GOFFObjectWriter::writeObject() {
-  uint64_t Size = GOFFWriter(OS, *Asm).writeObject();
+  uint64_t Size = GOFFWriter(OS, *Asm, SavedRelocs).writeObject();

redstar wrote:

That is an interesting style question. The ELF solution also requires all fields
to be public, because it is difficult to make a class in an anonymous namespace
a `friend` of a class in a header file. I prefer passing the required fields
over making them public.
Looking forward, I do not expect to need to pass more fields from the
`GOFFObjectWriter` into the `GOFFWriter`, so the current solution looks
reasonable to me.

https://github.com/llvm/llvm-project/pull/167054

