[llvm-branch-commits] [llvm] [CodeGen][NPM] Port StackFrameLayoutAnalysisPass to NPM (PR #130070)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/130070

>From 94618bb78cba09176b39b9d603c0fb9cdb066676 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Thu, 6 Mar 2025 10:45:25 +
Subject: [PATCH] [CodeGen][NPM] Port StackFrameLayoutAnalysisPass to NPM

---
 .../CodeGen/StackFrameLayoutAnalysisPass.h| 26 
 llvm/include/llvm/InitializePasses.h  |  2 +-
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |  3 +
 .../llvm/Passes/MachinePassRegistry.def   |  2 +-
 llvm/lib/CodeGen/CodeGen.cpp  |  2 +-
 .../CodeGen/StackFrameLayoutAnalysisPass.cpp  | 61 +--
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 7 files changed, 74 insertions(+), 23 deletions(-)
 create mode 100644 llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h

diff --git a/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h b/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h
new file mode 100644
index 0..5283cda30da12
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h
@@ -0,0 +1,26 @@
+//===- llvm/CodeGen/StackFrameLayoutAnalysisPass.h --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
+#define LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
+
+#include "llvm/CodeGen/MachinePassManager.h"
+
+namespace llvm {
+
+class StackFrameLayoutAnalysisPass
+    : public PassInfoMixin<StackFrameLayoutAnalysisPass> {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+                        MachineFunctionAnalysisManager &MFAM);
+  static bool isRequired() { return true; }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index c7bc4320cf8f0..9068aee8f8193 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -290,7 +290,7 @@ void initializeSlotIndexesWrapperPassPass(PassRegistry &);
 void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &);
 void initializeSpillPlacementWrapperLegacyPass(PassRegistry &);
 void initializeStackColoringLegacyPass(PassRegistry &);
-void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &);
+void initializeStackFrameLayoutAnalysisLegacyPass(PassRegistry &);
 void initializeStaticDataSplitterPass(PassRegistry &);
 void initializeStackMapLivenessPass(PassRegistry &);
 void initializeStackProtectorPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 74cdc7d66810b..8cba36b36fbb2 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -80,6 +80,7 @@
 #include "llvm/CodeGen/ShadowStackGCLowering.h"
 #include "llvm/CodeGen/SjLjEHPrepare.h"
 #include "llvm/CodeGen/StackColoring.h"
+#include "llvm/CodeGen/StackFrameLayoutAnalysisPass.h"
 #include "llvm/CodeGen/StackProtector.h"
 #include "llvm/CodeGen/StackSlotColoring.h"
 #include "llvm/CodeGen/TailDuplication.h"
@@ -1015,6 +1016,8 @@ Error CodeGenPassBuilder::addMachinePasses(
   addPass(MachineOutlinerPass(RunOnAllFunctions));
   }
 
+  addPass(StackFrameLayoutAnalysisPass());
+
   // Add passes that directly emit MI after all other MI passes.
   derived().addPreEmitPass2(addPass);
 
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index 8fa21751392f3..01dd423de6955 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -187,6 +187,7 @@ MACHINE_FUNCTION_PASS("remove-redundant-debug-values", RemoveRedundantDebugValue
 MACHINE_FUNCTION_PASS("require-all-machine-function-properties",
   RequireAllMachineFunctionPropertiesPass())
 MACHINE_FUNCTION_PASS("stack-coloring", StackColoringPass())
+MACHINE_FUNCTION_PASS("stack-frame-layout", StackFrameLayoutAnalysisPass())
 MACHINE_FUNCTION_PASS("stack-slot-coloring", StackSlotColoringPass())
 MACHINE_FUNCTION_PASS("tailduplication", TailDuplicatePass())
 MACHINE_FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
@@ -295,7 +296,6 @@ DUMMY_MACHINE_FUNCTION_PASS("regallocscoringpass", RegAllocScoringPass)
 DUMMY_MACHINE_FUNCTION_PASS("regbankselect", RegBankSelectPass)
 DUMMY_MACHINE_FUNCTION_PASS("reset-machine-function", ResetMachineFunctionPass)
 DUMMY_MACHINE_FUNCTION_PASS("shrink-wrap", ShrinkWrapPass)
-DUMMY_MACHINE_FUNCTION_PASS("stack-frame-layout", StackFrameLayoutAnalysisPass)
 DUMMY_MACHINE_FUNCTION_PASS("stackmap-liveness", StackMapLivenessPass)
 DUMMY_MACHINE_FUNCTION_PASS("unpack-mi-bundles", UnpackMachine
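
The new header derives `StackFrameLayoutAnalysisPass` from `PassInfoMixin` using CRTP (the class passes itself as the template argument) and marks it `isRequired()`. As a rough, self-contained illustration of only those two ideas, the toy below re-creates them with hypothetical names (`ToyPassInfoMixin`, `ToyFunction`, `runPass`); it is not LLVM's API. In LLVM the real mixin derives the pass name from the type itself, and whether a non-required pass is skipped (for example on `optnone` functions) is decided by pass instrumentation; putting a default `isRequired()` in the mixin is a simplification of this sketch.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for llvm::PassInfoMixin (hypothetical, simplified).
// CRTP: the base is parameterized on the derived pass type, so it can
// read static members of that type without virtual dispatch.
template <typename DerivedT> struct ToyPassInfoMixin {
  static std::string name() { return DerivedT::PassName; }
  // Toy default: passes are optional and may be skipped.
  static bool isRequired() { return false; }
};

struct ToyFunction {
  std::string Name;
  bool OptNone = false;
};

struct OptionalPass : ToyPassInfoMixin<OptionalPass> {
  static constexpr const char *PassName = "optional-pass";
  void run(ToyFunction &) {}
};

// Mirrors the shape of the new header: the pass names itself as the
// template argument and opts out of being skipped.
struct StackFrameLayoutToy : ToyPassInfoMixin<StackFrameLayoutToy> {
  static constexpr const char *PassName = "stack-frame-layout";
  static bool isRequired() { return true; }
  void run(ToyFunction &) {}
};

// Toy pass-manager step: skip optional passes on optnone functions.
// Returns true if the pass actually ran.
template <typename PassT> bool runPass(PassT &P, ToyFunction &F) {
  if (F.OptNone && !PassT::isRequired())
    return false;
  P.run(F);
  return true;
}
```

With `ToyFunction F{"f", true}`, `runPass` skips `OptionalPass` but still runs `StackFrameLayoutToy`, because the derived class's `isRequired()` hides the mixin's default.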

[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SILateBranchLowering to NPM (PR #130063)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

optimisan wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/130063).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#130070**
* **#130069**
* **#130068**
* **#130067**
* **#130066**
* **#130065**
* **#130064**
* **#130063** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/130063)
* **#130062**
* **#130061**
* **#130060**
* **#130059**
* **#129866**
* **#129865**
* **#129857**
* **#129853**
* **#129828**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking (https://stacking.dev/).


https://github.com/llvm/llvm-project/pull/130063
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [clang][driver] Use rva22u64_v as the default march for Fuchsia targets (PR #131183)

2025-03-15 Thread via llvm-branch-commits

https://github.com/hiraditya approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/131183


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port VirtRegRewriter to NPM (PR #130564)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan edited 
https://github.com/llvm/llvm-project/pull/130564


[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port AMDGPUSetWavePriority to NPM (PR #130064)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/130064


[llvm-branch-commits] [llvm] [SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking (PR #130617)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-nvptx

Author: Fabian Ritter (ritter-x2a)


Changes

If we know that the initial GEP was inbounds, and we change it to a
sequence of GEPs from the same base pointer where every offset is
non-negative, then the new GEPs are inbounds.

For SWDEV-516125.
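
The inbounds argument above can be checked on a deliberately simplified byte-offset model: a GEP `base + VarOff + ConstOff` that is in bounds of an object of `Size` bytes is split into `Tmp = base + VarOff` followed by `Tmp + ConstOff`. If both parts are non-negative, then `0 <= VarOff <= VarOff + ConstOff <= Size`, so the intermediate pointer is in bounds too. The helper names are hypothetical, and the model ignores element-size scaling, multiple indices, and the rest of the semantics of LLVM's `inbounds`.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (hypothetical): offsets are byte offsets from the start
// of an object of Size bytes; [0, Size] counts as in bounds, including
// one-past-the-end.
bool inBounds(std::int64_t Offset, std::int64_t Size) {
  return Offset >= 0 && Offset <= Size;
}

// After splitting base + VarOff + ConstOff into Tmp = base + VarOff and
// Result = Tmp + ConstOff: are both new pointers in bounds?
bool splitStaysInBounds(std::int64_t VarOff, std::int64_t ConstOff,
                        std::int64_t Size) {
  return inBounds(VarOff, Size) && inBounds(VarOff + ConstOff, Size);
}

// Simplified version of the patch's condition: the original GEP was
// inbounds, the variable part is known non-negative, and the accumulated
// constant offset is non-negative.
bool mayRecoverInbounds(bool WasInBounds, std::int64_t VarOff,
                        std::int64_t ConstOff) {
  return WasInBounds && VarOff >= 0 && ConstOff >= 0;
}
```

For example, with `Size = 16`, `VarOff = 20` and `ConstOff = -8`, the original offset 12 is in bounds but the intermediate offset 20 is not, which is why a negative accumulated offset has to block the flag.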

---
Full diff: https://github.com/llvm/llvm-project/pull/130617.diff


4 Files Affected:

- (modified) llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp (+13-5) 
- (modified) llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll (+23) 
- (modified) llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll (+8-8) 
- (modified) llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll (+4-4) 


``diff
diff --git a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
index 138a71ce79cef..070afdf0752f4 100644
--- a/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
+++ b/llvm/lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
@@ -1052,6 +1052,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
 }
   }
 
+  bool MayRecoverInbounds = AccumulativeByteOffset >= 0 && GEP->isInBounds();
+
   // Remove the constant offset in each sequential index. The resultant GEP
   // computes the variadic base.
   // Notice that we don't remove struct field indices here. If LowerGEP is
@@ -1079,6 +1081,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
 // and the old index if they are not used.
 RecursivelyDeleteTriviallyDeadInstructions(UserChainTail);
 RecursivelyDeleteTriviallyDeadInstructions(OldIdx);
+MayRecoverInbounds =
+MayRecoverInbounds && computeKnownBits(NewIdx, *DL).isNonNegative();
   }
 }
   }
@@ -1100,11 +1104,15 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
   // address with silently-wrapping two's complement arithmetic".
   // Therefore, the final code will be a semantically equivalent.
   //
-  // TODO(jingyue): do some range analysis to keep as many inbounds as
-  // possible. GEPs with inbounds are more friendly to alias analysis.
-  // TODO(gep_nowrap): Preserve nuw at least.
-  auto NewGEPFlags = GEPNoWrapFlags::none();
-  GEP->setNoWrapFlags(GEPNoWrapFlags::none());
+  // If the initial GEP was inbounds and all variable indices and the
+  // accumulated offsets are non-negative, they can be added in any order and
+  // the intermediate results are in bounds. So, we can preserve the inbounds
+  // flag for both GEPs. GEPs with inbounds are more friendly to alias analysis.
+  //
+  // TODO(gep_nowrap): Preserve nuw?
+  auto NewGEPFlags =
+  MayRecoverInbounds ? GEPNoWrapFlags::inBounds() : GEPNoWrapFlags::none();
+  GEP->setNoWrapFlags(NewGEPFlags);
 
   // Lowers a GEP to either GEPs with a single index or arithmetic operations.
   if (LowerGEP) {
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
index 877de38776839..91b5bc874c154 100644
--- a/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/AMDGPU/preserve-inbounds.ll
@@ -24,3 +24,26 @@ entry:
   store float %3, ptr %arrayidx.dst, align 4
   ret void
 }
+
+; All offsets must be positive, so inbounds can be preserved.
+define void @must_be_inbounds(ptr %dst, ptr %src, i32 %i) {
+; CHECK-LABEL: @must_be_inbounds(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:[[I_PROM:%.*]] = zext i32 [[I:%.*]] to i64
+; CHECK-NEXT:[[TMP0:%.*]] = getelementptr inbounds float, ptr [[SRC:%.*]], i64 [[I_PROM]]
+; CHECK-NEXT:[[ARRAYIDX_SRC2:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 4
+; CHECK-NEXT:[[TMP1:%.*]] = load float, ptr [[ARRAYIDX_SRC2]], align 4
+; CHECK-NEXT:[[TMP2:%.*]] = getelementptr inbounds float, ptr [[DST:%.*]], i64 [[I_PROM]]
+; CHECK-NEXT:[[ARRAYIDX_DST4:%.*]] = getelementptr inbounds i8, ptr [[TMP2]], i64 4
+; CHECK-NEXT:store float [[TMP1]], ptr [[ARRAYIDX_DST4]], align 4
+; CHECK-NEXT:ret void
+;
+entry:
+  %i.prom = zext i32 %i to i64
+  %idx = add nsw i64 %i.prom, 1
+  %arrayidx.src = getelementptr inbounds float, ptr %src, i64 %idx
+  %3 = load float, ptr %arrayidx.src, align 4
+  %arrayidx.dst = getelementptr inbounds float, ptr %dst, i64 %idx
+  store float %3, ptr %arrayidx.dst, align 4
+  ret void
+}
diff --git a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
index 9a73feb2c4b5c..4474585bf9b06 100644
--- a/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
+++ b/llvm/test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll
@@ -157,19 +157,19 @@ define void @sum_of_array3(i32 %x, i32 %y, ptr nocapture %ou

[llvm-branch-commits] [libcxx] [libc++][format] Implements P3107R5 in . (PR #130500)

2025-03-15 Thread Mark de Wever via llvm-branch-commits

https://github.com/mordante updated 
https://github.com/llvm/llvm-project/pull/130500

>From f3b052aa1bbc633655108e6e3a432c820169d96f Mon Sep 17 00:00:00 2001
From: Mark de Wever 
Date: Sat, 30 Mar 2024 17:35:56 +0100
Subject: [PATCH] [libc++][format] Implements P3107R5 in .

The follow-up paper P3235R3, which was voted in as a DR, renames
foo_locking to foo_buffered. These changes have been applied in this
patch.

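Before
-----------------------------------------------
Benchmark          Time        CPU   Iterations
-----------------------------------------------
printf          71.3 ns    71.3 ns      9525175
print_string     226 ns     226 ns      3105850
print_stack      232 ns     232 ns      3026498
print_direct     530 ns     530 ns      1318447

After
-----------------------------------------------
Benchmark          Time        CPU   Iterations
-----------------------------------------------
printf          70.6 ns    70.6 ns      9789585
print_string     222 ns     222 ns      3147678
print_stack      227 ns     227 ns      3084767
print_direct     474 ns     474 ns      1472786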

Note: libc++'s std::print is still extremely slow compared to printf,
although according to P3107R5 std::print should outperform printf. The
main culprit is the call to isatty, which will be resolved once
LWG4044 (Confusing requirements for std::print on POSIX platforms) is
implemented.

Implements
- P3107R5 - Permit an efficient implementation of ``std::print``

Implements parts of
- P3235R3 std::print more types faster with less memory

Fixes: #105435
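
The `__flockfile`/`__funlockfile` wrappers added in the diff below appear to exist so that a whole write can happen under one acquisition of the stream lock, with the unlocked stream I/O functions used in between. A minimal POSIX-only sketch of that primitive follows; `write_locked` is a hypothetical helper, not libc++'s implementation, and it writes an already-formatted string rather than formatting directly into the stream.

```cpp
#include <cassert>
#include <cstdio>
#include <string_view>
#include <stdio.h> // POSIX: flockfile, funlockfile, putc_unlocked

// Hypothetical helper (not libc++ code): writes Text to Stream while
// holding the stream lock once, using the unlocked character output
// function so each character does not re-acquire the lock.
// Returns false if a write fails.
bool write_locked(std::FILE *Stream, std::string_view Text) {
  ::flockfile(Stream);
  bool Ok = true;
  for (char C : Text) {
    if (putc_unlocked(C, Stream) == EOF) {
      Ok = false;
      break;
    }
  }
  ::funlockfile(Stream);
  return Ok;
}
```

On Windows the analogous calls would be `_lock_file`/`_unlock_file` (used by the diff) together with the CRT's `_nolock` I/O variants; this sketch assumes a POSIX system.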
---
 libcxx/docs/ReleaseNotes/21.rst   |   1 +
 libcxx/include/__format/buffer.h  |   3 +
 libcxx/include/print  | 249 +-
 libcxx/modules/std/print.inc  |   1 +
 .../test/libcxx/transitive_includes/cxx03.csv |   5 +
 .../test/libcxx/transitive_includes/cxx11.csv |   5 +
 .../test/libcxx/transitive_includes/cxx14.csv |   5 +
 .../test/libcxx/transitive_includes/cxx17.csv |   5 +
 .../test/libcxx/transitive_includes/cxx23.csv |   5 +-
 .../test/libcxx/transitive_includes/cxx26.csv |   4 +
 10 files changed, 270 insertions(+), 13 deletions(-)

diff --git a/libcxx/docs/ReleaseNotes/21.rst b/libcxx/docs/ReleaseNotes/21.rst
index e7cfa625a132c..a1f30b26c5a1d 100644
--- a/libcxx/docs/ReleaseNotes/21.rst
+++ b/libcxx/docs/ReleaseNotes/21.rst
@@ -40,6 +40,7 @@ Implemented Papers
 
 - N4258: Cleaning-up noexcept in the Library (`Github `__)
 - P1361R2: Integration of chrono with text formatting (`Github `__)
+- P3107R5 - Permit an efficient implementation of ``std::print`` (`Github `__)
 
 Improvements and New Features
 -
diff --git a/libcxx/include/__format/buffer.h b/libcxx/include/__format/buffer.h
index c88b7f3222010..d6e4ddc840e2d 100644
--- a/libcxx/include/__format/buffer.h
+++ b/libcxx/include/__format/buffer.h
@@ -12,6 +12,7 @@
 
 #include <__algorithm/copy_n.h>
 #include <__algorithm/fill_n.h>
+#include <__algorithm/for_each.h>
 #include <__algorithm/max.h>
 #include <__algorithm/min.h>
 #include <__algorithm/ranges_copy.h>
@@ -34,11 +35,13 @@
 #include <__memory/construct_at.h>
 #include <__memory/destroy.h>
 #include <__memory/uninitialized_algorithms.h>
+#include <__system_error/system_error.h>
 #include <__type_traits/add_pointer.h>
 #include <__type_traits/conditional.h>
 #include <__utility/exception_guard.h>
 #include <__utility/move.h>
 #include 
+#include  // Uses the POSIX/Windows unlocked stream I/O
 #include 
 
 #if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
diff --git a/libcxx/include/print b/libcxx/include/print
index 1794d6014efcd..5489b993b03a3 100644
--- a/libcxx/include/print
+++ b/libcxx/include/print
@@ -27,9 +27,11 @@ namespace std {
 
   void vprint_unicode(string_view fmt, format_args args);
   void vprint_unicode(FILE* stream, string_view fmt, format_args args);
+  void vprint_unicode_buffered(FILE* stream, string_view fmt, format_args args);
 
   void vprint_nonunicode(string_view fmt, format_args args);
   void vprint_nonunicode(FILE* stream, string_view fmt, format_args args);
+  void vprint_nonunicode_buffered(FILE* stream, string_view fmt, format_args args);
 }
 */
 
@@ -213,6 +215,107 @@ _LIBCPP_HIDE_FROM_ABI inline bool __is_terminal([[maybe_unused]] FILE* __stream)
 #endif
 }
 
+_LIBCPP_HIDE_FROM_ABI inline void __flockfile(FILE* __stream) {
+#if defined(_LIBCPP_WIN32API)
+  ::_lock_file(__stream);
+#elif __has_include()
+  ::flockfile(__stream);
+#else
+#  error "Provide a way to do unlocked stream I/O operations"
+#endif
+}
+_LIBCPP_HIDE_FROM_ABI inline void __funlockfile(FILE* __stream) {
+#if defined(_LIBCPP_WIN32API)
+  ::_unloc

[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"



[llvm-branch-commits] [clang] [llvm] release/20.x: [Hexagon] Set the default compilation target to V68 (#125239) (PR #128597)

2025-03-15 Thread Brian Cain via llvm-branch-commits

androm3da wrote:

@quic-akaryaki can you fix #127558 on `main` and then cherry-pick it to 20.x? We are going to hold this pull request from 20.x until that one is ready. At least IMO, it doesn't make sense to change the default compiler target without also changing the default for the assembler.

IIRC 20.1.1 is scheduled for ~2 weeks after 20.1.0, which should be ~18 March.

https://github.com/llvm/llvm-project/pull/128597


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port StackFrameLayoutAnalysisPass to NPM (PR #130070)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/130070

>From 92a1774222ba4f86a8781c4f62253865cc4ed74f Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Thu, 6 Mar 2025 10:45:25 +
Subject: [PATCH] [CodeGen][NPM] Port StackFrameLayoutAnalysisPass to NPM

---
 .../CodeGen/StackFrameLayoutAnalysisPass.h| 26 
 llvm/include/llvm/InitializePasses.h  |  2 +-
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |  3 +
 .../llvm/Passes/MachinePassRegistry.def   |  2 +-
 llvm/lib/CodeGen/CodeGen.cpp  |  2 +-
 .../CodeGen/StackFrameLayoutAnalysisPass.cpp  | 61 +--
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 7 files changed, 74 insertions(+), 23 deletions(-)
 create mode 100644 llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h

diff --git a/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h b/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h
new file mode 100644
index 0..5283cda30da12
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/StackFrameLayoutAnalysisPass.h
@@ -0,0 +1,26 @@
+//===- llvm/CodeGen/StackFrameLayoutAnalysisPass.h --*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
+#define LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
+
+#include "llvm/CodeGen/MachinePassManager.h"
+
+namespace llvm {
+
+class StackFrameLayoutAnalysisPass
+    : public PassInfoMixin<StackFrameLayoutAnalysisPass> {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+                        MachineFunctionAnalysisManager &MFAM);
+  static bool isRequired() { return true; }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CODEGEN_STACKFRAMELAYOUTANALYSISPASS_H
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index c7bc4320cf8f0..9068aee8f8193 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -290,7 +290,7 @@ void initializeSlotIndexesWrapperPassPass(PassRegistry &);
 void initializeSpeculativeExecutionLegacyPassPass(PassRegistry &);
 void initializeSpillPlacementWrapperLegacyPass(PassRegistry &);
 void initializeStackColoringLegacyPass(PassRegistry &);
-void initializeStackFrameLayoutAnalysisPassPass(PassRegistry &);
+void initializeStackFrameLayoutAnalysisLegacyPass(PassRegistry &);
 void initializeStaticDataSplitterPass(PassRegistry &);
 void initializeStackMapLivenessPass(PassRegistry &);
 void initializeStackProtectorPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 74cdc7d66810b..8cba36b36fbb2 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -80,6 +80,7 @@
 #include "llvm/CodeGen/ShadowStackGCLowering.h"
 #include "llvm/CodeGen/SjLjEHPrepare.h"
 #include "llvm/CodeGen/StackColoring.h"
+#include "llvm/CodeGen/StackFrameLayoutAnalysisPass.h"
 #include "llvm/CodeGen/StackProtector.h"
 #include "llvm/CodeGen/StackSlotColoring.h"
 #include "llvm/CodeGen/TailDuplication.h"
@@ -1015,6 +1016,8 @@ Error CodeGenPassBuilder::addMachinePasses(
   addPass(MachineOutlinerPass(RunOnAllFunctions));
   }
 
+  addPass(StackFrameLayoutAnalysisPass());
+
   // Add passes that directly emit MI after all other MI passes.
   derived().addPreEmitPass2(addPass);
 
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index 8fa21751392f3..01dd423de6955 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -187,6 +187,7 @@ MACHINE_FUNCTION_PASS("remove-redundant-debug-values", RemoveRedundantDebugValue
 MACHINE_FUNCTION_PASS("require-all-machine-function-properties",
   RequireAllMachineFunctionPropertiesPass())
 MACHINE_FUNCTION_PASS("stack-coloring", StackColoringPass())
+MACHINE_FUNCTION_PASS("stack-frame-layout", StackFrameLayoutAnalysisPass())
 MACHINE_FUNCTION_PASS("stack-slot-coloring", StackSlotColoringPass())
 MACHINE_FUNCTION_PASS("tailduplication", TailDuplicatePass())
 MACHINE_FUNCTION_PASS("trigger-verifier-error", TriggerVerifierErrorPass())
@@ -295,7 +296,6 @@ DUMMY_MACHINE_FUNCTION_PASS("regallocscoringpass", RegAllocScoringPass)
 DUMMY_MACHINE_FUNCTION_PASS("regbankselect", RegBankSelectPass)
 DUMMY_MACHINE_FUNCTION_PASS("reset-machine-function", ResetMachineFunctionPass)
 DUMMY_MACHINE_FUNCTION_PASS("shrink-wrap", ShrinkWrapPass)
-DUMMY_MACHINE_FUNCTION_PASS("stack-frame-layout", StackFrameLayoutAnalysisPass)
 DUMMY_MACHINE_FUNCTION_PASS("stackmap-liveness", StackMapLivenessPass)
 DUMMY_MACHINE_FUNCTION_PASS("unpack-mi-bundles", UnpackMachine

[llvm-branch-commits] [llvm] release/20.x: [llvm-objcopy] Apply encryptable offset to first segment, not section (#130517) (PR #131398)

2025-03-15 Thread via llvm-branch-commits

github-actions[bot] wrote:

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo. Please turn off the [Keep my email addresses private](https://github.com/settings/emails) setting in your account. See [LLVM Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it) for more information.

https://github.com/llvm/llvm-project/pull/131398


[llvm-branch-commits] [llvm] [AMDGPU][NPM] Cleanup AMDGPUPassRegistry.def (PR #130071)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Akshat Oke (optimisan)


Changes

Finishes up the AMDGPU-specific passes. The only ones remaining are the assembly printer, the virtual register rewriter, and PEI.

---
Full diff: https://github.com/llvm/llvm-project/pull/130071.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def (+1-7) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUPreloadKernArgProlog.cpp (+1-1) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+1) 
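
`AMDGPUPassRegistry.def` is an X-macro list: the including file defines `MACHINE_FUNCTION_PASS` and `DUMMY_MACHINE_FUNCTION_PASS`, then expands the entries. The toy below inlines a three-entry list to show why moving `amdgpu-preload-kern-arg-prolog` out of the dummy section matters for building the pass by name. The pass types and `createPassByName` are hypothetical, and dropping dummy entries entirely is a simplification of this sketch, not a description of how LLVM treats them.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical pass types; only name() is modeled.
struct Pass {
  virtual ~Pass() = default;
  virtual std::string name() const = 0;
};
struct SetWavePriorityPass : Pass {
  std::string name() const override { return "amdgpu-set-wave-priority"; }
};
struct PreloadKernArgPrologPass : Pass {
  std::string name() const override {
    return "amdgpu-preload-kern-arg-prolog";
  }
};

// Inline stand-in for a .def file: the entries are expanded by whatever
// definitions of the two macros are active at the point of use.
#define TOY_PASS_REGISTRY                                                      \
  MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", SetWavePriorityPass())     \
  MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog",                      \
                        PreloadKernArgPrologPass())                            \
  DUMMY_MACHINE_FUNCTION_PASS("amdgpu-prelegalizer-combiner", UnportedPass())

// Builds a pass from its command-line name. Dummy entries expand to
// nothing here, so UnportedPass never has to exist as a type.
std::unique_ptr<Pass> createPassByName(const std::string &Name) {
#define MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)                               \
  if (Name == NAME)                                                            \
    return std::make_unique<decltype(CREATE_PASS)>(CREATE_PASS);
#define DUMMY_MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)
  TOY_PASS_REGISTRY
#undef MACHINE_FUNCTION_PASS
#undef DUMMY_MACHINE_FUNCTION_PASS
  return nullptr;
}
```

In this toy, `createPassByName("amdgpu-preload-kern-arg-prolog")` returns a pass only because its entry sits under `MACHINE_FUNCTION_PASS`; the dummy entry yields `nullptr`.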


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index f14499d0d3146..ad2f3fc29077c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,6 +102,7 @@ MACHINE_FUNCTION_PASS("amdgpu-pre-ra-long-branch-reg", GCNPreRALongBranchRegPass
 MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUsesPass())
 MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPass())
 MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
+MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass())
 MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
 MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
@@ -131,13 +132,6 @@ MACHINE_FUNCTION_PASS("si-wqm", SIWholeQuadModePass())
 #undef MACHINE_FUNCTION_PASS
 
 #define DUMMY_MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)
-DUMMY_MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
-DUMMY_MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUsesPass())
-
-// TODO: Move amdgpu-preload-kern-arg-prolog to MACHINE_FUNCTION_PASS since it
-// already exists.
-DUMMY_MACHINE_FUNCTION_PASS("amdgpu-preload-kern-arg-prolog", AMDGPUPreloadKernArgPrologPass())
-
 // Global ISel passes
 DUMMY_MACHINE_FUNCTION_PASS("amdgpu-prelegalizer-combiner", AMDGPUPreLegalizerCombinerPass())
 DUMMY_MACHINE_FUNCTION_PASS("amdgpu-postlegalizer-combiner", AMDGPUPostLegalizerCombinerPass())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernArgProlog.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernArgProlog.cpp
index b3a2139dfd24e..40094518dce0a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernArgProlog.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPreloadKernArgProlog.cpp
@@ -207,5 +207,5 @@ AMDGPUPreloadKernArgPrologPass::run(MachineFunction &MF,
   if (!AMDGPUPreloadKernArgProlog(MF).run())
 return PreservedAnalyses::all();
 
-  return PreservedAnalyses::none();
+  return getMachineFunctionPassPreservedAnalyses();
 }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index f380ddd03957f..a71766f2fd012 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -24,6 +24,7 @@
 #include "AMDGPUMacroFusion.h"
 #include "AMDGPUOpenCLEnqueuedBlockLowering.h"
 #include "AMDGPUPerfHintAnalysis.h"
+#include "AMDGPUPreloadKernArgProlog.h"
 #include "AMDGPURemoveIncompatibleFunctions.h"
 #include "AMDGPUSplitModule.h"
 #include "AMDGPUTargetObjectFile.h"

``




https://github.com/llvm/llvm-project/pull/130071


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131310).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131312**
* **#131311**
* **#131310** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131310)
* **#131309**
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by Graphite (https://graphite.dev). Learn more about stacking at https://stacking.dev/.


https://github.com/llvm/llvm-project/pull/131310


[llvm-branch-commits] [compiler-rt] [llvm] [ctxprof] Make ContextRoot an implementation detail (PR #131416)

2025-03-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/131416

>From e6d651d645a5510011f9f90e28e812e5bb46f64f Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Thu, 13 Mar 2025 20:46:45 -0700
Subject: [PATCH] [ctxprof] Make ContextRoot an implementation detail

---
 .../lib/ctx_profile/CtxInstrProfiling.cpp | 25 --
 .../lib/ctx_profile/CtxInstrProfiling.h   | 26 +-
 .../tests/CtxInstrProfilingTest.cpp   | 30 ++--
 .../Instrumentation/PGOCtxProfLowering.cpp| 49 +++
 .../PGOProfile/ctx-instrumentation.ll | 24 -
 5 files changed, 82 insertions(+), 72 deletions(-)

diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 1c2cad1ca506e..6ef7076d93e31 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -336,10 +336,28 @@ void setupContext(ContextRoot *Root, GUID Guid, uint32_t NumCounters,
   AllContextRoots.PushBack(Root);
 }
 
+ContextRoot *FunctionData::getOrAllocateContextRoot() {
+  auto *Root = CtxRoot;
+  if (!Root) {
+__sanitizer::GenericScopedLock<__sanitizer::StaticSpinMutex> L(&Mutex);
+Root = CtxRoot;
+if (!Root) {
+  Root =
+  new (__sanitizer::InternalAlloc(sizeof(ContextRoot))) ContextRoot();
+  CtxRoot = Root;
+}
+  }
+  assert(Root);
+  return Root;
+}
+
 ContextNode *__llvm_ctx_profile_start_context(
-ContextRoot *Root, GUID Guid, uint32_t Counters,
+FunctionData *FData, GUID Guid, uint32_t Counters,
 uint32_t Callsites) SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = true;
+
+  auto *Root = FData->getOrAllocateContextRoot();
+
   __sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
 __sanitizer::memory_order_relaxed);
 
@@ -356,12 +374,13 @@ ContextNode *__llvm_ctx_profile_start_context(
   return TheScratchContext;
 }
 
-void __llvm_ctx_profile_release_context(ContextRoot *Root)
+void __llvm_ctx_profile_release_context(FunctionData *FData)
 SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = false;
   if (__llvm_ctx_profile_current_context_root) {
 __llvm_ctx_profile_current_context_root = nullptr;
-Root->Taken.Unlock();
+assert(FData->CtxRoot);
+FData->CtxRoot->Taken.Unlock();
   }
 }
 
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
index 72cc60bf523e1..6bb954da950c4 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
@@ -84,7 +84,6 @@ struct ContextRoot {
   // Count the number of entries - regardless if we could take the `Taken` mutex
   ::__sanitizer::atomic_uint64_t TotalEntries = {};
 
-  // This is init-ed by the static zero initializer in LLVM.
   // Taken is used to ensure only one thread traverses the contextual graph -
   // either to read it or to write it. On server side, the same entrypoint will
   // be entered by numerous threads, but over time, the profile aggregated by
@@ -109,12 +108,7 @@ struct ContextRoot {
   // or with more concurrent collections (==more memory) and less collection
   // time. Note that concurrent collection does happen for different
   // entrypoints, regardless.
-  ::__sanitizer::StaticSpinMutex Taken;
-
-  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
-  // instrumentation lowering side because it is responsible for allocating and
-  // zero-initializing ContextRoots.
-  static_assert(sizeof(Taken) == 1);
+  ::__sanitizer::SpinMutex Taken;
 };
 
 // This is allocated and zero-initialized by the compiler, the in-place
@@ -139,8 +133,16 @@ struct FunctionData {
   FunctionData() { Mutex.Init(); }
 
   FunctionData *Next = nullptr;
+  ContextRoot *volatile CtxRoot = nullptr;
   ContextNode *volatile FlatCtx = nullptr;
+
+  ContextRoot *getOrAllocateContextRoot();
+
   ::__sanitizer::StaticSpinMutex Mutex;
+  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
+  // instrumentation lowering side because it is responsible for allocating and
+  // zero-initializing ContextRoots.
+  static_assert(sizeof(Mutex) == 1);
 };
 
 /// This API is exposed for testing. See the APIs below about the contract with
@@ -172,17 +174,17 @@ extern __thread __ctx_profile::ContextRoot
 
 /// called by LLVM in the entry BB of a "entry point" function. The returned
 /// pointer may be "tainted" - its LSB set to 1 - to indicate it's scratch.
-ContextNode *__llvm_ctx_profile_start_context(__ctx_profile::ContextRoot *Root,
-  GUID Guid, uint32_t Counters,
-  uint32_t Callsites);
+ContextNode *
+__llvm_ctx_profile_start_context(__ctx_profile::FunctionData *FData, GUID Guid,
+ uint32_t Counters, uint32_t Callsites);
 
 /// paired with __
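The new `FunctionData::getOrAllocateContextRoot()` above uses double-checked locking: an unlocked read on the fast path, then a re-check under `Mutex` before allocating, so concurrent first callers still agree on a single root. The sketch below shows the same pattern in standard C++. It swaps `std::atomic` and `std::mutex` in for the `volatile` pointer and `__sanitizer::StaticSpinMutex` used by the patch, and both structs are simplified, hypothetical stand-ins rather than the compiler-rt types.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for __ctx_profile::ContextRoot.
struct ContextRoot {
  std::atomic<unsigned long long> TotalEntries{0};
};

// Hypothetical stand-in for __ctx_profile::FunctionData: the root is not
// allocated up front but created lazily on first use.
struct FunctionData {
  std::atomic<ContextRoot *> CtxRoot{nullptr};
  std::mutex Mutex;

  ContextRoot *getOrAllocateContextRoot() {
    // Fast path: the root was already published by some earlier call.
    ContextRoot *Root = CtxRoot.load(std::memory_order_acquire);
    if (!Root) {
      std::lock_guard<std::mutex> Lock(Mutex);
      // Re-check under the lock: another thread may have allocated it.
      Root = CtxRoot.load(std::memory_order_relaxed);
      if (!Root) {
        Root = new ContextRoot();
        CtxRoot.store(Root, std::memory_order_release);
      }
    }
    assert(Root);
    return Root;
  }

  ~FunctionData() { delete CtxRoot.load(); }
};
```

Repeated calls on the same `FunctionData` return the same root; only the first caller pays for the lock and the allocation.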

[llvm-branch-commits] [llvm] [NFC][Coro] Use CloneFunctionInto for coroutine cloning instead of CloneFunction (PR #129149)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129149

>From aa95bd86b5e80797b53b3915059d06b66cebcf85 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 12:42:14 -0800
Subject: [PATCH] [NFC][Coro] Use CloneFunctionInto for coroutine cloning
 instead of CloneFunction

Summary:
CloneFunctionInto is now fast on its own, so we no longer need to call
CloneFunctionAttributes/Metadata/Body separately.

CommonDebugInfo in CoroClone is now unused and is cleaned up separately
in the next diff in the stack.

Test Plan:
ninja check-all

stack-info: PR: https://github.com/llvm/llvm-project/pull/129149, branch: users/artempyanykh/fast-coro-upstream-part2-take2/7
---
 llvm/lib/Transforms/Coroutines/CoroSplit.cpp | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index b2c4e64319725..fabbf5f020a74 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -921,14 +921,8 @@ void coro::BaseCloner::create() {
   auto savedLinkage = NewF->getLinkage();
   NewF->setLinkage(llvm::GlobalValue::ExternalLinkage);
 
-  MetadataPredicate IdentityMD = [&](const Metadata *MD) {
-return CommonDebugInfo.contains(MD);
-  };
-  CloneFunctionAttributesInto(NewF, &OrigF, VMap, false);
-  CloneFunctionMetadataInto(*NewF, OrigF, VMap, RF_None, nullptr, nullptr,
-&IdentityMD);
-  CloneFunctionBodyInto(*NewF, OrigF, VMap, RF_None, Returns, "", nullptr,
-nullptr, nullptr, &IdentityMD);
+  CloneFunctionInto(NewF, &OrigF, VMap,
+CloneFunctionChangeType::LocalChangesOnly, Returns);
 
   auto &Context = NewF->getContext();
 



[llvm-branch-commits] [flang] [flang][cuda] Convert cuf.shared_memory operation to LLVM ops (PR #131396)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Valentin Clement (バレンタイン クレメン) (clementval)


Changes

Convert the operation to an `llvm.addressof` operation followed by an
`llvm.getelementptr` with the appropriate offset.
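In plain pointer terms, the lowering resolves the shared-memory global by name (enclosing function name plus `cudaSharedMemSuffix`, as in the diff below) and then offsets its address in bytes, since the GEP uses `rewriter.getI8Type()` as its element type. The C++ sketch below models only that naming and arithmetic; `getSharedGlobalName`, `getSharedPtr` and the example kernel name are hypothetical, and the address-space cast the pattern also emits is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same spelling as the cudaSharedMemSuffix constant added to CUFCommon.h.
static const std::string cudaSharedMemSuffix = "__shared_mem";

// The shared-memory global is named after the enclosing function followed by
// the suffix (cf. getFuncName(op) + cudaSharedMemSuffix in the patch).
std::string getSharedGlobalName(const std::string &funcName) {
  return funcName + cudaSharedMemSuffix;
}

// Models `llvm.addressof` + `llvm.getelementptr` over i8: the result is the
// global's base address advanced by `offset` bytes.
void *getSharedPtr(void *globalBase, std::int64_t offset) {
  // Assumption of this sketch: offsets into the shared block are non-negative.
  assert(offset >= 0);
  return static_cast<char *>(globalBase) + offset;
}
```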

---
Full diff: https://github.com/llvm/llvm-project/pull/131396.diff


3 Files Affected:

- (modified) flang/include/flang/Optimizer/Builder/CUFCommon.h (+1) 
- (modified) flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp (+69-1) 
- (added) flang/test/Fir/CUDA/cuda-shared-to-llvm.mlir (+20) 


``diff
diff --git a/flang/include/flang/Optimizer/Builder/CUFCommon.h b/flang/include/flang/Optimizer/Builder/CUFCommon.h
index e3c7b5098b83f..65b9cce1d2021 100644
--- a/flang/include/flang/Optimizer/Builder/CUFCommon.h
+++ b/flang/include/flang/Optimizer/Builder/CUFCommon.h
@@ -14,6 +14,7 @@
 #include "mlir/IR/BuiltinOps.h"
 
 static constexpr llvm::StringRef cudaDeviceModuleName = "cuda_device_mod";
+static constexpr llvm::StringRef cudaSharedMemSuffix = "__shared_mem";
 
 namespace fir {
 class FirOpBuilder;
diff --git a/flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp b/flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp
index 2a95e41944f3f..b54332b6694c4 100644
--- a/flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp
+++ b/flang/lib/Optimizer/Transforms/CUFGPUToLLVMConversion.cpp
@@ -7,12 +7,15 @@
 
//===--===//
 
 #include "flang/Optimizer/Transforms/CUFGPUToLLVMConversion.h"
+#include "flang/Optimizer/Builder/CUFCommon.h"
 #include "flang/Optimizer/CodeGen/TypeConverter.h"
+#include "flang/Optimizer/Dialect/CUF/CUFOps.h"
 #include "flang/Optimizer/Support/DataLayout.h"
 #include "flang/Runtime/CUDA/common.h"
 #include "flang/Support/Fortran.h"
 #include "mlir/Conversion/LLVMCommon/Pattern.h"
 #include "mlir/Dialect/GPU/IR/GPUDialect.h"
+#include "mlir/Dialect/LLVMIR/NVVMDialect.h"
 #include "mlir/Pass/Pass.h"
 #include "mlir/Transforms/DialectConversion.h"
 #include "mlir/Transforms/GreedyPatternRewriteDriver.h"
@@ -175,6 +178,69 @@ struct GPULaunchKernelConversion
   }
 };
 
+static std::string getFuncName(cuf::SharedMemoryOp op) {
+  if (auto gpuFuncOp = op->getParentOfType())
+return gpuFuncOp.getName().str();
+  if (auto funcOp = op->getParentOfType())
+return funcOp.getName().str();
+  if (auto llvmFuncOp = op->getParentOfType())
+return llvmFuncOp.getSymName().str();
+  return "";
+}
+
+static mlir::Value createAddressOfOp(mlir::ConversionPatternRewriter &rewriter,
+ mlir::Location loc,
+ gpu::GPUModuleOp gpuMod,
+ std::string &sharedGlobalName) {
+  auto llvmPtrTy = mlir::LLVM::LLVMPointerType::get(
+  rewriter.getContext(), mlir::NVVM::NVVMMemorySpace::kSharedMemorySpace);
+  if (auto g = gpuMod.lookupSymbol(sharedGlobalName))
+return rewriter.create(loc, llvmPtrTy,
+g.getSymName());
+  if (auto g = gpuMod.lookupSymbol(sharedGlobalName))
+return rewriter.create(loc, llvmPtrTy,
+g.getSymName());
+  return {};
+}
+
+struct CUFSharedMemoryOpConversion
+: public mlir::ConvertOpToLLVMPattern<cuf::SharedMemoryOp> {
+  explicit CUFSharedMemoryOpConversion(
+  const fir::LLVMTypeConverter &typeConverter, mlir::PatternBenefit benefit)
+  : mlir::ConvertOpToLLVMPattern<cuf::SharedMemoryOp>(typeConverter,
+  benefit) {}
+  using OpAdaptor = typename cuf::SharedMemoryOp::Adaptor;
+
+  mlir::LogicalResult
+  matchAndRewrite(cuf::SharedMemoryOp op, OpAdaptor adaptor,
+  mlir::ConversionPatternRewriter &rewriter) const override {
+mlir::Location loc = op->getLoc();
+if (!op.getOffset())
+  mlir::emitError(loc,
+  "cuf.shared_memory must have an offset for code gen");
+
+auto gpuMod = op->getParentOfType();
+std::string sharedGlobalName =
+(getFuncName(op) + llvm::Twine(cudaSharedMemSuffix)).str();
+mlir::Value sharedGlobalAddr =
+createAddressOfOp(rewriter, loc, gpuMod, sharedGlobalName);
+
+if (!sharedGlobalAddr)
+  mlir::emitError(loc, "Could not find the shared global operation\n");
+
+auto castPtr = rewriter.create(
+loc, mlir::LLVM::LLVMPointerType::get(rewriter.getContext()),
+sharedGlobalAddr);
+mlir::Type baseType = castPtr->getResultTypes().front();
+llvm::SmallVector gepArgs = {
+static_cast(*op.getOffset())};
+mlir::Value shmemPtr = rewriter.create(
+loc, baseType, rewriter.getI8Type(), castPtr, gepArgs);
+rewriter.replaceOp(op, {shmemPtr});
+return mlir::success();
+  }
+};
+
 class CUFGPUToLLVMConversion
 : public fir::impl::CUFGPUToLLVMConversionBase<CUFGPUToLLVMConversion> {
 public:
@@ -194,6 +260,7 @@ class CUFGPUToLLVMConversion
  /*forceUnifiedTBAATre

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> We can fold the clamp of the shift amount into the shift instruction during 
> selection as we know the instruction ignores the high bits. We do that in the 
> DAG path already. I think it special cases the and & (bitwidth - 1) pattern, 
> which should form canonically. In principle it could do a general simplify 
> demand bits

Where and how should that be implemented? I struggled with that. I tried
adding a new special case in TableGen, but I just couldn't find the right way to
do it.
Do I just add it to the C++ InstructionSelector before it checks the patterns?
Or should it be some kind of post-processing step after the shift has been
selected, but before the G_ZEXT is selected?
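The reviewer's point can be checked with a small model. Assuming a 32-bit shift that, as the quoted comment says, ignores the high bits of its amount (here: everything except the low 5 bits), an explicit `amt & (bitwidth - 1)` never changes the result. More generally, neither does any mask whose low 5 bits are all ones, which is what a demanded-bits style simplification would exploit. This is plain C++, not GlobalISel code; `hwShl`, `shlWithMask` and `maskIsRedundant` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit shift that, like the instruction under discussion,
// only reads the low 5 bits of the shift amount.
std::uint32_t hwShl(std::uint32_t x, std::uint32_t amt) {
  return x << (amt & 31u);
}

// The input pattern: the shift amount is clamped with an explicit AND first.
std::uint32_t shlWithMask(std::uint32_t x, std::uint32_t amt,
                          std::uint32_t mask) {
  return hwShl(x, amt & mask);
}

// True if the AND is a no-op for every shift amount in [0, 64).
bool maskIsRedundant(std::uint32_t x, std::uint32_t mask) {
  for (std::uint32_t amt = 0; amt < 64; ++amt)
    if (shlWithMask(x, amt, mask) != hwShl(x, amt))
      return false;
  return true;
}
```

With this model, masks 31 and 63 are redundant, while a mask of 15 is not: an amount of 16 gets clamped to 0 before the shift.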


https://github.com/llvm/llvm-project/pull/131310


[llvm-branch-commits] [clang] Backport: [clang] fix matching of nested template template parameters (PR #130950)

2025-03-15 Thread Mike Lothian via llvm-branch-commits

FireBurn wrote:

Sorry, my C++ templating skills aren't good enough to create a reduced example.
It should be self-contained within that repo (or at least enough to show the
error); it shouldn't be using system libraries.

https://github.com/llvm/llvm-project/pull/130950


[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SIMemoryLegalizer to NPM (PR #130060)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan created 
https://github.com/llvm/llvm-project/pull/130060

None

>From 54641a84426002597f90f780e57575c4d31c6f58 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Wed, 5 Mar 2025 11:06:40 +
Subject: [PATCH] [AMDGPU][NPM] Port SIMemoryLegalizer to NPM

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  8 +++-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  2 +-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  5 ++-
 llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp  | 43 ++-
 4 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index f208a8bb9964b..23b9aa0cf0523 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -363,6 +363,12 @@ class GCNCreateVOPDPass : public PassInfoMixin<GCNCreateVOPDPass> {
   PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &AM);
 };
 
+class SIMemoryLegalizerPass : public PassInfoMixin<SIMemoryLegalizerPass> {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+MachineFunctionAnalysisManager &MFAM);
+};
+
 FunctionPass *createAMDGPUAnnotateUniformValuesLegacy();
 
 ModulePass *createAMDGPUPrintfRuntimeBinding();
@@ -427,7 +433,7 @@ class SIAnnotateControlFlowPass
 void initializeSIAnnotateControlFlowLegacyPass(PassRegistry &);
 extern char &SIAnnotateControlFlowLegacyPassID;
 
-void initializeSIMemoryLegalizerPass(PassRegistry&);
+void initializeSIMemoryLegalizerLegacyPass(PassRegistry &);
 extern char &SIMemoryLegalizerID;
 
 void initializeSIModeRegisterLegacyPass(PassRegistry &);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 0e3dcb4267ede..de959f8a2aa62 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -113,6 +113,7 @@ MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
 MACHINE_FUNCTION_PASS("si-lower-control-flow", SILowerControlFlowPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-lower-wwm-copies", SILowerWWMCopiesPass())
+MACHINE_FUNCTION_PASS("si-memory-legalizer", SIMemoryLegalizerPass())
 MACHINE_FUNCTION_PASS("si-mode-register", SIModeRegisterPass())
 MACHINE_FUNCTION_PASS("si-opt-vgpr-liverange", SIOptimizeVGPRLiveRangePass())
 MACHINE_FUNCTION_PASS("si-optimize-exec-masking", SIOptimizeExecMaskingPass())
@@ -132,7 +133,6 @@ DUMMY_MACHINE_FUNCTION_PASS("amdgpu-set-wave-priority", AMDGPUSetWavePriorityPas
 DUMMY_MACHINE_FUNCTION_PASS("si-insert-hard-clauses", SIInsertHardClausesPass())
 DUMMY_MACHINE_FUNCTION_PASS("si-insert-waitcnts", SIInsertWaitcntsPass())
 DUMMY_MACHINE_FUNCTION_PASS("si-late-branch-lowering", SILateBranchLoweringPass())
-DUMMY_MACHINE_FUNCTION_PASS("si-memory-legalizer", SIMemoryLegalizerPass())
 DUMMY_MACHINE_FUNCTION_PASS("si-pre-emit-peephole", SIPreEmitPeepholePass())
 // TODO: Move amdgpu-preload-kern-arg-prolog to MACHINE_FUNCTION_PASS since it
 // already exists.
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 73ae9135eb319..dbe212ad0a216 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -541,7 +541,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeSILowerControlFlowLegacyPass(*PR);
   initializeSIPreEmitPeepholePass(*PR);
   initializeSILateBranchLoweringPass(*PR);
-  initializeSIMemoryLegalizerPass(*PR);
+  initializeSIMemoryLegalizerLegacyPass(*PR);
   initializeSIOptimizeExecMaskingLegacyPass(*PR);
   initializeSIPreAllocateWWMRegsLegacyPass(*PR);
   initializeSIFormMemoryClausesLegacyPass(*PR);
@@ -2151,7 +2151,8 @@ void AMDGPUCodeGenPassBuilder::addPreEmitPass(AddMachinePass &addPass) const {
   if (isPassEnabled(EnableVOPD, CodeGenOptLevel::Less)) {
 addPass(GCNCreateVOPDPass());
   }
-  // TODO: addPass(SIMemoryLegalizerPass());
+
+  addPass(SIMemoryLegalizerPass());
   // TODO: addPass(SIInsertWaitcntsPass());
 
   // TODO: addPass(SIModeRegisterPass());
diff --git a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
index 34953f9c08db7..1375ba201ec58 100644
--- a/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMemoryLegalizer.cpp
@@ -21,8 +21,10 @@
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachinePassManager.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/MemoryModelRelaxationAnnotations.h"
+#include "llvm/IR/PassManager.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/TargetParser/TargetParser.h"
 
@@ -625,9 +627,9 @@ class SIGfx12CacheControl : public SIGfx11CacheControl {
   }
 };
 
-class SIMemoryLegalizer final : public MachineFunctionPass {
+class SIMemoryLegalizer final 

[llvm-branch-commits] [clang][CallGraphSection] Type id metadata for indirect calls (PR #117036)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117036




[llvm-branch-commits] [llvm] [NFC][Cloning] Remove now unused FindDebugInfoToIdentityMap (PR #129151)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129151

>From 8da28defead71eac40108c41aa218c3adf5f3dd6 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 13:00:47 -0800
Subject: [PATCH] [NFC][Cloning] Remove now unused FindDebugInfoToIdentityMap

Summary:
This function is no longer needed.

Test Plan:
ninja check-llvm-unit

stack-info: PR: https://github.com/llvm/llvm-project/pull/129151, branch: users/artempyanykh/fast-coro-upstream-part2-take2/9
---
 llvm/include/llvm/Transforms/Utils/Cloning.h | 19 ---
 llvm/lib/Transforms/Utils/CloneFunction.cpp  | 34 
 2 files changed, 53 deletions(-)

diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 2252dda0b9aad..ae00c16e7eada 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -244,25 +244,6 @@ DISubprogram *CollectDebugInfoForCloning(const Function &F,
  CloneFunctionChangeType Changes,
  DebugInfoFinder &DIFinder);
 
-/// Based on \p Changes and \p DIFinder return debug info that needs to be
-/// identity mapped during Metadata cloning.
-///
-/// NOTE: Such \a MetadataSetTy can be used by \a CloneFunction* to directly
-/// specify metadata that should be identity mapped (and hence not cloned). The
-/// metadata will be identity mapped in \a ValueToValueMapTy on first use. There
-/// are several reasons for doing it this way rather than eagerly identity
-/// mapping metadata nodes in a \a ValueMap:
-/// 1. Mapping metadata is not cheap, particularly because of tracking.
-/// 2. When cloning a Function we identity map lots of global module-level
-///metadata to avoid cloning it, while only a fraction of it is actually
-///used by the function. Mapping on first use is a lot faster for modules
-///with meaningful amount of debug info.
-/// 3. Eagerly identity mapping metadata makes it harder to cache module-level
-///data (e.g. a set of metadata nodes in a \a DICompileUnit).
-MetadataSetTy FindDebugInfoToIdentityMap(CloneFunctionChangeType Changes,
- DebugInfoFinder &DIFinder,
- DISubprogram *SPClonedWithinModule);
-
 /// This class captures the data input to the InlineFunction call, and records
 /// the auxiliary results produced by it.
 class InlineFunctionInfo {
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index bdabe0e562fc9..249bef4696b8a 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -189,40 +189,6 @@ DISubprogram *llvm::CollectDebugInfoForCloning(const Function &F,
   return SPClonedWithinModule;
 }
 
-MetadataSetTy
-llvm::FindDebugInfoToIdentityMap(CloneFunctionChangeType Changes,
- DebugInfoFinder &DIFinder,
- DISubprogram *SPClonedWithinModule) {
-  if (Changes >= CloneFunctionChangeType::DifferentModule)
-return {};
-
-  if (DIFinder.subprogram_count() == 0)
-assert(!SPClonedWithinModule &&
-   "Subprogram should be in DIFinder->subprogram_count()...");
-
-  MetadataSetTy MD;
-
-  // Avoid cloning types, compile units, and (other) subprograms.
-  for (DISubprogram *ISP : DIFinder.subprograms())
-if (ISP != SPClonedWithinModule)
-  MD.insert(ISP);
-
-  // If a subprogram isn't going to be cloned skip its lexical blocks as well.
-  for (DIScope *S : DIFinder.scopes()) {
-auto *LScope = dyn_cast(S);
-if (LScope && LScope->getSubprogram() != SPClonedWithinModule)
-  MD.insert(S);
-  }
-
-for (DICompileUnit *CU : DIFinder.compile_units())
-  MD.insert(CU);
-
-for (DIType *Type : DIFinder.types())
-  MD.insert(Type);
-
-  return MD;
-}
-
 void llvm::CloneFunctionMetadataInto(Function &NewFunc, const Function &OldFunc,
  ValueToValueMapTy &VMap,
  RemapFlags RemapFlag,



[llvm-branch-commits] [llvm] [NFC][Cloning] Move DebugInfoFinder decl closer to its place of usage (PR #129154)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129154

>From 80ad60fc01005ad4c8331bb97e9848f0ae7c9341 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 13:09:23 -0800
Subject: [PATCH] [NFC][Cloning] Move DebugInfoFinder decl closer to its place
 of usage

Summary:
This makes it clear that DIFinder is only really necessary for the llvm.dbg.cu
update.

Test Plan:
ninja check-llvm-unit

stack-info: PR: https://github.com/llvm/llvm-project/pull/129154, branch: users/artempyanykh/fast-coro-upstream-part2-take2/12
---
 llvm/lib/Transforms/Utils/CloneFunction.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index cde1ce8b43dbd..b411d4cb87fd4 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -266,8 +266,6 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
   if (OldFunc->isDeclaration())
 return;
 
-  DebugInfoFinder DIFinder;
-
   if (Changes < CloneFunctionChangeType::DifferentModule) {
 assert((NewFunc->getParent() == nullptr ||
 NewFunc->getParent() == OldFunc->getParent()) &&
@@ -320,7 +318,8 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
 Visited.insert(Operand);
 
   // Collect and clone all the compile units referenced from the instructions in
-  // the function (e.g. as a scope).
+  // the function (e.g. as instructions' scope).
+  DebugInfoFinder DIFinder;
   collectDebugInfoFromInstructions(*OldFunc, DIFinder);
   for (auto *Unit : DIFinder.compile_units()) {
 MDNode *MappedUnit =



[llvm-branch-commits] [llvm] [CodeGen][NPM] Port XRayInstrumentation to NPM (PR #129865)

2025-03-15 Thread Akshat Oke via llvm-branch-commits


@@ -143,11 +170,43 @@ void XRayInstrumentation::prependRetWithPatchableExit(
 }
 }
 
-bool XRayInstrumentation::runOnMachineFunction(MachineFunction &MF) {
+PreservedAnalyses
+XRayInstrumentationPass::run(MachineFunction &MF,
+ MachineFunctionAnalysisManager &MFAM) {
+  MachineDominatorTree *MDT = nullptr;
+  MachineLoopInfo *MLI = nullptr;
+
+  if (XRayInstrumentation::needMDTAndMLIAnalyses(MF.getFunction())) {
+MDT = MFAM.getCachedResult(MF);
+MLI = MFAM.getCachedResult(MF);

optimisan wrote:

The legacy pass calls `getAnalysisIfAvailable` on these, so I used
`getCachedResult`.
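For readers less familiar with the two pass managers: the legacy `getAnalysisIfAvailable` and the new-PM `getCachedResult` share the contract of returning an existing result (or null) without running the analysis, while `getResult` computes it on demand. The toy model below illustrates only that contract; `ToyAnalysisManager` and its placeholder "analysis" are hypothetical and are not LLVM's analysis manager API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class ToyAnalysisManager {
public:
  // Runs the (placeholder) analysis on first request, then reuses the result.
  int &getResult(const std::string &Name) {
    std::unique_ptr<int> &Slot = Cache[Name];
    if (!Slot) {
      ++Computations;
      Slot = std::make_unique<int>(static_cast<int>(Name.size()));
    }
    return *Slot;
  }

  // Never runs the analysis: returns nullptr unless a result is cached.
  int *getCachedResult(const std::string &Name) const {
    auto It = Cache.find(Name);
    return It == Cache.end() ? nullptr : It->second.get();
  }

  int Computations = 0;

private:
  std::map<std::string, std::unique_ptr<int>> Cache;
};
```

Peeking with `getCachedResult` leaves `Computations` at 0; only `getResult` triggers a computation, and only once per name.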

https://github.com/llvm/llvm-project/pull/129865


[llvm-branch-commits] [llvm][AsmPrinter] Emit call graph section (PR #87576)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87576




[llvm-branch-commits] [llvm] [SeparateConstOffsetFromGEP] Preserve inbounds flag based on ValueTracking (PR #130617)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits


@@ -1079,6 +1081,8 @@ bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
 // and the old index if they are not used.
 RecursivelyDeleteTriviallyDeadInstructions(UserChainTail);
 RecursivelyDeleteTriviallyDeadInstructions(OldIdx);
+MayRecoverInbounds =
+MayRecoverInbounds && computeKnownBits(NewIdx, *DL).isNonNegative();

arsenm wrote:

Needs some negative tests where computeKnownBits missed by a bit 
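One reading of "missed by a bit": a negative test where the new index is almost provably non-negative but its sign bit is not known to be zero, so `inbounds` must not be recovered. The sketch below models that check for an `i32` index of the form `x & Mask` with nothing known about `x`; `Known32` and `knownBitsOfAnd` are hypothetical simplifications of LLVM's `KnownBits`, not its API.

```cpp
#include <cassert>
#include <cstdint>

// Minimal 32-bit known-bits record: a bit set in Zero is known to be 0,
// a bit set in One is known to be 1.
struct Known32 {
  std::uint32_t Zero = 0;
  std::uint32_t One = 0;

  // Non-negative iff the sign bit is known to be zero.
  bool isNonNegative() const { return (Zero & 0x80000000u) != 0; }
};

// Known bits of `x & Mask` when nothing is known about x:
// every bit that is clear in Mask is known to be zero in the result.
Known32 knownBitsOfAnd(std::uint32_t Mask) {
  Known32 K;
  K.Zero = ~Mask;
  return K;
}
```

Under this model an index computed as `x & 0x7fffffff` keeps `inbounds`, while `x & 0xffffffff` (one bit more) is a natural negative test.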

https://github.com/llvm/llvm-project/pull/130617


[llvm-branch-commits] [clang] [llvm] [llvm] Introduce callee_type operand bundle (PR #87573)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits


@@ -105,4 +105,17 @@ declare ptr @objc_retainAutoreleasedReturnValue(ptr)
 declare ptr @objc_unsafeClaimAutoreleasedReturnValue(ptr)
 declare void @llvm.assume(i1)
 
+define void @f_type(ptr %ptr) {
+; CHECK: Multiple "callee_type" operand bundles
+; CHECK-NEXT: call void @g() [ "callee_type"(metadata !"_ZTSFvE.generalized"), "callee_type"(metadata !"_ZTSFvE.generalized") ]
+; CHECK-NOT: call void @g() [ "callee_type"(metadata !"_ZTSFvE.generalized") ]

arsenm wrote:

-NOT checks are too fragile; use comprehensive -NEXT checks.

https://github.com/llvm/llvm-project/pull/87573


[llvm-branch-commits] [llvm] release/20.x: [ValueTracking] Skip incoming values that are the same as the phi in `isGuaranteedNotToBeUndefOrPoison` (#130111) (PR #130474)

2025-03-15 Thread Nikita Popov via llvm-branch-commits

https://github.com/nikic approved this pull request.


https://github.com/llvm/llvm-project/pull/130474


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port RemoveLoadsIntoFakeUses to NPM (PR #130068)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/130068

>From 49cfcf28f0fce75df19c3a01520aa17ca6825847 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Thu, 6 Mar 2025 09:30:37 +
Subject: [PATCH] [CodeGen][NPM] Port RemoveLoadsIntoFakeUses to NPM

---
 .../llvm/CodeGen/RemoveLoadsIntoFakeUses.h| 30 +
 llvm/include/llvm/InitializePasses.h  |  2 +-
 llvm/include/llvm/Passes/CodeGenPassBuilder.h |  2 +
 .../llvm/Passes/MachinePassRegistry.def   |  2 +-
 llvm/lib/CodeGen/CodeGen.cpp  |  2 +-
 llvm/lib/CodeGen/RemoveLoadsIntoFakeUses.cpp  | 44 +++
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 .../CodeGen/X86/fake-use-remove-loads.mir |  2 +
 8 files changed, 73 insertions(+), 12 deletions(-)
 create mode 100644 llvm/include/llvm/CodeGen/RemoveLoadsIntoFakeUses.h

diff --git a/llvm/include/llvm/CodeGen/RemoveLoadsIntoFakeUses.h b/llvm/include/llvm/CodeGen/RemoveLoadsIntoFakeUses.h
new file mode 100644
index 0..bbd5b8b430bf6
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/RemoveLoadsIntoFakeUses.h
@@ -0,0 +1,30 @@
+//===- llvm/CodeGen/RemoveLoadsIntoFakeUses.h ---*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CODEGEN_REMOVELOADSINTOFAKEUSES_H
+#define LLVM_CODEGEN_REMOVELOADSINTOFAKEUSES_H
+
+#include "llvm/CodeGen/MachinePassManager.h"
+
+namespace llvm {
+
+class RemoveLoadsIntoFakeUsesPass
+: public PassInfoMixin {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+MachineFunctionAnalysisManager &MFAM);
+
+  MachineFunctionProperties getRequiredProperties() const {
+return MachineFunctionProperties().set(
+MachineFunctionProperties::Property::NoVRegs);
+  }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CODEGEN_REMOVELOADSINTOFAKEUSES_H
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index e5bffde815117..3fd3cbb28bc3e 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -265,7 +265,7 @@ void initializeRegionOnlyViewerPass(PassRegistry &);
 void initializeRegionPrinterPass(PassRegistry &);
 void initializeRegionViewerPass(PassRegistry &);
 void initializeRegisterCoalescerLegacyPass(PassRegistry &);
-void initializeRemoveLoadsIntoFakeUsesPass(PassRegistry &);
+void initializeRemoveLoadsIntoFakeUsesLegacyPass(PassRegistry &);
 void initializeRemoveRedundantDebugValuesLegacyPass(PassRegistry &);
 void initializeRenameIndependentSubregsLegacyPass(PassRegistry &);
 void initializeReplaceWithVeclibLegacyPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index aab2c58ac0f78..a86dc8d632a4e 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -70,6 +70,7 @@
 #include "llvm/CodeGen/RegUsageInfoPropagate.h"
 #include "llvm/CodeGen/RegisterCoalescerPass.h"
 #include "llvm/CodeGen/RegisterUsageInfo.h"
+#include "llvm/CodeGen/RemoveLoadsIntoFakeUses.h"
 #include "llvm/CodeGen/RemoveRedundantDebugValues.h"
 #include "llvm/CodeGen/RenameIndependentSubregs.h"
 #include "llvm/CodeGen/ReplaceWithVeclib.h"
@@ -998,6 +999,7 @@ Error CodeGenPassBuilder::addMachinePasses(
 
   addPass(FuncletLayoutPass());
 
+  addPass(RemoveLoadsIntoFakeUsesPass());
   addPass(StackMapLivenessPass());
   addPass(LiveDebugValuesPass());
   addPass(MachineSanitizerBinaryMetadata());
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index 9300f6935aa90..cab8108ed30f6 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -181,6 +181,7 @@ MACHINE_FUNCTION_PASS("reg-usage-collector", RegUsageInfoCollectorPass())
 MACHINE_FUNCTION_PASS("reg-usage-propagation", RegUsageInfoPropagationPass())
 MACHINE_FUNCTION_PASS("register-coalescer", RegisterCoalescerPass())
 MACHINE_FUNCTION_PASS("rename-independent-subregs", RenameIndependentSubregsPass())
+MACHINE_FUNCTION_PASS("remove-loads-into-fake-uses", RemoveLoadsIntoFakeUsesPass())
 MACHINE_FUNCTION_PASS("remove-redundant-debug-values", RemoveRedundantDebugValuesPass())
 MACHINE_FUNCTION_PASS("require-all-machine-function-properties",
   RequireAllMachineFunctionPropertiesPass())
@@ -292,7 +293,6 @@ DUMMY_MACHINE_FUNCTION_PASS("ra-pbqp", RAPBQPPass)
 DUMMY_MACHINE_FUNCTION_PASS("regalloc", RegAllocPass)
 DUMMY_MACHINE_FUNCTION_PASS("regallocscoringpass", RegAllocScoringPass)
 DUMMY_MACHINE_FUNCTION_PASS("regbankselect", RegBankSelectPass)
-DUMMY_MACHINE_FUNCTION_PASS("remove-loads-into-fa
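
The new `RemoveLoadsIntoFakeUsesPass` header declares `getRequiredProperties()` returning `NoVRegs`. In other words, the pass expects to run only once no virtual registers remain, which means after register allocation. The toy below models that contract with a bitset. `ToyProps`, `ToyRemoveLoadsPass`, and `runIfRequirementsMet` are hypothetical. The toy driver simply declines to run the pass, and this may not match how LLVM's machine pass manager actually reacts to an unmet requirement.

```cpp
#include <bitset>
#include <cassert>

// Toy stand-ins (NOT llvm::MachineFunctionProperties), same idea only.
enum class Prop : unsigned { IsSSA = 0, NoVRegs = 1 };
constexpr unsigned NumProps = 2;

struct ToyProps {
  std::bitset<NumProps> Bits;
  ToyProps &set(Prop P) {
    Bits.set(static_cast<unsigned>(P));
    return *this;
  }
  // True if every property in Required is also set here.
  bool satisfies(const ToyProps &Required) const {
    return (Bits & Required.Bits) == Required.Bits;
  }
};

struct ToyMachineFunction {
  ToyProps Props;
  int Runs = 0;
};

// Same shape as RemoveLoadsIntoFakeUsesPass above: a const member function
// states that the pass needs a function with no virtual registers left.
struct ToyRemoveLoadsPass {
  ToyProps getRequiredProperties() const {
    return ToyProps().set(Prop::NoVRegs);
  }
  void run(ToyMachineFunction &MF) { ++MF.Runs; }
};

// Hypothetical driver: runs the pass only if its requirements are met.
template <typename PassT>
bool runIfRequirementsMet(PassT &P, ToyMachineFunction &MF) {
  if (!MF.Props.satisfies(P.getRequiredProperties()))
    return false;
  P.run(MF);
  return true;
}
```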

[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/131310.diff


1 Files Affected:

- (added) llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir (+429) 


``diff
diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $vgpr0, $vgpr1
+
+; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+; GCN: liveins: $vgpr0, $vgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+; GCN-NEXT: $vgpr0 = COPY %ret
+%src:vgpr_32 = COPY $vgpr0
+%shift:vgpr_32 = COPY $vgpr1
+%shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+%ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+$vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshl_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+; GCN-NEXT: $sgpr0 = COPY %ret
+%src:sgpr_32 = COPY $sgpr0
+%shift:sgpr_32 = COPY $sgpr1
+%shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+%ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+$sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr0, $sgpr1
+
+; GCN-LABEL: name: s_lshr_b32__s_and_b32
+; GCN: liveins: $sgpr0, $sgpr1
+; GCN-NEXT: {{  $}}
+; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+; GCN-NEXT: %ret:sgpr_32 = S_LSHR_B32 %src, %shiftmask, implicit-d
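
Every case in this precommit test shifts by `%shift` masked with 65535, and the GCN check lines still show the AND, so the test records current behavior. Presumably a follow-up patch will drop the mask. Such a fold is only safe if the shift reads a subset of the bits the mask keeps. Assuming that the 32-bit shift reads only the low 5 bits of its amount, which is typical but is an assumption here, any mask whose low 5 bits are all set is redundant. The brute-force check below illustrates this. `hwLshr32` is a hypothetical model, not the AMDGPU instruction definition.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of a 32-bit logical shift right that reads only the low
// 5 bits of the shift amount.
uint32_t hwLshr32(uint32_t Src, uint32_t Amt) { return Src >> (Amt & 31u); }

// Under that model, an AND on the shift amount is redundant when it keeps
// every bit the shift actually reads.
bool maskIsRedundantForShift(uint32_t Mask) { return (Mask & 31u) == 31u; }

// Brute-force check over a range of shift amounts: does shifting by
// (Amt & Mask) always match shifting by Amt?
bool foldPreservesResult(uint32_t Src, uint32_t Mask) {
  for (uint32_t Amt = 0; Amt < 4096; ++Amt)
    if (hwLshr32(Src, Amt & Mask) != hwLshr32(Src, Amt))
      return false;
  return true;
}
```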

[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SIPreEmitPeephole to NPM (PR #130065)

2025-03-15 Thread Christudasan Devadasan via llvm-branch-commits


@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
+# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX

cdevadas wrote:

```suggestion
# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -passes si-pre-emit-peephole -o - %s | FileCheck -check-prefix=GCN %s -implicit-check-not=S_SET_GPR_IDX
```

https://github.com/llvm/llvm-project/pull/130065


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port BranchRelaxation to NPM (PR #130067)

2025-03-15 Thread Christudasan Devadasan via llvm-branch-commits


@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
 # RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --amdgpu-s-branch-bits=5 -run-pass branch-relaxation %s -o - | FileCheck %s
+# RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --amdgpu-s-branch-bits=5 -passes=branch-relaxation %s -o - | FileCheck %s

cdevadas wrote:

```suggestion
# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a --amdgpu-s-branch-bits=5 -passes=branch-relaxation %s -o - | FileCheck %s
```

https://github.com/llvm/llvm-project/pull/130067


[llvm-branch-commits] [llvm] AMDGPU: Switch test to generated checks (PR #131315)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/131315

>From 1b2648aa5b6f91032e35d53888fa521046c385fd Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 14 Mar 2025 17:04:33 +0700
Subject: [PATCH] AMDGPU: Switch test to generated checks

I doubt this is testing what it originally intended anymore. Also
replace an undef.
---
 llvm/test/CodeGen/AMDGPU/subreg-eliminate-dead.ll | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/subreg-eliminate-dead.ll b/llvm/test/CodeGen/AMDGPU/subreg-eliminate-dead.ll
index 59ad6f6139ace..2efcb411cf0ee 100644
--- a/llvm/test/CodeGen/AMDGPU/subreg-eliminate-dead.ll
+++ b/llvm/test/CodeGen/AMDGPU/subreg-eliminate-dead.ll
@@ -1,16 +1,19 @@
-; RUN: llc -mtriple=amdgcn-- -verify-machineinstrs -o - %s | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn-- -mcpu=tahiti < %s | FileCheck %s
 ; LiveRangeEdit::eliminateDeadDef did not update LiveInterval sub ranges
 ; properly.
 
 ; Just make sure this test doesn't crash.
-; CHECK-LABEL: foobar:
-; CHECK: s_endpgm
 define amdgpu_kernel void @foobar() {
+; CHECK-LABEL: foobar:
+; CHECK:   ; %bb.0:
+; CHECK-NEXT:s_endpgm
   %v0 = icmp eq <4 x i32> poison, 
   %v3 = sext <4 x i1> %v0 to <4 x i32>
   %v4 = extractelement <4 x i32> %v3, i32 1
   %v5 = icmp ne i32 %v4, 0
-  %v6 = select i1 %v5, i32 undef, i32 0
+  %undef = freeze i32 poison
+  %v6 = select i1 %v5, i32 %undef, i32 0
   %v15 = insertelement <2 x i32> poison, i32 %v6, i32 1
   store <2 x i32> %v15, ptr addrspace(1) poison, align 8
   ret void



[llvm-branch-commits] [clang] release/20.x: [Clang] Do not emit nodiscard warnings for the base expr of static member access (#131450) (PR #131474)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport 9a1e390

Requested by: @cor3ntin

---
Full diff: https://github.com/llvm/llvm-project/pull/131474.diff


5 Files Affected:

- (modified) clang/include/clang/Sema/Sema.h (-5) 
- (modified) clang/lib/Sema/SemaExprMember.cpp (-1) 
- (modified) clang/lib/Sema/SemaStmt.cpp (-4) 
- (modified) clang/test/CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp (+6-4) 
- (modified) clang/test/SemaCXX/ms-property.cpp (+1-1) 


``diff
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index a30a7076ea5d4..6e2e5aaff2347 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -10671,11 +10671,6 @@ class Sema final : public SemaBase {
SourceLocation EndLoc);
   void ActOnForEachDeclStmt(DeclGroupPtrTy Decl);
 
-  /// DiagnoseDiscardedExprMarkedNodiscard - Given an expression that is
-  /// semantically a discarded-value expression, diagnose if any [[nodiscard]]
-  /// value has been discarded.
-  void DiagnoseDiscardedExprMarkedNodiscard(const Expr *E);
-
   /// DiagnoseUnusedExprResult - If the statement passed in is an expression
   /// whose result is unused, warn.
   void DiagnoseUnusedExprResult(const Stmt *S, unsigned DiagID);
diff --git a/clang/lib/Sema/SemaExprMember.cpp b/clang/lib/Sema/SemaExprMember.cpp
index d130e8b86bc56..adb8e3cc90c0c 100644
--- a/clang/lib/Sema/SemaExprMember.cpp
+++ b/clang/lib/Sema/SemaExprMember.cpp
@@ -1136,7 +1136,6 @@ Sema::BuildMemberReferenceExpr(Expr *BaseExpr, QualType BaseExprType,
 if (Converted.isInvalid())
   return true;
 BaseExpr = Converted.get();
-DiagnoseDiscardedExprMarkedNodiscard(BaseExpr);
 return false;
   };
   auto ConvertBaseExprToGLValue = [&] {
diff --git a/clang/lib/Sema/SemaStmt.cpp b/clang/lib/Sema/SemaStmt.cpp
index 947651d514b3b..b8b59793d6508 100644
--- a/clang/lib/Sema/SemaStmt.cpp
+++ b/clang/lib/Sema/SemaStmt.cpp
@@ -413,10 +413,6 @@ void DiagnoseUnused(Sema &S, const Expr *E, std::optional DiagID) {
 }
 } // namespace
 
-void Sema::DiagnoseDiscardedExprMarkedNodiscard(const Expr *E) {
-  DiagnoseUnused(*this, E, std::nullopt);
-}
-
 void Sema::DiagnoseUnusedExprResult(const Stmt *S, unsigned DiagID) {
   if (const LabelStmt *Label = dyn_cast_if_present(S))
 S = Label->getSubStmt();
diff --git a/clang/test/CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp b/clang/test/CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp
index 18f4bd5e9c0fa..0012ab976baa5 100644
--- a/clang/test/CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp
+++ b/clang/test/CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp
@@ -164,19 +164,21 @@ struct X {
 
 [[nodiscard]] X get_X();
 // cxx11-warning@-1 {{use of the 'nodiscard' attribute is a C++17 extension}}
+[[nodiscard]] X* get_Ptr();
+// cxx11-warning@-1 {{use of the 'nodiscard' attribute is a C++17 extension}}
 void f() {
+  get_X(); // expected-warning{{ignoring return value of function declared with 'nodiscard' attribute}}
+  (void) get_X();
   (void) get_X().variant_member;
   (void) get_X().anonymous_struct_member;
   (void) get_X().data_member;
   (void) get_X().static_data_member;
-  // expected-warning@-1 {{ignoring return value of function declared with 'nodiscard' attribute}}
   (void) get_X().unscoped_enum;
-  // expected-warning@-1 {{ignoring return value of function declared with 'nodiscard' attribute}}
   (void) get_X().scoped_enum;
-  // expected-warning@-1 {{ignoring return value of function declared with 'nodiscard' attribute}}
   (void) get_X().implicit_object_member_function();
   (void) get_X().static_member_function();
-  // expected-warning@-1 {{ignoring return value of function declared with 'nodiscard' attribute}}
+  (void) get_Ptr()->implicit_object_member_function();
+  (void) get_Ptr()->static_member_function();
 #if __cplusplus >= 202302L
   (void) get_X().explicit_object_member_function();
 #endif
diff --git a/clang/test/SemaCXX/ms-property.cpp b/clang/test/SemaCXX/ms-property.cpp
index d5799a8a4d363..f1424b9cb12bc 100644
--- a/clang/test/SemaCXX/ms-property.cpp
+++ b/clang/test/SemaCXX/ms-property.cpp
@@ -2,6 +2,7 @@
 // RUN: %clang_cc1 -triple=x86_64-pc-win32 -fms-compatibility -emit-pch -o %t -verify %s
 // RUN: %clang_cc1 -triple=x86_64-pc-win32 -fms-compatibility -include-pch %t %s -ast-print -o - | FileCheck %s
 // RUN: %clang_cc1 -fdeclspec -fsyntax-only -verify %s -std=c++23
+// expected-no-diagnostics
 
 #ifndef HEADER
 #define HEADER
@@ -103,7 +104,6 @@ struct X {
 void f() {
   (void) get_x().imp;
   (void) get_x().st;
-  // expected-warning@-1 {{ignoring return value of function declared with 'nodiscard' attribute}}
 #if __cplusplus >= 202302L
   (void) get_x().exp;
 #endif

``




https://github.com/llvm/llvm-project/pull/131474
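
To make the behavior change concrete, here is a small C++ file modeled on the updated p2.cpp test. The names mirror the test, but the definitions are an illustration rather than test code. The expectations in this backport suggest how Clang 20 should behave once the change is applied. Only the bare `get_X();` statement, left commented out below, should warn. Static-member accesses whose base is a `[[nodiscard]]` call should not warn.

```cpp
#include <cassert>

// Hypothetical types modeled on the p2.cpp test in this patch.
struct X {
  static int static_member_function() { return 42; }
  int implicit_object_member_function() const { return data_member; }
  int data_member = 7;
};

[[nodiscard]] X get_X() { return X{}; }
[[nodiscard]] X *get_Ptr() {
  static X Instance;
  return &Instance;
}

int demo() {
  // get_X() is still evaluated, but its value only names a static member;
  // with this change Clang no longer reports it as a discarded result.
  int A = get_X().static_member_function();
  // Same for a static member reached through a pointer from a nodiscard call.
  int B = get_Ptr()->static_member_function();
  // Non-static member access genuinely uses the returned object.
  int C = get_X().implicit_object_member_function();
  // (void) get_X();  // explicit discard: no warning expected
  // get_X();         // still expected to warn: result ignored
  return A + B + C;
}
```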

[llvm-branch-commits] [clang] release/20.x: [Clang] Do not emit nodiscard warnings for the base expr of static member access (#131450) (PR #131474)

2025-03-15 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/131474

Backport 9a1e390

Requested by: @cor3ntin

>From e46c31e5a5d2aae2fcfc8d835681fcb58ea4c505 Mon Sep 17 00:00:00 2001
From: cor3ntin 
Date: Sat, 15 Mar 2025 22:27:08 +0100
Subject: [PATCH] [Clang] Do not emit nodiscard warnings for the base expr of
 static member access (#131450)

For an expression `nodiscard_function().static_member()`, the nodiscard
warnings added by #120223 are not useful or actionable, and are
disruptive to some library implementations; we just remove them.

Fixes #131410

(cherry picked from commit 9a1e39062b2ab445f1f4424ecdc5ffb46e8cb9e0)
---
 clang/include/clang/Sema/Sema.h|  5 -
 clang/lib/Sema/SemaExprMember.cpp  |  1 -
 clang/lib/Sema/SemaStmt.cpp|  4 
 .../CXX/dcl.dcl/dcl.attr/dcl.attr.nodiscard/p2.cpp | 10 ++
 clang/test/SemaCXX/ms-property.cpp |  2 +-
 5 files changed, 7 insertions(+), 15 deletions(-)


[llvm-branch-commits] [clang] release/20.x: [Clang] Do not emit nodiscard warnings for the base expr of static member access (#131450) (PR #131474)

2025-03-15 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/131474


[llvm-branch-commits] [clang] release/20.x: [Clang] Do not emit nodiscard warnings for the base expr of static member access (#131450) (PR #131474)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:

@shafik What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/131474


[llvm-branch-commits] [clang] [HLSL][NFC] Use builtin method builder to create default resource constructor (PR #131384)

2025-03-15 Thread Helena Kotas via llvm-branch-commits

https://github.com/hekota edited 
https://github.com/llvm/llvm-project/pull/131384


[llvm-branch-commits] [clang-tools-extra] [clang-tidy] support pointee mutation check in misc-const-correctness (PR #130494)

2025-03-15 Thread via llvm-branch-commits


@@ -125,6 +132,22 @@ Options
 // No warning
 const int *const pointer_variable = &value;
 
+.. option:: WarnPointersAsPointers
+
+  This option enables the suggestion for ``const`` of the pointed-to value.
+  Default is `true`.
+
+  Requires 'AnalyzePointers' to be 'true'.

EugeneZelenko wrote:

```suggestion
  Requires :option:`AnalyzePointers` to be `true`.
```

https://github.com/llvm/llvm-project/pull/130494
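
The new option concerns the constness of the pointee rather than of the pointer variable itself. The locals below illustrate the three forms. As I understand its documentation, misc-const-correctness only examines local variables. The comments describe what a pointee-const suggestion would do, and they are not verified clang-tidy output.

```cpp
#include <cassert>

// Illustrative locals only; not taken from the check's test suite.
int readThroughPointers() {
  int Value = 10;
  // The pointee is only read through this pointer, so a pointee-const
  // suggestion would turn the declaration into `const int *`.
  int *ReadOnlyUse = &Value;
  // Already pointee-const: nothing left to suggest for the pointed-to value.
  const int *PointeeConst = &Value;
  // Pointer and pointee both const, as in the documentation excerpt above.
  const int *const FullyConst = &Value;
  return *ReadOnlyUse + *PointeeConst + *FullyConst;
}
```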


[llvm-branch-commits] [clang] release/20.x: [clang] Reject constexpr-unknown values as constant expressions more consistently (PR #130658)

2025-03-15 Thread Shafik Yaghmour via llvm-branch-commits

https://github.com/shafik commented:

Thank you for backporting these changes!

https://github.com/llvm/llvm-project/pull/130658


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port PostRAHazardRecognizer to NPM (PR #130066)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/130066

>From 3060681f19ebf8d1ff61f54c7efd679ef6fbb817 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Thu, 6 Mar 2025 06:42:54 +
Subject: [PATCH] [CodeGen][NPM] Port PostRAHazardRecognizer to NPM

---
 .../llvm/CodeGen/PostRAHazardRecognizer.h | 26 +++
 llvm/include/llvm/InitializePasses.h  |  2 +-
 .../llvm/Passes/MachinePassRegistry.def   |  1 +
 llvm/lib/CodeGen/CodeGen.cpp  |  2 +-
 llvm/lib/CodeGen/PostRAHazardRecognizer.cpp   | 46 +--
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  3 +-
 .../AMDGPU/break-smem-soft-clauses.mir|  2 +
 llvm/test/CodeGen/AMDGPU/dst-sel-hazard.mir   |  2 +
 .../hazard-flat-instruction-valu-check.mir|  1 +
 10 files changed, 68 insertions(+), 18 deletions(-)
 create mode 100644 llvm/include/llvm/CodeGen/PostRAHazardRecognizer.h

diff --git a/llvm/include/llvm/CodeGen/PostRAHazardRecognizer.h b/llvm/include/llvm/CodeGen/PostRAHazardRecognizer.h
new file mode 100644
index 0..3e0c04ac5e403
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/PostRAHazardRecognizer.h
@@ -0,0 +1,26 @@
+//===- llvm/CodeGen/PostRAHazardRecognizer.h *- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#ifndef LLVM_CODEGEN_POSTRAHAZARDRECOGNIZER_H
+#define LLVM_CODEGEN_POSTRAHAZARDRECOGNIZER_H
+
+#include "llvm/CodeGen/MachinePassManager.h"
+
+namespace llvm {
+
+class PostRAHazardRecognizerPass
+: public PassInfoMixin {
+public:
+  PreservedAnalyses run(MachineFunction &MF,
+MachineFunctionAnalysisManager &MFAM);
+  static bool isRequired() { return true; }
+};
+
+} // namespace llvm
+
+#endif // LLVM_CODEGEN_POSTRAHAZARDRECOGNIZER_H
diff --git a/llvm/include/llvm/InitializePasses.h b/llvm/include/llvm/InitializePasses.h
index f1c16e3b1cb40..a3fd97ee99f3b 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -237,7 +237,7 @@ void initializePostDomViewerWrapperPassPass(PassRegistry &);
 void initializePostDominatorTreeWrapperPassPass(PassRegistry &);
 void initializePostInlineEntryExitInstrumenterPass(PassRegistry &);
 void initializePostMachineSchedulerLegacyPass(PassRegistry &);
-void initializePostRAHazardRecognizerPass(PassRegistry &);
+void initializePostRAHazardRecognizerLegacyPass(PassRegistry &);
 void initializePostRAMachineSinkingPass(PassRegistry &);
 void initializePostRASchedulerLegacyPass(PassRegistry &);
 void initializePreISelIntrinsicLoweringLegacyPassPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def b/llvm/include/llvm/Passes/MachinePassRegistry.def
index bedbc3e88a7ce..285ad9601c6ff 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -155,6 +155,7 @@ MACHINE_FUNCTION_PASS("opt-phis", OptimizePHIsPass())
 MACHINE_FUNCTION_PASS("patchable-function", PatchableFunctionPass())
 MACHINE_FUNCTION_PASS("peephole-opt", PeepholeOptimizerPass())
 MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass())
+MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass())
 MACHINE_FUNCTION_PASS("post-RA-sched", PostRASchedulerPass(TM))
 MACHINE_FUNCTION_PASS("postmisched", PostMachineSchedulerPass(TM))
 MACHINE_FUNCTION_PASS("post-ra-pseudos", ExpandPostRAPseudosPass())
diff --git a/llvm/lib/CodeGen/CodeGen.cpp b/llvm/lib/CodeGen/CodeGen.cpp
index 375176ed4b1ce..69b4d8bac94cf 100644
--- a/llvm/lib/CodeGen/CodeGen.cpp
+++ b/llvm/lib/CodeGen/CodeGen.cpp
@@ -106,7 +106,7 @@ void llvm::initializeCodeGen(PassRegistry &Registry) {
   initializePatchableFunctionLegacyPass(Registry);
   initializePeepholeOptimizerLegacyPass(Registry);
   initializePostMachineSchedulerLegacyPass(Registry);
-  initializePostRAHazardRecognizerPass(Registry);
+  initializePostRAHazardRecognizerLegacyPass(Registry);
   initializePostRAMachineSinkingPass(Registry);
   initializePostRASchedulerLegacyPass(Registry);
   initializePreISelIntrinsicLoweringLegacyPassPass(Registry);
diff --git a/llvm/lib/CodeGen/PostRAHazardRecognizer.cpp b/llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
index 97b1532300b17..3ead2087fc1d9 100644
--- a/llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
+++ b/llvm/lib/CodeGen/PostRAHazardRecognizer.cpp
@@ -26,6 +26,7 @@
 //
 //===--===//
 
+#include "llvm/CodeGen/PostRAHazardRecognizer.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/ScheduleHazardRecognizer.h"
@@ -40,30 +41,45 @@ using namespace llvm;
 STA
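
Each of these NPM ports registers its pass by adding a `MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)` line to MachinePassRegistry.def, as this patch does with `post-RA-hazard-rec`. The .def file follows the X-macro pattern. Code that includes the file first defines `MACHINE_FUNCTION_PASS` to generate whatever it needs. The sketch below imitates that pattern with an in-file table macro, so that it stays self-contained, and it builds a name-to-factory map. The pass structs and their `id()` method are hypothetical stand-ins, not LLVM classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins for real new-PM pass classes.
struct PHIEliminationPass {
  const char *id() const { return "phi-elim"; }
};
struct PostRAHazardRecognizerPass {
  const char *id() const { return "hazard"; }
};

// Toy table in the same MACHINE_FUNCTION_PASS(NAME, CREATE_PASS) shape.
#define TOY_MACHINE_PASS_TABLE                                               \
  MACHINE_FUNCTION_PASS("phi-node-elimination", PHIEliminationPass())        \
  MACHINE_FUNCTION_PASS("post-RA-hazard-rec", PostRAHazardRecognizerPass())

using PassFactory = std::function<std::string()>;

// Expands each table entry into a name -> factory mapping, roughly the way a
// pass builder can turn a textual pipeline name into a pass object.
std::map<std::string, PassFactory> buildToyRegistry() {
  std::map<std::string, PassFactory> Registry;
#define MACHINE_FUNCTION_PASS(NAME, CREATE_PASS)                             \
  Registry[NAME] = [] { return std::string(CREATE_PASS.id()); };
  TOY_MACHINE_PASS_TABLE
#undef MACHINE_FUNCTION_PASS
  return Registry;
}
```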

[llvm-branch-commits] [llvm] [AMDGPU][NFC] Format GCNCreateVOPD.cpp (PR #130548)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Akshat Oke (optimisan)


Changes



---
Full diff: https://github.com/llvm/llvm-project/pull/130548.diff


1 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp (+9-9) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index d40a1a2a10d9b..798279b279da3 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -38,15 +38,15 @@ namespace {
 
 class GCNCreateVOPD : public MachineFunctionPass {
 private:
-class VOPDCombineInfo {
-public:
-  VOPDCombineInfo() = default;
-  VOPDCombineInfo(MachineInstr *First, MachineInstr *Second)
-  : FirstMI(First), SecondMI(Second) {}
-
-  MachineInstr *FirstMI;
-  MachineInstr *SecondMI;
-};
+  class VOPDCombineInfo {
+  public:
+VOPDCombineInfo() = default;
+VOPDCombineInfo(MachineInstr *First, MachineInstr *Second)
+: FirstMI(First), SecondMI(Second) {}
+
+MachineInstr *FirstMI;
+MachineInstr *SecondMI;
+  };
 
 public:
   static char ID;

``




https://github.com/llvm/llvm-project/pull/130548


[llvm-branch-commits] [llvm] [AMDGPU][RegBankCombiner] Add cast_of_cast and constant_fold_cast combines (PR #131307)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes

We can add a bunch of exts/truncs during RBSelect; we should be able to fold
them away afterwards.

---

Patch is 184.40 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131307.diff


8 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUCombine.td (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+217-397) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+256-424) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll (+120-131) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shl-ext-reduce.ll (+5-21) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll (+146-157) 
- (modified) llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll (-1) 
- (modified) llvm/test/CodeGen/AMDGPU/v_sat_pk_u8_i16.ll (+22-32) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
index 36653867fbba0..a21505356274b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCombine.td
@@ -180,5 +180,6 @@ def AMDGPURegBankCombiner : GICombiner<
   [unmerge_merge, unmerge_cst, unmerge_undef,
zext_trunc_fold, int_minmax_to_med3, ptr_add_immed_chain,
fp_minmax_to_clamp, fp_minmax_to_med3, fmed3_intrinsic_to_clamp,
-   identity_combines, redundant_and]> {
+   identity_combines, redundant_and, constant_fold_cast_op,
+   cast_of_cast_combines]> {
 }
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index 3a52497bd6e91..07fcb02d98649 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
@@ -41,10 +41,9 @@ define amdgpu_ps i7 @s_fshl_i7(i7 inreg %lhs, i7 inreg %rhs, i7 inreg %amt) {
 ; GFX8-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX8-NEXT:s_and_b32 s2, s2, 0x7f
 ; GFX8-NEXT:s_and_b32 s1, s1, 0x7f
-; GFX8-NEXT:s_and_b32 s1, 0x, s1
+; GFX8-NEXT:s_lshr_b32 s1, s1, 1
 ; GFX8-NEXT:v_mul_f32_e32 v0, 0x4f7e, v0
 ; GFX8-NEXT:v_cvt_u32_f32_e32 v0, v0
-; GFX8-NEXT:s_lshr_b32 s1, s1, 1
 ; GFX8-NEXT:v_mul_lo_u32 v1, v0, -7
 ; GFX8-NEXT:v_mul_hi_u32 v1, v0, v1
 ; GFX8-NEXT:v_add_u32_e32 v0, vcc, v0, v1
@@ -72,10 +71,9 @@ define amdgpu_ps i7 @s_fshl_i7(i7 inreg %lhs, i7 inreg %rhs, i7 inreg %amt) {
 ; GFX9-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX9-NEXT:s_and_b32 s2, s2, 0x7f
 ; GFX9-NEXT:s_and_b32 s1, s1, 0x7f
-; GFX9-NEXT:s_and_b32 s1, 0x, s1
+; GFX9-NEXT:s_lshr_b32 s1, s1, 1
 ; GFX9-NEXT:v_mul_f32_e32 v0, 0x4f7e, v0
 ; GFX9-NEXT:v_cvt_u32_f32_e32 v0, v0
-; GFX9-NEXT:s_lshr_b32 s1, s1, 1
 ; GFX9-NEXT:v_mul_lo_u32 v1, v0, -7
 ; GFX9-NEXT:v_mul_hi_u32 v1, v0, v1
 ; GFX9-NEXT:v_add_u32_e32 v0, v0, v1
@@ -102,9 +100,8 @@ define amdgpu_ps i7 @s_fshl_i7(i7 inreg %lhs, i7 inreg %rhs, i7 inreg %amt) {
 ; GFX10-NEXT:v_cvt_f32_ubyte0_e32 v0, 7
 ; GFX10-NEXT:s_and_b32 s2, s2, 0x7f
 ; GFX10-NEXT:s_and_b32 s1, s1, 0x7f
-; GFX10-NEXT:s_and_b32 s1, 0x, s1
-; GFX10-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX10-NEXT:s_lshr_b32 s1, s1, 1
+; GFX10-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX10-NEXT:v_mul_f32_e32 v0, 0x4f7e, v0
 ; GFX10-NEXT:v_cvt_u32_f32_e32 v0, v0
 ; GFX10-NEXT:v_mul_lo_u32 v1, v0, -7
@@ -134,9 +131,8 @@ define amdgpu_ps i7 @s_fshl_i7(i7 inreg %lhs, i7 inreg %rhs, i7 inreg %amt) {
 ; GFX11-NEXT:s_and_b32 s2, s2, 0x7f
 ; GFX11-NEXT:s_and_b32 s1, s1, 0x7f
 ; GFX11-NEXT:s_delay_alu instid0(SALU_CYCLE_1) | instskip(NEXT) | instid1(VALU_DEP_1)
-; GFX11-NEXT:s_and_b32 s1, 0x, s1
-; GFX11-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX11-NEXT:s_lshr_b32 s1, s1, 1
+; GFX11-NEXT:v_rcp_iflag_f32_e32 v0, v0
 ; GFX11-NEXT:s_waitcnt_depctr 0xfff
 ; GFX11-NEXT:v_mul_f32_e32 v0, 0x4f7e, v0
 ; GFX11-NEXT:s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
@@ -351,11 +347,8 @@ define amdgpu_ps i8 @s_fshl_i8(i8 inreg %lhs, i8 inreg %rhs, i8 inreg %amt) {
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_and_b32 s1, s1, 0xff
 ; GFX8-NEXT:s_and_b32 s3, s2, 7
-; GFX8-NEXT:s_and_b32 s1, 0x, s1
-; GFX8-NEXT:s_andn2_b32 s2, 7, s2
-; GFX8-NEXT:s_and_b32 s3, 0x, s3
 ; GFX8-NEXT:s_lshr_b32 s1, s1, 1
-; GFX8-NEXT:s_and_b32 s2, 0x, s2
+; GFX8-NEXT:s_andn2_b32 s2, 7, s2
 ; GFX8-NEXT:s_lshl_b32 s0, s0, s3
 ; GFX8-NEXT:s_lshr_b32 s1, s1, s2
 ; GFX8-NEXT:s_or_b32 s0, s0, s1
@@ -365,11 +358,8 @@ define amdgpu_ps i8 @s_fshl_i8(i8 inreg %lhs, i8 inreg %rhs, i8 inreg %amt) {
 ; GFX9:   ; %bb.0:
 ; GFX9-NEXT:s_and_b32 s1, s1, 0xff
 ; GFX9-NEXT:s_and_b32 s3, s2, 7
-; GFX9-NEXT:s_and_b32 s1, 0x, s1
-; GFX9-NEXT:s_andn2_b32 s2, 7, s2
-; GFX9-NEXT:s_and_b32 s3, 0x, s3
 ; GFX9-NEXT:s_lshr_b32 s1, s1, 1
-; GFX9-NEXT:s_and_b32 s

[llvm-branch-commits] [llvm] AMDGPU: Switch scheduler-subrange-crash.ll to generated checks (PR #131316)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/131316).
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#131317**
* **#131316** 👈 (View in Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/131316)
* **#131315**
* **#131314**
* **#131277**
* **#131276**
* **#131275**
* **#131259**
* **#131258**
* **#131257**
* **#131256**
* **#131255**
* **#131254**
* **#131110**
* **#131103**
* **#131102**
* **#131101**
* **#131095**
* **#131092**: 1 other dependent PR ([#131106](https://github.com/llvm/llvm-project/pull/131106))

[llvm-branch-commits] [flang] [flang][OpenMP] Use OmpDirectiveSpecification in standalone directives (PR #131163)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-flang-openmp

@llvm/pr-subscribers-flang-semantics

Author: Krzysztof Parzyszek (kparzysz)


Changes

This uses OmpDirectiveSpecification in the rest of the standalone directives.
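A heavily simplified sketch of the shape change: flang parse-tree tuple classes keep their parts in a `t` member and wrapper classes keep a single payload in `v` (both conventions are visible in the diff). The stand-in types below are hypothetical and omit most real members (directive names, flags, source ranges); they only show a standalone construct going from its own tuple to wrapping a shared `OmpDirectiveSpecification`.

```cpp
#include <cassert>
#include <list>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical, heavily simplified stand-ins for flang parse-tree nodes.
struct OmpArgument { std::string name; };
// Wrapper class: single payload in `v`.
struct OmpArgumentList { std::list<OmpArgument> v; };
struct OmpClauseList { std::list<std::string> v; };

// Tuple class: parts in `t`, exposed through accessors.
struct OmpDirectiveSpecification {
  std::string directive;
  std::tuple<OmpArgumentList, std::optional<OmpClauseList>> t;
  const OmpArgumentList &Arguments() const { return std::get<0>(t); }
};

// Before: each standalone construct held its own tuple of parts.
struct CancelConstructBefore {
  std::tuple<std::string /*directive*/, std::string /*construct type*/> t;
};

// After: the construct wraps the shared OmpDirectiveSpecification in `v`.
struct CancelConstructAfter {
  OmpDirectiveSpecification v;
};
```

The benefit of the wrapper form is that code such as semantic checks and lowering can handle every standalone directive through the one `OmpDirectiveSpecification` interface instead of a per-construct tuple layout.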

---

Patch is 38.80 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131163.diff


16 Files Affected:

- (modified) flang/include/flang/Parser/dump-parse-tree.h (+1) 
- (modified) flang/include/flang/Parser/parse-tree.h (+12-11) 
- (modified) flang/lib/Lower/OpenMP/Clauses.cpp (+19) 
- (modified) flang/lib/Lower/OpenMP/Clauses.h (+3) 
- (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (+13-18) 
- (modified) flang/lib/Parser/openmp-parsers.cpp (+38-36) 
- (modified) flang/lib/Parser/parse-tree.cpp (+8) 
- (modified) flang/lib/Parser/unparse.cpp (+15-15) 
- (modified) flang/lib/Semantics/check-omp-structure.cpp (+92-33) 
- (modified) flang/lib/Semantics/check-omp-structure.h (+2-1) 
- (modified) flang/lib/Semantics/resolve-directives.cpp (+7-2) 
- (modified) flang/lib/Semantics/resolve-names.cpp (+3-3) 
- (modified) flang/test/Parser/OpenMP/depobj-construct.f90 (+16-16) 
- (modified) flang/test/Parser/OpenMP/metadirective-dirspec.f90 (+9-9) 
- (modified) flang/test/Parser/OpenMP/metadirective-flush.f90 (+2-2) 
- (modified) flang/test/Semantics/OpenMP/depobj-construct-v50.f90 (+17) 


``diff
diff --git a/flang/include/flang/Parser/dump-parse-tree.h b/flang/include/flang/Parser/dump-parse-tree.h
index 118df6cf2a4ff..9bff2dab974ec 100644
--- a/flang/include/flang/Parser/dump-parse-tree.h
+++ b/flang/include/flang/Parser/dump-parse-tree.h
@@ -484,6 +484,7 @@ class ParseTreeDumper {
   NODE(parser, OmpLocatorList)
   NODE(parser, OmpReductionSpecifier)
   NODE(parser, OmpArgument)
+  NODE(parser, OmpArgumentList)
   NODE(parser, OmpMetadirectiveDirective)
   NODE(parser, OmpMatchClause)
   NODE(parser, OmpOtherwiseClause)
diff --git a/flang/include/flang/Parser/parse-tree.h b/flang/include/flang/Parser/parse-tree.h
index dfde4ceb787d2..a31018c9abc09 100644
--- a/flang/include/flang/Parser/parse-tree.h
+++ b/flang/include/flang/Parser/parse-tree.h
@@ -3557,6 +3557,11 @@ struct OmpArgument {
   OmpMapperSpecifier, OmpReductionSpecifier>
   u;
 };
+
+struct OmpArgumentList {
+  WRAPPER_CLASS_BOILERPLATE(OmpArgumentList, std::list);
+  CharBlock source;
+};
 } // namespace arguments
 
 inline namespace traits {
@@ -4511,10 +4516,11 @@ struct OmpDirectiveSpecification {
   llvm::omp::Directive DirId() const { //
 return std::get(t).v;
   }
+  const OmpArgumentList &Arguments() const;
   const OmpClauseList &Clauses() const;
 
   CharBlock source;
-  std::tuple>,
+  std::tuple,
   std::optional, Flags>
   t;
 };
@@ -4865,16 +4871,15 @@ struct OmpLoopDirective {
 
 // 2.14.2 cancellation-point -> CANCELLATION POINT construct-type-clause
 struct OpenMPCancellationPointConstruct {
-  TUPLE_CLASS_BOILERPLATE(OpenMPCancellationPointConstruct);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPCancellationPointConstruct,
+  OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple t;
 };
 
 // 2.14.1 cancel -> CANCEL construct-type-clause [ [,] if-clause]
 struct OpenMPCancelConstruct {
-  TUPLE_CLASS_BOILERPLATE(OpenMPCancelConstruct);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPCancelConstruct, OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple t;
 };
 
 // Ref: [5.0:254-255], [5.1:287-288], [5.2:322-323]
@@ -4884,9 +4889,8 @@ struct OpenMPCancelConstruct {
 //  destroy-clause |
 //  update-clause
 struct OpenMPDepobjConstruct {
-  TUPLE_CLASS_BOILERPLATE(OpenMPDepobjConstruct);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPDepobjConstruct, OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple t;
 };
 
 // Ref: [5.2: 200-201]
@@ -4927,11 +4931,8 @@ struct OpenMPDispatchConstruct {
 //ACQ_REL | RELEASE | ACQUIRE | // since 5.0
 //SEQ_CST   // since 5.1
 struct OpenMPFlushConstruct {
-  TUPLE_CLASS_BOILERPLATE(OpenMPFlushConstruct);
+  WRAPPER_CLASS_BOILERPLATE(OpenMPFlushConstruct, OmpDirectiveSpecification);
   CharBlock source;
-  std::tuple,
-  std::optional, /*TrailingClauses=*/bool>
-  t;
 };
 
 struct OpenMPSimpleStandaloneConstruct {
diff --git a/flang/lib/Lower/OpenMP/Clauses.cpp b/flang/lib/Lower/OpenMP/Clauses.cpp
index 9fa9abd9e8ceb..7ad6f7f3da00a 100644
--- a/flang/lib/Lower/OpenMP/Clauses.cpp
+++ b/flang/lib/Lower/OpenMP/Clauses.cpp
@@ -132,6 +132,25 @@ Object makeObject(const parser::OmpObject &object,
   return makeObject(std::get(object.u), semaCtx);
 }
 
+ObjectList makeObjects(const parser::OmpArgumentList &objects,
+   semantics::SemanticsContext &semaCtx) {
+  return makeList(objects.v, [&](const parser::OmpArgument &arg) {
+return common::visit(
+common::visitors{
+[&](const parser::OmpLocator &locator) -> Object {
+  if (auto *object = std::get_if(&locator.u)) {
+ 

[llvm-branch-commits] [llvm] [AMDGPU] Update target helpers & GCNSchedStrategy for dynamic VGPRs (PR #130047)

2025-03-15 Thread Diana Picus via llvm-branch-commits


@@ -1452,6 +1452,16 @@ bool GCNSchedStage::shouldRevertScheduling(unsigned WavesAfter) {
   if (WavesAfter < DAG.MinOccupancy)
 return true;
 
+  // For dynamic VGPR mode, we don't want to waste any VGPR blocks.
+  if (ST.isDynamicVGPREnabled()) {

rovka wrote:

Probably, but that's a bigger change, and I'd like some real-world
performance numbers before making it.

https://github.com/llvm/llvm-project/pull/130047


[llvm-branch-commits] [clang] [Clang][CodeGen] Promote in complex compound divassign (PR #131453)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Mészáros Gergely (Maetveis)


Changes

When `-fcomplex-arithmetic=promoted` is set, complex divassign (`/=`) should
promote to a wider type the same way division (without assignment) does.
Prior to this change, Smith's algorithm was used for divassign instead.

Fixes: https://github.com/llvm/llvm-project/issues/131129
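To see why promotion matters, the sketch below evaluates the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i)/(c^2+d^2) once in `float` and once in `double`, narrowing the result back to `float`. It assumes that promoted mode behaves roughly like the second variant, and it assumes single-precision evaluation of `float` arithmetic (as on x86-64 with SSE). The `float` version is deliberately naive and is not Smith's algorithm, which avoids this particular overflow by scaling; all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Textbook formula evaluated in float:
// (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2)
std::complex<float> divNaiveFloat(std::complex<float> X, std::complex<float> Y) {
  float A = X.real(), B = X.imag(), C = Y.real(), D = Y.imag();
  float Denom = C * C + D * D; // can overflow to inf for large C and D
  return {(A * C + B * D) / Denom, (B * C - A * D) / Denom};
}

// Promoted variant: widen to double, apply the same formula, narrow back.
std::complex<float> divPromoted(std::complex<float> X, std::complex<float> Y) {
  double A = X.real(), B = X.imag(), C = Y.real(), D = Y.imag();
  double Denom = C * C + D * D;
  return {static_cast<float>((A * C + B * D) / Denom),
          static_cast<float>((B * C - A * D) / Denom)};
}

// `X /= Y` should compute the same thing as `X = X / Y` in the promoted mode.
void divAssignPromoted(std::complex<float> &X, std::complex<float> Y) {
  X = divPromoted(X, Y);
}
```

For example, dividing `1e30f + 1e30fi` by itself overflows the `float` denominator and yields a non-finite result, while the promoted variant returns approximately `1 + 0i`. The bug fixed here was that `/=` did not take the promoted path that plain `/` already used.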

---

Patch is 78.24 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131453.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CGExprComplex.cpp (+12-6) 
- (modified) clang/test/CodeGen/cx-complex-range.c (+213-321) 


``diff
diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index a8a65a2f956f8..dc1a34ee82805 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -1212,19 +1212,24 @@ EmitCompoundAssignLValue(const CompoundAssignOperator *E,
   OpInfo.FPFeatures = E->getFPFeaturesInEffect(CGF.getLangOpts());
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, OpInfo.FPFeatures);
 
+  const bool IsComplexDivisor = E->getOpcode() == BO_DivAssign &&
+E->getRHS()->getType()->isAnyComplexType();
+
   // Load the RHS and LHS operands.
   // __block variables need to have the rhs evaluated first, plus this should
   // improve codegen a little.
   QualType PromotionTypeCR;
-  PromotionTypeCR = getPromotionType(E->getStoredFPFeaturesOrDefault(),
- E->getComputationResultType());
+  PromotionTypeCR =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getComputationResultType(), IsComplexDivisor);
   if (PromotionTypeCR.isNull())
 PromotionTypeCR = E->getComputationResultType();
   OpInfo.Ty = PromotionTypeCR;
   QualType ComplexElementTy =
   OpInfo.Ty->castAs()->getElementType();
-  QualType PromotionTypeRHS = getPromotionType(
-  E->getStoredFPFeaturesOrDefault(), E->getRHS()->getType());
+  QualType PromotionTypeRHS =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getRHS()->getType(), IsComplexDivisor);
 
   // The RHS should have been converted to the computation type.
   if (E->getRHS()->getType()->isRealFloatingType()) {
@@ -1252,8 +1257,9 @@ EmitCompoundAssignLValue(const CompoundAssignOperator *E,
 
   // Load from the l-value and convert it.
   SourceLocation Loc = E->getExprLoc();
-  QualType PromotionTypeLHS = getPromotionType(
-  E->getStoredFPFeaturesOrDefault(), E->getComputationLHSType());
+  QualType PromotionTypeLHS =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getComputationLHSType(), IsComplexDivisor);
   if (LHSTy->isAnyComplexType()) {
 ComplexPairTy LHSVal = EmitLoadOfLValue(LHS, Loc);
 if (!PromotionTypeLHS.isNull())
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 06a349fbc2a47..a724e1ca8cb6d 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -721,44 +721,32 @@ _Complex float divf(_Complex float a, _Complex float b) {
 // PRMTD-NEXT:[[B_REAL:%.*]] = load float, ptr [[B_REALP]], align 4
 // PRMTD-NEXT:[[B_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[B]], i32 0, i32 1
 // PRMTD-NEXT:[[B_IMAG:%.*]] = load float, ptr [[B_IMAGP]], align 4
+// PRMTD-NEXT:[[EXT:%.*]] = fpext float [[B_REAL]] to double
+// PRMTD-NEXT:[[EXT1:%.*]] = fpext float [[B_IMAG]] to double
 // PRMTD-NEXT:[[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
 // PRMTD-NEXT:[[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[TMP0]], i32 0, i32 0
 // PRMTD-NEXT:[[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD-NEXT:[[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[TMP0]], i32 0, i32 1
 // PRMTD-NEXT:[[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD-NEXT:[[TMP1:%.*]] = call float @llvm.fabs.f32(float [[B_REAL]])
-// PRMTD-NEXT:[[TMP2:%.*]] = call float @llvm.fabs.f32(float [[B_IMAG]])
-// PRMTD-NEXT:[[ABS_CMP:%.*]] = fcmp ugt float [[TMP1]], [[TMP2]]
-// PRMTD-NEXT:br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
-// PRMTD:   abs_rhsr_greater_or_equal_abs_rhsi:
-// PRMTD-NEXT:[[TMP3:%.*]] = fdiv float [[B_IMAG]], [[B_REAL]]
-// PRMTD-NEXT:[[TMP4:%.*]] = fmul float [[TMP3]], [[B_IMAG]]
-// PRMTD-NEXT:[[TMP5:%.*]] = fadd float [[B_REAL]], [[TMP4]]
-// PRMTD-NEXT:[[TMP6:%.*]] = fmul float [[DOTIMAG]], [[TMP3]]
-// PRMTD-NEXT:[[TMP7:%.*]] = fadd float [[DOTREAL]], [[TMP6]]
-// PRMTD-NEXT:[[TMP8:%.*]] = fdiv float [[TMP7]], [[TMP5]]
-// PRMTD-NEXT:[[TMP9:%.*]] = fmul float [[DOTREAL]], [[TMP3]]
-// PRMTD-NEXT:[[TMP10:%.*]] = fsub float [[DOTIMAG]], [[TMP9]]
-// PRMTD-NEXT:[[TMP11:%.*]] = fdiv floa

[llvm-branch-commits] [clang] [Clang][CodeGen] Promote in complex compound divassign (PR #131453)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

https://github.com/Maetveis ready_for_review 
https://github.com/llvm/llvm-project/pull/131453


[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes

See #64591
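The widening in the hunk below can be modeled with plain integers. For AND/OR/XOR, only the low 16 bits survive the final truncation, so the operation can be done in 32 bits. The `xor x, -1` special case matters because the constant must stay all-ones after extension for the result to be recognized as a NOT (the `s_not_b32` in the updated checks). In this sketch a zero-extension stands in for one possible G_ANYEXT result, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of promoting a 16-bit scalar bitwise op to 32 bits:
// extend the operands, operate in 32 bits, then truncate back to 16 bits.
uint16_t andViaWide(uint16_t X, uint16_t Y) {
  // The extension kind does not matter here: the upper 16 bits are discarded
  // by the final truncation.
  uint32_t WX = X, WY = Y;
  return static_cast<uint16_t>(WX & WY);
}

// Sign-extending the 16-bit constant -1 keeps it all-ones at 32 bits.
uint32_t sextConst16(int16_t C) {
  return static_cast<uint32_t>(static_cast<int32_t>(C));
}

// Zero-extension models one possible G_ANYEXT result for the constant.
uint32_t zextConst16(int16_t C) {
  return static_cast<uint32_t>(static_cast<uint16_t>(C));
}

bool isAllOnes32(uint32_t V) { return V == 0xFFFFFFFFu; }

// `xor x, -1` widened with a sign-extended constant is still a 32-bit NOT.
uint16_t notViaWide(uint16_t X) {
  uint32_t Wide = static_cast<uint32_t>(X) ^ sextConst16(-1);
  return static_cast<uint16_t>(Wide);
}
```

Both extensions give the correct low 16 bits, but only the sign-extended constant is `0xFFFFFFFF`, the form a 32-bit "not" pattern can match.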

---

Patch is 72.45 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131306.diff


5 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp (+26-2) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll (+5-5) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+243-276) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+124-162) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/orn2.ll (+5-5) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
index c19ee14ab1574..27b86723ce474 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
@@ -2416,9 +2416,10 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
 Register DstReg = MI.getOperand(0).getReg();
 LLT DstTy = MRI.getType(DstReg);
 
-if (DstTy.getSizeInBits() == 1) {
-  const RegisterBank *DstBank =
+const RegisterBank *DstBank =
 OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
+
+if (DstTy.getSizeInBits() == 1) {
   if (DstBank == &AMDGPU::VCCRegBank)
 break;
 
@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.
+if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
+  const LLT S32 = LLT::scalar(32);
+  MachineBasicBlock *MBB = MI.getParent();
+  MachineFunction *MF = MBB->getParent();
+  ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
+  LegalizerHelper Helper(*MF, ApplySALU, B);
+  // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
+  // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
+  // as "not".
+  if (MI.getOpcode() == AMDGPU::G_XOR &&
+  mi_match(MI.getOperand(2).getReg(), MRI, m_SpecificICstOrSplat(-1))) {
+Helper.widenScalarSrc(MI, S32, 1, AMDGPU::G_ANYEXT);
+Helper.widenScalarSrc(MI, S32, 2, AMDGPU::G_SEXT);
+Helper.widenScalarDst(MI, S32);
+  } else {
+if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
+  llvm_unreachable("widen scalar should have succeeded");
+  }
+  return;
+}
+
 if (DstTy.getSizeInBits() != 64)
   break;
 
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
index 1a94429b1b5a1..36359579ea442 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/andn2.ll
@@ -391,20 +391,20 @@ define amdgpu_ps i16 @s_andn2_i16_commute(i16 inreg %src0, i16 inreg %src1) {
 define amdgpu_ps { i16, i16 } @s_andn2_i16_multi_use(i16 inreg %src0, i16 inreg %src1) {
 ; GCN-LABEL: s_andn2_i16_multi_use:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s1, s3, -1
+; GCN-NEXT:s_not_b32 s1, s3
 ; GCN-NEXT:s_andn2_b32 s0, s2, s3
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10-LABEL: s_andn2_i16_multi_use:
 ; GFX10:   ; %bb.0:
 ; GFX10-NEXT:s_andn2_b32 s0, s2, s3
-; GFX10-NEXT:s_xor_b32 s1, s3, -1
+; GFX10-NEXT:s_not_b32 s1, s3
 ; GFX10-NEXT:; return to shader part epilog
 ;
 ; GFX11-LABEL: s_andn2_i16_multi_use:
 ; GFX11:   ; %bb.0:
 ; GFX11-NEXT:s_and_not1_b32 s0, s2, s3
-; GFX11-NEXT:s_xor_b32 s1, s3, -1
+; GFX11-NEXT:s_not_b32 s1, s3
 ; GFX11-NEXT:; return to shader part epilog
   %not.src1 = xor i16 %src1, -1
   %and = and i16 %src0, %not.src1
@@ -482,14 +482,14 @@ define amdgpu_ps float @v_andn2_i16_sv(i16 inreg %src0, i16 %src1) {
 define amdgpu_ps float @v_andn2_i16_vs(i16 %src0, i16 inreg %src1) {
 ; GCN-LABEL: v_andn2_i16_vs:
 ; GCN:   ; %bb.0:
-; GCN-NEXT:s_xor_b32 s0, s2, -1
+; GCN-NEXT:s_not_b32 s0, s2
 ; GCN-NEXT:v_and_b32_e32 v0, s0, v0
 ; GCN-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GCN-NEXT:; return to shader part epilog
 ;
 ; GFX10PLUS-LABEL: v_andn2_i16_vs:
 ; GFX10PLUS:   ; %bb.0:
-; GFX10PLUS-NEXT:s_xor_b32 s0, s2, -1
+; GFX10PLUS-NEXT:s_not_b32 s0, s2
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, s0, v0
 ; GFX10PLUS-NEXT:v_and_b32_e32 v0, 0x, v0
 ; GFX10PLUS-NEXT:; return to shader part epilog
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
index e60739fd84059..3a52497bd6e91 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll
@@ -1052,17 +1052,14 @@ define amdgpu_ps i32 @s_fshl_v4i8(i32 inreg %lhs.arg, i32 inreg %rhs.arg, i32 in
 ; GFX8-NEXT:s_lshr_b32 s2, s2, s3
 ; GFX8-NEXT:s_or_b32 s1, s1, s2
 ; GFX8-NEXT:s_and

[llvm-branch-commits] [clang] [Clang][CodeGen] Do not promote if complex divisor is real (PR #131451)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

https://github.com/Maetveis ready_for_review 
https://github.com/llvm/llvm-project/pull/131451


[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes

Instructions like shifts only read some of the bits of the shift amount
operand (between 4 and 6 bits). If that operand is produced by an AND whose
immediate mask keeps all of the bits the instruction reads, the mask is
redundant and the AND's source register can be used directly.

Effects are minimal right now, but this will kick in more once we disable
uniform i16 operation widening in CGP. With that disabled, we get more i16
shift amounts that are zext'd, and without this we'd end up with more
`s_and_b32 s1, s1, 0x` in the output.

Ideally ISel should handle this, but it's proving difficult to get the
patterns right. After a few hours of trying, I decided to go with this
approach, as it's simple enough and it "just works" for this purpose.
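The redundancy check boils down to a mask test. The sketch below re-implements it locally: `trailingOnes` plays the role of LLVM's `maskTrailingOnes`, and `shl32` models a 32-bit shift that reads only the low 5 bits of its amount, as the opcode table in the patch assumes for the 32-bit shifts. The helper names are hypothetical, and the mask values are only illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for maskTrailingOnes: the low N bits set.
constexpr uint32_t trailingOnes(unsigned N) {
  return N >= 32 ? ~0u : ((1u << N) - 1u);
}

// An AND with `Mask` is redundant for a user that reads only the low
// `BitsRead` bits if every one of those bits is set in the mask.
constexpr bool isMaskRedundant(uint32_t Mask, unsigned BitsRead) {
  return (Mask & trailingOnes(BitsRead)) == trailingOnes(BitsRead);
}

// Model of a 32-bit shift that reads only the low 5 bits of its amount.
uint32_t shl32(uint32_t Val, uint32_t Amt) { return Val << (Amt & 31u); }
```

With a redundant mask such as `0x1F`, `shl32(V, Amt & 0x1F)` equals `shl32(V, Amt)` for every `Amt`, so the AND can be bypassed. With `0x7`, bits 3 and 4 of the amount would be lost, so the fold must not apply.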

---

Patch is 52.03 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131311.diff


8 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+96-1) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll (+2-6) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+92-109) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+97-110) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/lshr.ll (+2-6) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/shl.ll (+1-5) 
- (modified) llvm/test/CodeGen/AMDGPU/constrained-shift.ll (-1) 
- (modified) llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir (+13-13) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 91df516b80857..a279a0a973e75 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -131,6 +131,7 @@ class SIFoldOperandsImpl {
   std::optional getImmOrMaterializedImm(MachineOperand &Op) const;
   bool tryConstantFoldOp(MachineInstr *MI) const;
   bool tryFoldCndMask(MachineInstr &MI) const;
+  bool tryFoldBitMask(MachineInstr &MI) const;
   bool tryFoldZeroHighBits(MachineInstr &MI) const;
   bool foldInstOperand(MachineInstr &MI, MachineOperand &OpToFold) const;
 
@@ -1447,6 +1448,99 @@ bool SIFoldOperandsImpl::tryFoldCndMask(MachineInstr &MI) const {
   return true;
 }
 
+static bool getBitsReadByInst(unsigned Opc, unsigned &NumBitsRead,
+  unsigned &OpIdx) {
+  switch (Opc) {
+  case AMDGPU::V_ASHR_I32_e64:
+  case AMDGPU::V_ASHR_I32_e32:
+  case AMDGPU::V_LSHR_B32_e64:
+  case AMDGPU::V_LSHR_B32_e32:
+  case AMDGPU::V_LSHL_B32_e64:
+  case AMDGPU::V_LSHL_B32_e32:
+  case AMDGPU::S_LSHL_B32:
+  case AMDGPU::S_LSHR_B32:
+  case AMDGPU::S_ASHR_I32:
+NumBitsRead = 5;
+OpIdx = 2;
+return true;
+  case AMDGPU::S_LSHL_B64:
+  case AMDGPU::S_LSHR_B64:
+  case AMDGPU::S_ASHR_I64:
+NumBitsRead = 6;
+OpIdx = 2;
+return true;
+  case AMDGPU::V_LSHLREV_B32_e64:
+  case AMDGPU::V_LSHLREV_B32_e32:
+  case AMDGPU::V_LSHRREV_B32_e64:
+  case AMDGPU::V_LSHRREV_B32_e32:
+  case AMDGPU::V_ASHRREV_I32_e64:
+  case AMDGPU::V_ASHRREV_I32_e32:
+NumBitsRead = 5;
+OpIdx = 1;
+return true;
+  default:
+return false;
+  }
+}
+
+static bool isAndBitMaskRedundant(MachineInstr &MI, unsigned BitsNeeded,
+unsigned &SrcOp) {
+  MachineOperand *RegOp = &MI.getOperand(1);
+  MachineOperand *ImmOp = &MI.getOperand(2);
+
+  if (!RegOp->isReg() || !ImmOp->isImm()) {
+if (ImmOp->isReg() && RegOp->isImm())
+  std::swap(RegOp, ImmOp);
+else
+  return false;
+  }
+
+  SrcOp = RegOp->getOperandNo();
+
+  const unsigned BitMask = maskTrailingOnes(BitsNeeded);
+  return (ImmOp->getImm() & BitMask) == BitMask;
+}
+
+bool SIFoldOperandsImpl::tryFoldBitMask(MachineInstr &MI) const {
+  unsigned NumBitsRead = 0;
+  unsigned OpIdx = 0;
+  if (!getBitsReadByInst(MI.getOpcode(), NumBitsRead, OpIdx))
+return false;
+
+  MachineOperand &Op = MI.getOperand(OpIdx);
+  if (!Op.isReg())
+return false;
+
+  Register OpReg = Op.getReg();
+  if (OpReg.isPhysical())
+return false;
+
+  MachineInstr *OpDef = MRI->getVRegDef(OpReg);
+  if (!OpDef)
+return false ;
+
+  LLVM_DEBUG(dbgs() << "tryFoldBitMask: " << MI << "\tOpIdx:" << OpIdx << ", NumBitsRead:" << NumBitsRead << "\n");
+
+  unsigned ReplaceWith;
+  switch (OpDef->getOpcode()) {
+  // TODO: add more opcodes?
+  case AMDGPU::S_AND_B32:
+  case AMDGPU::V_AND_B32_e32:
+  case AMDGPU::V_AND_B32_e64:
+if (!isAndBitMaskRedundant(*OpDef, NumBitsRead, ReplaceWith))
+  return false;
+break;
+  default:
+return false;
+  }
+
+  MachineOperand &ReplaceWithOp = OpDef->getOperand(ReplaceWith);
+  LLVM_DEBUG(dbgs() << "\treplacing operand with:" << ReplaceWithOp << "\n");
+
+  MI.getOperand(OpIdx).setReg(ReplaceWithOp.getReg());
+  return true;
+}
+
 bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineInstr &MI) const {
   if (MI.getOpcode() != AMDGPU::V_AND_B32_e64 &&
   MI.getOpcode() != AMDGPU::V_AND_B32_e32)
@@ -1458,7 +1552,7 @@ bool SIFoldOperandsImpl::tryFoldZeroHighBits(MachineIns

[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SIPreEmitPeephole to NPM (PR #130065)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan ready_for_review 
https://github.com/llvm/llvm-project/pull/130065


[llvm-branch-commits] [llvm] [CodeGen][NPM] Port StackFrameLayoutAnalysisPass to NPM (PR #130070)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/130070


[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Pierre van Houtryve (Pierre-vh)


Changes

This is a bit of an awkward pattern that can come up as a result
of legalization and then widening of i16 operations to i32 in RegBankSelect
on AMDGPU.

This quick combine avoids redundant patterns like
```
s_sext_i32_i8 s0, s0
s_sext_i32_i16 s0, s0
s_ashr_i32 s0, s0, s1
```

With this combine, the second sext is removed because it is redundant.
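The combine's width condition can be checked with a small integer model. In this sketch `sextInReg` stands in for G_SEXT_INREG, and C++ conversions stand in for G_TRUNC to s16 and G_SEXT back to s32. When the truncated width is at least the in-register width, the outer trunc/sext pair is a no-op. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for G_SEXT_INREG X, Width (1 <= Width <= 32): keep the low Width
// bits of X and sign-extend them back to 32 bits.
int32_t sextInReg(int32_t X, unsigned Width) {
  uint32_t Mask = Width >= 32 ? ~0u : ((1u << Width) - 1u);
  uint32_t Low = static_cast<uint32_t>(X) & Mask;
  uint32_t SignBit = 1u << (Width - 1);
  // (Low ^ SignBit) - SignBit is the usual branch-free sign extension.
  return static_cast<int32_t>((Low ^ SignBit) - SignBit);
}

// Stand-in for (G_SEXT (G_TRUNC s16 (G_SEXT_INREG X, Width))).
int32_t sextTruncSextInReg(int32_t X, unsigned Width) {
  int32_t Sir = sextInReg(X, Width);
  int16_t Trunc = static_cast<int16_t>(Sir); // G_TRUNC to s16
  return static_cast<int32_t>(Trunc);        // G_SEXT back to s32
}

// The rule only fires when the truncated type is at least Width bits wide.
bool ruleApplies(unsigned TruncBits, unsigned Width) {
  return TruncBits >= Width;
}
```

As a counterexample for the width check, take X = 0x8000 and width 24. The in-register value stays 32768, but the s16 truncation turns it into -32768, so replacing the outer pair with the G_SEXT_INREG result would be wrong there.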

---
Full diff: https://github.com/llvm/llvm-project/pull/131312.diff


3 Files Affected:

- (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+11-1) 
- (added) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir (+86)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.abs.ll (+16-62) 


``diff
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 3590ab221ad44..9727b86b4be8b 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -258,6 +258,14 @@ def sext_trunc_sextload : GICombineRule<
  [{ return Helper.matchSextTruncSextLoad(*${d}); }]),
   (apply [{ Helper.applySextTruncSextLoad(*${d}); }])>;
 
+def sext_trunc_sextinreg : GICombineRule<
+  (defs root:$dst),
+  (match (G_SEXT_INREG $sir, $src, $width),
+         (G_TRUNC $trunc, $sir),
+         (G_SEXT $dst, $trunc),
+         [{ return (MRI.getType(${trunc}.getReg()).getScalarSizeInBits() >= ${width}.getImm()); }]),
+  (apply (GIReplaceReg $dst, $sir))>;
+
 def sext_inreg_of_load_matchdata : GIDefMatchData<"std::tuple">;
 def sext_inreg_of_load : GICombineRule<
   (defs root:$root, sext_inreg_of_load_matchdata:$matchinfo),
@@ -1896,7 +1904,9 @@ def cast_of_cast_combines: GICombineGroup<[
   sext_of_anyext,
   anyext_of_anyext,
   anyext_of_zext,
-  anyext_of_sext
+  anyext_of_sext,
+
+  sext_trunc_sextinreg
 ]>;
 
 def cast_combines: GICombineGroup<[
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
new file mode 100644
index 0..d41e5b172efc2
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-sext-trunc-sextinreg.mir
@@ -0,0 +1,86 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-postlegalizer-combiner -verify-machineinstrs %s -o - | FileCheck %s
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1030 -run-pass=amdgpu-regbank-combiner -verify-machineinstrs %s -o - | FileCheck %s
+
+---
+name: trunc_s16_inreg_8
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_8
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s16_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s16_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: $vgpr0 = COPY %inreg(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s16) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+---
+name: trunc_s8_inreg_16
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: trunc_s8_inreg_16
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s32) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: $vgpr0 = COPY %sext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy, 16
+    %trunc:_(s8) = G_TRUNC %inreg
+    %sext:_(s32) = G_SEXT %trunc
+    $vgpr0 = COPY %sext
+...
+
+# TODO?: We could handle this by inserting a trunc, but I'm not sure how useful that'd be.
+---
+name: mismatching_types
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0
+    ; CHECK-LABEL: name: mismatching_types
+    ; CHECK: liveins: $vgpr0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %copy:_(s32) = COPY $vgpr0
+    ; CHECK-NEXT: %inreg:_(s32) = G_SEXT_INREG %copy, 8
+    ; CHECK-NEXT: %trunc:_(s8) = G_TRUNC %inreg(s32)
+    ; CHECK-NEXT: %sext:_(s16) = G_SEXT %trunc(s8)
+    ; CHECK-NEXT: %anyext:_(s32) = G_ANYEXT %sext(s16)
+    ; CHECK-NEXT: $vgpr0 = COPY %anyext(s32)
+    %copy:_(s32) = COPY $vgpr0
+    %inreg:_(s32) = G_SEXT_INREG %copy,

[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SIInsertHardClauses to NPM (PR #130062)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

optimisan wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#130070**
* **#130069**
* **#130068**
* **#130067**
* **#130066**
* **#130065**
* **#130064**
* **#130063**
* **#130062** 👈
* **#130061**
* **#130060**
* **#130059**
* **#129866**
* **#129865**
* **#129857**
* **#129853**
* **#129828**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.


https://github.com/llvm/llvm-project/pull/130062


[llvm-branch-commits] [libcxx] release/20.x: [libcxx] Add a missing include for __bit_iterator (#127015) (PR #131382)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: None (llvmbot)


Changes

Backport 672e385

Requested by: @ian-twilightcoder

---
Full diff: https://github.com/llvm/llvm-project/pull/131382.diff


1 Files Affected:

- (modified) libcxx/include/__vector/vector_bool.h (+1) 


``diff
diff --git a/libcxx/include/__vector/vector_bool.h b/libcxx/include/__vector/vector_bool.h
index 4f1c442ce0be8..feff646a35dc8 100644
--- a/libcxx/include/__vector/vector_bool.h
+++ b/libcxx/include/__vector/vector_bool.h
@@ -17,6 +17,7 @@
 #include <__bit_reference>
 #include <__config>
 #include <__functional/unary_function.h>
+#include <__fwd/bit_reference.h>
 #include <__fwd/functional.h>
 #include <__fwd/vector.h>
 #include <__iterator/distance.h>

``




https://github.com/llvm/llvm-project/pull/131382


[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port SIMemoryLegalizer to NPM (PR #130060)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

optimisan wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#130070**
* **#130069**
* **#130068**
* **#130067**
* **#130066**
* **#130065**
* **#130064**
* **#130063**
* **#130062**
* **#130061**
* **#130060** 👈
* **#130059**
* **#129866**
* **#129865**
* **#129857**
* **#129853**
* **#129828**
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking.


https://github.com/llvm/llvm-project/pull/130060


[llvm-branch-commits] [llvm] [NFC][Cloning] Replace DIFinder usage in CloneFunctionInto with a MetadataPredicate (PR #129148)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129148

>From 66695d8ecc670230dd1621297851d0d0e673ff1a Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 12:07:03 -0800
Subject: [PATCH] [NFC][Cloning] Replace DIFinder usage in CloneFunctionInto
 with a MetadataPredicate

Summary:
The new code should be functionally identical to the old one (but
faster). The reasoning is as follows.

In the old code, when cloning within the module:
1. DIFinder traverses and collects *all* debug info reachable from a
   function, its instructions, and its owning compile unit.
2. Then "compile units, types, other subprograms, and lexical blocks of
   other subprograms" are saved in a set.
3. Then, when we call MapMetadata, we traverse the function's debug info
   _again_, and the nodes that are in the set from step 2 are identity
   mapped.

This looks equivalent to doing only step 3, with identity mapping driven
by a predicate that matches "compile units, types, other subprograms, and
lexical blocks of other subprograms" (the same criteria as in step 2).
This is what the new code does.
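The equivalence argument can be illustrated with a toy model. This is a minimal C++ sketch under stated assumptions: `Kind`, `Node`, `makeDirectPredicate` and `makeSetPredicate` are hypothetical stand-ins, not LLVM's metadata classes or the patch's code. The set-based predicate (old approach) and the direct predicate (new approach) use the same criteria, so they agree on every node the pre-pass collected; the argument relies on DIFinder collecting every node the mapper later visits.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>
#include <vector>

// Toy stand-ins for the metadata kinds named above. Hypothetical types,
// not LLVM's DICompileUnit/DIType/DISubprogram/lexical-block classes.
enum class Kind { CompileUnit, Type, Subprogram, LexicalBlock, Other };

struct Node {
  Kind kind;
  const Node *subprogram = nullptr; // owning subprogram of a lexical block
};

using Predicate = std::function<bool(const Node *)>;

// New approach: classify each node when the mapper reaches it.
Predicate makeDirectPredicate(const Node *clonedSP) {
  return [clonedSP](const Node *n) {
    switch (n->kind) {
    case Kind::CompileUnit:
    case Kind::Type:
      return true;
    case Kind::Subprogram:
      return n != clonedSP; // only the cloned subprogram gets a copy
    case Kind::LexicalBlock:
      return n->subprogram != clonedSP; // blocks of other subprograms
    case Kind::Other:
      break;
    }
    return false;
  };
}

// Old approach: a pre-pass over everything the finder collected builds a
// set with the same criteria; the mapper then only tests set membership.
Predicate makeSetPredicate(const std::vector<const Node *> &collected,
                           const Node *clonedSP) {
  const Predicate rule = makeDirectPredicate(clonedSP);
  std::unordered_set<const Node *> identity;
  for (const Node *n : collected)
    if (rule(n))
      identity.insert(n);
  return [identity](const Node *n) { return identity.count(n) != 0; };
}
```

If the pre-pass missed a node (for example, a type that was never collected), the two predicates would disagree on it, so the equivalence only holds when the collection step really is exhaustive, as step 1 above states.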

Test Plan:
ninja check-all
There are a number of tests around cloning, and all of them pass.

stack-info: PR: https://github.com/llvm/llvm-project/pull/129148, branch: users/artempyanykh/fast-coro-upstream-part2-take2/6
---
 llvm/lib/Transforms/Utils/CloneFunction.cpp | 32 -
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 502c4898c5940..8080dca09be00 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -50,6 +50,30 @@ void collectDebugInfoFromInstructions(const Function &F,
   DIFinder.processInstruction(*M, I);
   }
 }
+
+// Create a predicate that matches the metadata that should be identity mapped
+// during function cloning.
+MetadataPredicate createIdentityMDPredicate(const Function &F,
+                                            CloneFunctionChangeType Changes) {
+  if (Changes >= CloneFunctionChangeType::DifferentModule)
+    return [](const Metadata *MD) { return false; };
+
+  DISubprogram *SPClonedWithinModule = F.getSubprogram();
+  return [=](const Metadata *MD) {
+    // Avoid cloning types, compile units, and (other) subprograms.
+    if (isa(MD) || isa(MD))
+      return true;
+
+    if (auto *SP = dyn_cast(MD); SP)
+      return SP != SPClonedWithinModule;
+
+    // If a subprogram isn't going to be cloned skip its lexical blocks as well.
+    if (auto *LScope = dyn_cast(MD); LScope)
+      return LScope->getSubprogram() != SPClonedWithinModule;
+
+    return false;
+  };
+}
 } // namespace
 
 /// See comments in Cloning.h.
@@ -325,13 +349,7 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
 }
   }
 
-  DISubprogram *SPClonedWithinModule =
-      CollectDebugInfoForCloning(*OldFunc, Changes, DIFinder);
-
-  MetadataPredicate IdentityMD =
-      [MDSet =
-           FindDebugInfoToIdentityMap(Changes, DIFinder, SPClonedWithinModule)](
-          const Metadata *MD) { return MDSet.contains(MD); };
+  MetadataPredicate IdentityMD = createIdentityMDPredicate(*OldFunc, Changes);
 
   // Cloning is always a Module level operation, since Metadata needs to be
   // cloned.



[llvm-branch-commits] [llvm] [AMDGPU][RegBankInfo] Promote scalar i16 and/or/xor to i32 (PR #131306)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits


@@ -2432,6 +2433,29 @@ void AMDGPURegisterBankInfo::applyMappingImpl(
   return;
 }
 
+// 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
+// Packed 16-bit operations need to be scalarized and promoted.
+if (DstTy.getSizeInBits() == 16 && DstBank == &AMDGPU::SGPRRegBank) {
+  const LLT S32 = LLT::scalar(32);
+  MachineBasicBlock *MBB = MI.getParent();
+  MachineFunction *MF = MBB->getParent();
+  ApplyRegBankMapping ApplySALU(B, *this, MRI, &AMDGPU::SGPRRegBank);
+  LegalizerHelper Helper(*MF, ApplySALU, B);
+  // Widen to S32, but handle `G_XOR x, -1` differently. Legalizer widening
+  // will use a G_ANYEXT to extend the -1 which prevents matching G_XOR -1
+  // as "not".

arsenm wrote:

I'd still expect to form not patterns canonically. I don't know why the ISA
provides it, but other patterns make use of not.
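The code comment's point about `G_XOR x, -1` can be illustrated with plain integer arithmetic. This is a minimal, hypothetical C++ model, not LLVM code: a 32-bit "not" is an XOR with an all-ones constant. Sign-extending a 16-bit -1 yields all-ones, but G_ANYEXT leaves the high bits unspecified, so one legal result (zero fill, chosen here as an assumption) is not all-ones and the widened XOR can no longer be recognized as "not".

```cpp
#include <cassert>
#include <cstdint>

// Sign extension of a 16-bit constant to 32 bits, as a G_SEXT would produce.
// Assumes C++20 (two's-complement) conversion semantics.
uint32_t sextTo32(uint16_t v) {
  return static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(v)));
}

// G_ANYEXT leaves the high 16 bits unspecified; zero fill is one valid choice.
uint32_t anyextZeroFill(uint16_t v) { return v; }

// A 32-bit XOR can only be matched as "not" if its constant is all-ones.
bool isNotMask(uint32_t c) { return c == 0xFFFFFFFFu; }
```

Both widenings agree on the low 16 bits, which is all the original i16 operation observes; only the sign-extended form keeps the constant recognizable as a 32-bit "not".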

https://github.com/llvm/llvm-project/pull/131306


[llvm-branch-commits] [llvm] release/20.x: [HEXAGON] Fix hvx-isel for extract_subvector op (#129672) (PR #130215)

2025-03-15 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/130215


[llvm-branch-commits] [llvm] AMDGPU: Migrate more tests away from undef (PR #131314)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Matt Arsenault (arsenm)


Changes

andorbitset.ll is interesting since it directly depends on the
difference between poison and undef. I'm not sure it's useful to keep
the version using poison; I assume none of this code makes it to
codegen.

si-spill-cf.ll was also a nasty case, which I doubt has been reproducing
its original issue for a very long time. I had to reclaim an older version,
replace some of the poison uses, and run simplify-cfg. There's a very
slight change in the final CFG with this, but the final output is approximately
the same as it used to be.

---

Patch is 119.89 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/131314.diff


26 Files Affected:

- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+174-169) 
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-fold-binop-select.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/andorbitset.ll (+31-4) 
- (modified) llvm/test/CodeGen/AMDGPU/cndmask-no-def-vcc.ll (+4-2) 
- (modified) llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll (+142-84) 
- (modified) llvm/test/CodeGen/AMDGPU/fold-fabs.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/i1-copy-implicit-def.ll (+19-3) 
- (modified) llvm/test/CodeGen/AMDGPU/memory-legalizer-invalid-addrspace.mir (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/merge-load-store-vreg.mir (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll (+2-2) 
- (modified) llvm/test/CodeGen/AMDGPU/nested-loop-conditions.ll (+4-4) 
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll (+4-4) 
- (modified) llvm/test/CodeGen/AMDGPU/si-annotate-cf-noloop.ll (+3-3) 
- (modified) llvm/test/CodeGen/AMDGPU/si-spill-cf.ll (+128-190) 
- (modified) llvm/test/CodeGen/AMDGPU/skip-if-dead.ll (+15-6) 
- (modified) llvm/test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/splitkit-getsubrangeformask.ll (+22-20) 
- (modified) llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll (+7-4) 
- (modified) llvm/test/CodeGen/AMDGPU/uniform-cfg.ll (+11-1) 
- (modified) llvm/test/CodeGen/AMDGPU/vgpr-liverange-ir.ll (+2-1) 
- (modified) llvm/test/CodeGen/AMDGPU/wave32.ll (+2-2) 
- (modified) llvm/test/CodeGen/MIR/AMDGPU/custom-pseudo-source-values.ll (+1-1) 
- (modified) llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg-debug.ll (+1-1)
- (modified) llvm/test/CodeGen/MIR/AMDGPU/machine-function-info-long-branch-reg.ll (+1-1)
- (modified) llvm/test/CodeGen/MIR/AMDGPU/mircanon-memoperands.mir (+6-6) 
- (modified) llvm/test/CodeGen/MIR/AMDGPU/syncscopes.mir (+6-6) 


``diff
diff --git a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
index a4eab62f501ce..3160e38df5e3f 100644
--- a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
+++ b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
@@ -513,115 +513,117 @@ define amdgpu_kernel void @introduced_copy_to_sgpr(i64 %arg, i32 %arg1, i32 %arg
 ; GFX908-LABEL: introduced_copy_to_sgpr:
 ; GFX908:   ; %bb.0: ; %bb
 ; GFX908-NEXT:global_load_ushort v16, v[0:1], off glc
-; GFX908-NEXT:s_load_dwordx4 s[0:3], s[8:9], 0x0
-; GFX908-NEXT:s_load_dwordx2 s[4:5], s[8:9], 0x10
-; GFX908-NEXT:s_load_dword s7, s[8:9], 0x18
-; GFX908-NEXT:s_mov_b32 s6, 0
-; GFX908-NEXT:s_mov_b32 s9, s6
+; GFX908-NEXT:s_load_dwordx4 s[4:7], s[8:9], 0x0
+; GFX908-NEXT:s_load_dwordx2 s[10:11], s[8:9], 0x10
+; GFX908-NEXT:s_load_dword s0, s[8:9], 0x18
+; GFX908-NEXT:s_mov_b32 s12, 0
+; GFX908-NEXT:s_mov_b32 s9, s12
 ; GFX908-NEXT:s_waitcnt lgkmcnt(0)
-; GFX908-NEXT:v_cvt_f32_u32_e32 v0, s3
-; GFX908-NEXT:s_sub_i32 s8, 0, s3
-; GFX908-NEXT:v_cvt_f32_f16_e32 v17, s7
+; GFX908-NEXT:v_cvt_f32_u32_e32 v0, s7
+; GFX908-NEXT:s_sub_i32 s1, 0, s7
+; GFX908-NEXT:v_cvt_f32_f16_e32 v17, s0
 ; GFX908-NEXT:v_mov_b32_e32 v19, 0
 ; GFX908-NEXT:v_rcp_iflag_f32_e32 v2, v0
 ; GFX908-NEXT:v_mov_b32_e32 v0, 0
 ; GFX908-NEXT:v_mov_b32_e32 v1, 0
 ; GFX908-NEXT:v_mul_f32_e32 v2, 0x4f7e, v2
 ; GFX908-NEXT:v_cvt_u32_f32_e32 v2, v2
-; GFX908-NEXT:v_readfirstlane_b32 s10, v2
-; GFX908-NEXT:s_mul_i32 s8, s8, s10
-; GFX908-NEXT:s_mul_hi_u32 s8, s10, s8
-; GFX908-NEXT:s_add_i32 s10, s10, s8
-; GFX908-NEXT:s_mul_hi_u32 s8, s2, s10
-; GFX908-NEXT:s_mul_i32 s10, s8, s3
-; GFX908-NEXT:s_sub_i32 s2, s2, s10
-; GFX908-NEXT:s_add_i32 s11, s8, 1
-; GFX908-NEXT:s_sub_i32 s10, s2, s3
-; GFX908-NEXT:s_cmp_ge_u32 s2, s3
-; GFX908-NEXT:s_cselect_b32 s8, s11, s8
-; GFX908-NEXT:s_cselect_b32 s2, s10, s2
-; GFX908-NEXT:s_add_i32 s10, s8, 1
-; GFX908-NEXT:s_cmp_ge_u32 s2, s3
-; GFX908-NEXT:s_cselect_b32 s8, s10, s8
-; GFX908-NEXT:s_lshr_b32 s7, s7, 16
-; GFX908-NEXT:v_cvt_f32_f16_e32 v18, s7
-; GFX908-

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Combine (sext (trunc (sext_in_reg x))) (PR #131312)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131312


[llvm-branch-commits] [llvm] [AMDGPU] Precommit si-fold-bitmask.mir (PR #131310)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131310

>From fcd5623ccd18100197817f7f4d5a500ca433f8dc Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Fri, 14 Mar 2025 10:00:21 +0100
Subject: [PATCH] [AMDGPU] Precommit si-fold-bitmask.mir

---
 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir | 429 ++
 1 file changed, 429 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
new file mode 100644
index 0..1edf970591179
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-fold-bitmasks.mir
@@ -0,0 +1,429 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -run-pass=si-fold-operands -verify-machineinstrs -o - %s | FileCheck --check-prefix=GCN %s
+
+# Test supported instructions
+
+---
+name: v_ashr_i32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_ashr_i32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_ASHR_I32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshr_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e32 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHR_B32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshr_b32_e32__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshr_b32_e32__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHR_B32_e32 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: v_lshl_b32_e64__v_and_b32_e32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $vgpr0, $vgpr1
+
+    ; GCN-LABEL: name: v_lshl_b32_e64__v_and_b32_e32
+    ; GCN: liveins: $vgpr0, $vgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:vgpr_32 = COPY $vgpr0
+    ; GCN-NEXT: %shift:vgpr_32 = COPY $vgpr1
+    ; GCN-NEXT: %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    ; GCN-NEXT: %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+    ; GCN-NEXT: $vgpr0 = COPY %ret
+    %src:vgpr_32 = COPY $vgpr0
+    %shift:vgpr_32 = COPY $vgpr1
+    %shiftmask:vgpr_32 = V_AND_B32_e64 65535, %shift, implicit $exec
+    %ret:vgpr_32 = V_LSHL_B32_e64 %src, %shiftmask, implicit $exec
+    $vgpr0 = COPY %ret
+...
+
+---
+name: s_lshl_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: s_lshl_b32__s_and_b32
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_32 = COPY $sgpr1
+    ; GCN-NEXT: %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    ; GCN-NEXT: %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    ; GCN-NEXT: $sgpr0 = COPY %ret
+    %src:sgpr_32 = COPY $sgpr0
+    %shift:sgpr_32 = COPY $sgpr1
+    %shiftmask:sgpr_32 = S_AND_B32 65535, %shift, implicit-def $scc
+    %ret:sgpr_32 = S_LSHL_B32 %src, %shiftmask, implicit-def $scc
+    $sgpr0 = COPY %ret
+...
+
+---
+name: s_lshr_b32__s_and_b32
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr0, $sgpr1
+
+    ; GCN-LABEL: name: s_lshr_b32__s_and_b32
+    ; GCN: liveins: $sgpr0, $sgpr1
+    ; GCN-NEXT: {{  $}}
+    ; GCN-NEXT: %src:sgpr_32 = COPY $sgpr0
+    ; GCN-NEXT: %shift:sgpr_

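Each precommitted si-fold-bitmasks.mir test pairs a 32-bit shift with a `65535` AND mask on the shift amount. Assuming the follow-up fold is meant to drop such masks (the test names suggest this, but the precommit itself does not say so), the reasoning can be sketched in C++: the AMDGPU ISA documentation describes these 32-bit shifts as reading only bits [4:0] of the shift amount, so any mask that keeps those five bits leaves the result unchanged. The helpers `lshr32`, `shl32`, `maskIsRedundant` and `maskNeverMatters` are hypothetical models, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of 32-bit shifts that, like the AMDGPU 32-bit shift
// instructions, read only bits [4:0] of the shift amount.
uint32_t lshr32(uint32_t src, uint32_t amt) { return src >> (amt & 31u); }
uint32_t shl32(uint32_t src, uint32_t amt) { return src << (amt & 31u); }

// An AND on the shift amount is redundant when it keeps all five bits the
// shift actually reads.
bool maskIsRedundant(uint32_t mask) { return (mask & 31u) == 31u; }

// Brute-force check over many amounts (including ones above 65535) that
// masking the amount never changes either shift's result.
bool maskNeverMatters(uint32_t mask, uint32_t src) {
  for (uint32_t amt = 0; amt < (1u << 20); amt += 7)
    if (lshr32(src, amt & mask) != lshr32(src, amt) ||
        shl32(src, amt & mask) != shl32(src, amt))
      return false;
  return true;
}
```

Under this model the `65535` masks in the tests are redundant, whereas a mask such as `15` would drop bit 4 of the amount and change the result.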
[llvm-branch-commits] [llvm] AMDGPU: Switch simplifydemandedbits-recursion.ll to generated checks (PR #131317)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/131317

>From 0bccd4581d72280722f34f28e87682dd27e59c5c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 14 Mar 2025 18:06:43 +0700
Subject: [PATCH] AMDGPU: Switch simplifydemandedbits-recursion.ll to generated
 checks

This test only checked for s_endpgm. Generate full checks, and remove undefs.
---
 .../AMDGPU/simplifydemandedbits-recursion.ll  | 79 +--
 1 file changed, 71 insertions(+), 8 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/simplifydemandedbits-recursion.ll b/llvm/test/CodeGen/AMDGPU/simplifydemandedbits-recursion.ll
index 55b4d12805926..a5299ea36958d 100644
--- a/llvm/test/CodeGen/AMDGPU/simplifydemandedbits-recursion.ll
+++ b/llvm/test/CodeGen/AMDGPU/simplifydemandedbits-recursion.ll
@@ -1,3 +1,4 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
 ; RUN: llc -mtriple=amdgcn < %s | FileCheck %s
 
 ; Check we can compile this bugpoint-reduced test without an
@@ -9,17 +10,79 @@
 
 @0 = external unnamed_addr addrspace(3) global [462 x float], align 4
 
-; Function Attrs: nounwind readnone speculatable
 declare i32 @llvm.amdgcn.workitem.id.y() #0
-
-; Function Attrs: nounwind readnone speculatable
 declare i32 @llvm.amdgcn.workitem.id.x() #0
-
-; Function Attrs: nounwind readnone speculatable
 declare float @llvm.fmuladd.f32(float, float, float) #0
 
-; CHECK: s_endpgm
 define amdgpu_kernel void @foo(ptr addrspace(1) noalias nocapture readonly %arg, ptr addrspace(1) noalias nocapture readonly %arg1, ptr addrspace(1) noalias nocapture %arg2, float %arg3, i1 %c0, i1 %c1, i1 %c2, i1 %c3, i1 %c4, i1 %c5) local_unnamed_addr !reqd_work_group_size !0 {
+; CHECK-LABEL: foo:
+; CHECK:   ; %bb.0: ; %bb
+; CHECK-NEXT:s_load_dword s6, s[4:5], 0x10
+; CHECK-NEXT:s_load_dwordx2 s[2:3], s[4:5], 0x10
+; CHECK-NEXT:s_load_dword s10, s[4:5], 0x11
+; CHECK-NEXT:v_lshlrev_b32_e32 v2, 2, v0
+; CHECK-NEXT:s_movk_i32 s0, 0x54
+; CHECK-NEXT:v_mov_b32_e32 v0, 0
+; CHECK-NEXT:v_mad_u32_u24 v1, v1, s0, v2
+; CHECK-NEXT:s_waitcnt lgkmcnt(0)
+; CHECK-NEXT:s_bitcmp1_b32 s6, 8
+; CHECK-NEXT:s_cselect_b64 s[0:1], -1, 0
+; CHECK-NEXT:s_bitcmp1_b32 s6, 16
+; CHECK-NEXT:v_cndmask_b32_e64 v2, 0, 1, s[0:1]
+; CHECK-NEXT:s_cselect_b64 s[4:5], -1, 0
+; CHECK-NEXT:v_cmp_ne_u32_e64 s[0:1], 1, v2
+; CHECK-NEXT:s_xor_b64 s[4:5], s[4:5], -1
+; CHECK-NEXT:s_bitcmp1_b32 s2, 24
+; CHECK-NEXT:s_cselect_b64 s[6:7], -1, 0
+; CHECK-NEXT:s_xor_b64 s[6:7], s[6:7], -1
+; CHECK-NEXT:s_bitcmp1_b32 s3, 0
+; CHECK-NEXT:s_cselect_b64 s[8:9], -1, 0
+; CHECK-NEXT:s_bitcmp1_b32 s10, 8
+; CHECK-NEXT:s_cselect_b64 s[10:11], -1, 0
+; CHECK-NEXT:s_and_b64 s[2:3], exec, s[6:7]
+; CHECK-NEXT:s_and_b64 s[4:5], exec, s[4:5]
+; CHECK-NEXT:s_and_b64 s[6:7], exec, s[10:11]
+; CHECK-NEXT:s_and_b64 s[8:9], exec, s[8:9]
+; CHECK-NEXT:s_mov_b32 m0, -1
+; CHECK-NEXT:  .LBB0_1: ; %.loopexit145
+; CHECK-NEXT:; =>This Loop Header: Depth=1
+; CHECK-NEXT:; Child Loop BB0_3 Depth 2
+; CHECK-NEXT:; Child Loop BB0_4 Depth 3
+; CHECK-NEXT:; Child Loop BB0_5 Depth 2
+; CHECK-NEXT:v_mov_b32_e32 v2, v1
+; CHECK-NEXT:s_branch .LBB0_3
+; CHECK-NEXT:  .LBB0_2: ; %.loopexit
+; CHECK-NEXT:; in Loop: Header=BB0_3 Depth=2
+; CHECK-NEXT:v_add_i32_e32 v2, vcc, 0x540, v2
+; CHECK-NEXT:s_mov_b64 vcc, s[4:5]
+; CHECK-NEXT:s_cbranch_vccnz .LBB0_5
+; CHECK-NEXT:  .LBB0_3: ; %bb13
+; CHECK-NEXT:; Parent Loop BB0_1 Depth=1
+; CHECK-NEXT:; => This Loop Header: Depth=2
+; CHECK-NEXT:; Child Loop BB0_4 Depth 3
+; CHECK-NEXT:s_and_b64 vcc, exec, s[0:1]
+; CHECK-NEXT:v_mov_b32_e32 v3, v2
+; CHECK-NEXT:s_cbranch_vccnz .LBB0_2
+; CHECK-NEXT:  .LBB0_4: ; %bb21
+; CHECK-NEXT:; Parent Loop BB0_1 Depth=1
+; CHECK-NEXT:; Parent Loop BB0_3 Depth=2
+; CHECK-NEXT:; => This Inner Loop Header: Depth=3
+; CHECK-NEXT:ds_write_b32 v3, v0
+; CHECK-NEXT:v_add_i32_e32 v3, vcc, 32, v3
+; CHECK-NEXT:s_mov_b64 vcc, s[2:3]
+; CHECK-NEXT:s_cbranch_vccz .LBB0_4
+; CHECK-NEXT:s_branch .LBB0_2
+; CHECK-NEXT:  .LBB0_5: ; %bb31
+; CHECK-NEXT:; Parent Loop BB0_1 Depth=1
+; CHECK-NEXT:; => This Inner Loop Header: Depth=2
+; CHECK-NEXT:s_mov_b64 vcc, s[6:7]
+; CHECK-NEXT:s_cbranch_vccz .LBB0_5
+; CHECK-NEXT:  ; %bb.6: ; %bb30
+; CHECK-NEXT:; in Loop: Header=BB0_1 Depth=1
+; CHECK-NEXT:s_mov_b64 vcc, s[8:9]
+; CHECK-NEXT:s_cbranch_vccz .LBB0_1
+; CHECK-NEXT:  ; %bb.7: ; %bb11
+; CHECK-NEXT:s_endpgm
 bb:
   %tmp = tail call i32 @llvm.amdgcn.workitem.id.y()
   %tmp4 = tail call i32 @llvm.amdgcn.workitem.id.x()
@@ -47,7 +110,7 @@ bb13: ; preds = %.loopexit, %.loopex
 
 bb17: ; preds = %bb13
   %tmp18 = mul i32 %tmp15, 224
-  %tmp19 = add i32 undef, %tmp18
+  

[llvm-branch-commits] [llvm] [AMDGPU][GlobalISel] Allow forming s16 U/SBFX pre-regbankselect (PR #131309)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131309?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#131312**
* **#131311**
* **#131310**
* **#131309** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/131309?utm_source=stack-comment-view-in-graphite)
* **#131308**
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/131309
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Switch scheduler-subrange-crash.ll to generated checks (PR #131316)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/131316

>From 9ce83f0ac4d3a856341617f698f5fd3261ca4554 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 14 Mar 2025 18:03:12 +0700
Subject: [PATCH] AMDGPU: Switch scheduler-subrange-crash.ll to generated
 checks

Also remove unnecessarily requiring asserts, and replace undef
with poison.
---
 .../AMDGPU/scheduler-subrange-crash.ll| 30 ---
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/scheduler-subrange-crash.ll b/llvm/test/CodeGen/AMDGPU/scheduler-subrange-crash.ll
index a81c18ebb259e..3dba3e87c64c1 100644
--- a/llvm/test/CodeGen/AMDGPU/scheduler-subrange-crash.ll
+++ b/llvm/test/CodeGen/AMDGPU/scheduler-subrange-crash.ll
@@ -1,5 +1,5 @@
-; RUN: llc -mtriple=amdgcn < %s | FileCheck %s
-; REQUIRES: asserts
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=amdgcn -mcpu=tonga < %s | FileCheck %s
 ;
 ; This test used to crash with the following assertion:
 ; llc: include/llvm/ADT/IntervalMap.h:632: unsigned int llvm::IntervalMapImpl::LeafNode >::insertFrom(unsigned int &, unsigned int, KeyT, KeyT, ValT) [KeyT = llvm::SlotIndex, ValT = llvm::LiveInterval *, N = 8, Traits = llvm::IntervalMapInfo]: Assertion `(i == Size || Traits::stopLess(b, start(i))) && "Overlapping insert"' failed.
@@ -8,12 +8,22 @@
 ; (i.e. live interval subranges): subregister defs are not uses for that
 ; purpose.
 ;
-; Check for a valid output:
-; CHECK: tbuffer_store_format_x
-
-target triple = "amdgcn--"
 
 define amdgpu_gs void @main(i32 inreg %arg) #0 {
+; CHECK-LABEL: main:
+; CHECK:   ; %bb.0: ; %main_body
+; CHECK-NEXT:s_movk_i32 s1, 0x1300
+; CHECK-NEXT:buffer_load_dword v0, v0, s[0:3], s1 offen
+; CHECK-NEXT:s_waitcnt vmcnt(0)
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:36 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:48 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:72 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:28 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:64 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:20 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:56 glc slc
+; CHECK-NEXT:tbuffer_store_format_x v0, off, s[0:3], s0 format:[BUF_DATA_FORMAT_32,BUF_NUM_FORMAT_UINT] offset:92 glc slc
+; CHECK-NEXT:s_endpgm
 main_body:
   %tmp = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> poison, i32 20, i32 0)
   %tmp1 = call float @llvm.amdgcn.s.buffer.load.f32(<4 x i32> poison, i32 24, i32 0)
@@ -27,17 +37,17 @@ main_body:
   %tmp3 = call i32 @llvm.amdgcn.raw.buffer.load.i32(<4 x i32> poison, i32 poison, i32 4864, i32 0)
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 %tmp3, <4 x i32> poison, i32 36, i32 %arg, i32 68, i32 3)
   %bc = bitcast <4 x float> %array_vector3 to <4 x i32>
-  %tmp4 = extractelement <4 x i32> %bc, i32 undef
+  %tmp4 = extractelement <4 x i32> %bc, i32 poison
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 %tmp4, <4 x i32> poison, i32 48, i32 %arg, i32 68, i32 3)
   %bc49 = bitcast <4 x float> %array_vector11 to <4 x i32>
-  %tmp5 = extractelement <4 x i32> %bc49, i32 undef
+  %tmp5 = extractelement <4 x i32> %bc49, i32 poison
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 %tmp5, <4 x i32> poison, i32 72, i32 %arg, i32 68, i32 3)
   %array_vector21 = insertelement <4 x float> , float %tmp, i32 1
   %array_vector22 = insertelement <4 x float> %array_vector21, float poison, i32 2
   %array_vector23 = insertelement <4 x float> %array_vector22, float poison, i32 3
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 poison, <4 x i32> poison, i32 28, i32 %arg, i32 68, i32 3)
   %bc52 = bitcast <4 x float> %array_vector23 to <4 x i32>
-  %tmp6 = extractelement <4 x i32> %bc52, i32 undef
+  %tmp6 = extractelement <4 x i32> %bc52, i32 poison
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 %tmp6, <4 x i32> poison, i32 64, i32 %arg, i32 68, i32 3)
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 poison, <4 x i32> poison, i32 20, i32 %arg, i32 68, i32 3)
   call void @llvm.amdgcn.raw.tbuffer.store.i32(i32 poison, <4 x i32> poison, i32 56, i32 %arg, i32 68, i32 3)
@@ -49,7 +59,7 @@ declare float @llvm.amdgcn.s.buffer.load.f32(<4 x i32>, i32, i32 immarg) #1
 declare i32 @llvm.amdgcn.raw.buffer.load.i32(<4 x i32>, i32, i32, i32 immarg) #2
 declare void @llvm.amdgcn.raw.tbuffer.store.i32(i32, <4 x i32

[llvm-branch-commits] [llvm] [AMDGPU][SIFoldOperands] Fold some redundant bitmasks (PR #131311)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh ready_for_review 
https://github.com/llvm/llvm-project/pull/131311

[llvm-branch-commits] [llvm] AMDGPU: Replace unused update.dpp inputs with poison instead of undef (PR #131287)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> My concern is that it is not obvious when the semantics allow it, when you 
> have a plethora of undocumented target intrinsics. I guess the grown-up 
> solution is to document them properly.

Yes, the intrinsics should have the edge cases all specified. I also assume 
this is why you can specify noundef on specific parameters 

https://github.com/llvm/llvm-project/pull/131287

[llvm-branch-commits] [llvm] AMDGPU: Replace some test i32 undef uses with poison (PR #131092)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/131092

>From bebdc38263f50c285f66899f61d1b5ad19f0d48b Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 13 Mar 2025 14:46:17 +0700
Subject: [PATCH] AMDGPU: Replace some test i32 undef uses with poison

---
 .../CodeGen/AMDGPU/cgp-addressing-modes.ll|  4 ++--
 llvm/test/CodeGen/AMDGPU/commute-shifts.ll|  2 +-
 .../AMDGPU/constant-address-space-32bit.ll|  2 +-
 .../CodeGen/AMDGPU/dagcombine-fma-fmad.ll |  2 +-
 .../AMDGPU/extract_subvector_vec4_vec3.ll |  4 ++--
 llvm/test/CodeGen/AMDGPU/img-nouse-adjust.ll  |  2 +-
 .../AMDGPU/indirect-call-known-callees.ll |  2 +-
 .../ipra-return-address-save-restore.ll   |  2 +-
 .../CodeGen/AMDGPU/llvm.amdgcn.exp.prim.ll|  2 +-
 .../CodeGen/AMDGPU/llvm.amdgcn.exp.row.ll | 14 ++---
 amdgcn.struct.buffer.load.format.v3f16.ll |  2 +-
 ...gcn.struct.ptr.buffer.load.format.v3f16.ll |  2 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.wqm.ll   |  2 +-
 .../test/CodeGen/AMDGPU/loop_exit_with_xor.ll |  6 +++---
 llvm/test/CodeGen/AMDGPU/merge-load-store.mir | 12 +--
 .../mubuf-shader-vgpr-non-ptr-intrinsics.ll   |  6 +++---
 llvm/test/CodeGen/AMDGPU/mubuf-shader-vgpr.ll |  6 +++---
 ...al-regcopy-and-spill-missed-at-regalloc.ll |  2 +-
 .../AMDGPU/scheduler-subrange-crash.ll| 10 +-
 llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll |  2 +-
 .../AMDGPU/sgpr-spill-wrong-stack-id.mir  |  2 +-
 .../test/CodeGen/AMDGPU/simplify-libcalls2.ll |  2 +-
 .../AMDGPU/splitkit-getsubrangeformask.ll | 20 +--
 ...r-descriptor-waterfall-loop-idom-update.ll |  2 +-
 llvm/test/CodeGen/AMDGPU/wqm.ll   |  2 +-
 25 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll
index cf82b569b4839..8243815e44358 100644
--- a/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll
+++ b/llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll
@@ -581,7 +581,7 @@ done:
 
 ; OPT-LABEL: @test_sink_local_small_offset_cmpxchg_i32(
 ; OPT: %sunkaddr = getelementptr i8, ptr addrspace(3) %in, i32 28
-; OPT: %tmp1.struct = cmpxchg ptr addrspace(3) %sunkaddr, i32 undef, i32 2 seq_cst monotonic
+; OPT: %tmp1.struct = cmpxchg ptr addrspace(3) %sunkaddr, i32 poison, i32 2 seq_cst monotonic
 define amdgpu_kernel void @test_sink_local_small_offset_cmpxchg_i32(ptr addrspace(3) %out, ptr addrspace(3) %in) {
 entry:
   %out.gep = getelementptr i32, ptr addrspace(3) %out, i32 99
@@ -591,7 +591,7 @@ entry:
   br i1 %tmp0, label %endif, label %if
 
 if:
-  %tmp1.struct = cmpxchg ptr addrspace(3) %in.gep, i32 undef, i32 2 seq_cst monotonic
+  %tmp1.struct = cmpxchg ptr addrspace(3) %in.gep, i32 poison, i32 2 seq_cst monotonic
   %tmp1 = extractvalue { i32, i1 } %tmp1.struct, 0
   br label %endif
 
diff --git a/llvm/test/CodeGen/AMDGPU/commute-shifts.ll b/llvm/test/CodeGen/AMDGPU/commute-shifts.ll
index 2930c6efd02b7..820ccb18a2b3d 100644
--- a/llvm/test/CodeGen/AMDGPU/commute-shifts.ll
+++ b/llvm/test/CodeGen/AMDGPU/commute-shifts.ll
@@ -30,7 +30,7 @@ define amdgpu_ps float @main(float %arg0, float %arg1) #0 {
 ; VI-NEXT:; return to shader part epilog
 bb:
   %tmp = fptosi float %arg0 to i32
-  %tmp1 = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 undef, <8 x i32> poison, i32 0, i32 0)
+  %tmp1 = call <4 x float> @llvm.amdgcn.image.load.1d.v4f32.i32(i32 15, i32 poison, <8 x i32> poison, i32 0, i32 0)
   %tmp2.f = extractelement <4 x float> %tmp1, i32 0
   %tmp2 = bitcast float %tmp2.f to i32
   %tmp3 = and i32 %tmp, 7
diff --git a/llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll b/llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
index 4f3cff4b32ea3..d8fb2641c8192 100644
--- a/llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
+++ b/llvm/test/CodeGen/AMDGPU/constant-address-space-32bit.ll
@@ -306,7 +306,7 @@ define amdgpu_vs float @load_addr_no_fold(ptr addrspace(6) inreg noalias %p0) #0
 define amdgpu_vs float @vgpr_arg_src(ptr addrspace(6) %arg) {
 main_body:
   %tmp9 = load ptr addrspace(8), ptr addrspace(6) %arg
-  %tmp10 = call nsz float @llvm.amdgcn.struct.ptr.buffer.load.format.f32(ptr addrspace(8) %tmp9, i32 undef, i32 0, i32 0, i32 0) #1
+  %tmp10 = call nsz float @llvm.amdgcn.struct.ptr.buffer.load.format.f32(ptr addrspace(8) %tmp9, i32 poison, i32 0, i32 0, i32 0) #1
   ret float %tmp10
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll b/llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
index 03b9f9bf82f3c..e285689dea58a 100644
--- a/llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
+++ b/llvm/test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll
@@ -204,7 +204,7 @@ define amdgpu_ps float @_amdgpu_ps_main() #0 {
   %40 = fmul reassoc nnan nsz arcp contract afn float %39, 0x3F847AE14000
   %41 = fadd reassoc nnan nsz arcp contract afn float %40, 0x3F947AE14000
   %.i2415 = fmul

[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

Pierre-vh wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131308?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#131312**
* **#131311**
* **#131310**
* **#131309**
* **#131308** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/131308?utm_source=stack-comment-view-in-graphite)
* **#131307**
* **#131306**
* **#131305**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/131308

[llvm-branch-commits] [llvm] AMDGPU: Replace unused permlane inputs with poison instead of undef (PR #131288)

2025-03-15 Thread Nuno Lopes via llvm-branch-commits


@@ -1115,7 +1115,7 @@ GCNTTIImpl::instCombineIntrinsic(InstCombiner &IC, 
IntrinsicInst &II) const {
   case Intrinsic::amdgcn_permlanex16_var: {
 // Discard vdst_in if it's not going to be read.
 Value *VDstIn = II.getArgOperand(0);
-if (isa(VDstIn))
+if (isa(VDstIn))

nunoplopes wrote:

I don't know how prevalent this intrinsic is, but I would maybe leave this one 
as-is to preserve compatibility with older IR. We can always do a search & 
replace later when we get rid of undef.
The rest LGTM, thanks!

https://github.com/llvm/llvm-project/pull/131288

[llvm-branch-commits] [llvm] AMDGPU: Replace ptr addrspace(8) undef uses with poison (PR #130904)

2025-03-15 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/130904

>From fa3c82be14f0e94ea7e1a33c167968c7379f2563 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 12 Mar 2025 13:24:50 +0700
Subject: [PATCH] AMDGPU: Replace ptr addrspace(8) undef uses with poison

---
 llvm/test/CodeGen/AMDGPU/amdpal.ll|   2 +-
 .../CodeGen/AMDGPU/combine-add-zext-xor.ll|  12 +-
 llvm/test/CodeGen/AMDGPU/else.ll  |   2 +-
 .../AMDGPU/extract_subvector_vec4_vec3.ll |   8 +-
 llvm/test/CodeGen/AMDGPU/llvm.amdgcn.exp.ll   |   2 +-
 .../llvm.amdgcn.raw.ptr.buffer.atomic.ll  |   2 +-
 .../llvm.amdgcn.struct.ptr.buffer.atomic.ll   |   2 +-
 .../test/CodeGen/AMDGPU/loop_exit_with_xor.ll |   6 +-
 .../lower-work-group-id-intrinsics-hsa.ll |   2 +-
 .../lower-work-group-id-intrinsics-pal.ll |   2 +-
 llvm/test/CodeGen/AMDGPU/merge-store-crash.ll |   2 +-
 .../test/CodeGen/AMDGPU/merge-store-usedef.ll |   2 +-
 .../AMDGPU/required-export-priority.ll|   2 +-
 .../AMDGPU/si-triv-disjoint-mem-access.ll |   2 +-
 llvm/test/CodeGen/AMDGPU/wave32.ll|   8 +-
 llvm/test/CodeGen/AMDGPU/wqm.ll   | 110 +-
 16 files changed, 83 insertions(+), 83 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/amdpal.ll b/llvm/test/CodeGen/AMDGPU/amdpal.ll
index 171df029615ed..fd9227d2f4319 100644
--- a/llvm/test/CodeGen/AMDGPU/amdpal.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdpal.ll
@@ -72,7 +72,7 @@ entry:
   %e = getelementptr [2 x i32], ptr addrspace(5) %v1, i32 0, i32 %idx
   %x = load i32, ptr addrspace(5) %e
   %xf = bitcast i32 %x to float
-  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %xf, ptr addrspace(8) undef, i32 0, i32 0, i32 0)
+  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(float %xf, ptr addrspace(8) poison, i32 0, i32 0, i32 0)
   ret void
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll b/llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
index b42542db6dbd8..f8227f0039af7 100644
--- a/llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
+++ b/llvm/test/CodeGen/AMDGPU/combine-add-zext-xor.ll
@@ -66,7 +66,7 @@ define i32 @combine_add_zext_xor() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
@@ -146,7 +146,7 @@ define i32 @combine_sub_zext_xor() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
@@ -229,7 +229,7 @@ define i32 @combine_add_zext_or() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
@@ -313,7 +313,7 @@ define i32 @combine_sub_zext_or() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
@@ -392,7 +392,7 @@ define i32 @combine_add_zext_and() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
@@ -471,7 +471,7 @@ define i32 @combine_sub_zext_and() {
   br i1 undef, label %bb9, label %bb
 
 bb:   ; preds = %.a
-  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) undef, i32 %.2, i32 64, i32 1)
+  %.i3 = call i32 @llvm.amdgcn.raw.ptr.buffer.load.i32(ptr addrspace(8) poison, i32 %.2, i32 64, i32 1)
   %i5 = icmp eq i32 %.i3, 0
   br label %bb9
 
diff --git a/llvm/test/CodeGen/AMDGPU/else.ll b/llvm/test/CodeGen/AMDGPU/else.ll
index 4a3018e67b17d..884f5305407a1 100644
--- a/llvm/test/CodeGen/AMDGPU/else.ll
+++ b/llvm/test/CodeGen/AMDGPU/else.ll
@@ -47,7 +47,7 @@ else:
 
 end:
   %r = phi float [ %v.if, %if ], [ %v.else, %else ]
-  call void @llvm.amdgcn.raw.ptr.buffer.store.f32(f

[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"


[llvm-branch-commits] [clang] [clang][HeuristicResolver] Default argument heuristic for template parameters (PR #131074)

2025-03-15 Thread Nathan Ridge via llvm-branch-commits

https://github.com/HighCommander4 created 
https://github.com/llvm/llvm-project/pull/131074

Fixes https://github.com/clangd/clangd/discussions/1056

>From 32ca27b5daa8cd1a0a9ad7b60c0ceecebaf9e8b6 Mon Sep 17 00:00:00 2001
From: Nathan Ridge 
Date: Thu, 13 Mar 2025 01:23:03 -0400
Subject: [PATCH] [clang][HeuristicResolver] Default argument heuristic for
 template parameters

---
 clang/lib/Sema/HeuristicResolver.cpp   | 15 +++
 clang/unittests/Sema/HeuristicResolverTest.cpp | 17 +
 2 files changed, 32 insertions(+)

diff --git a/clang/lib/Sema/HeuristicResolver.cpp b/clang/lib/Sema/HeuristicResolver.cpp
index d377379c627db..feda9696b8e05 100644
--- a/clang/lib/Sema/HeuristicResolver.cpp
+++ b/clang/lib/Sema/HeuristicResolver.cpp
@@ -11,6 +11,7 @@
 #include "clang/AST/CXXInheritance.h"
 #include "clang/AST/DeclTemplate.h"
 #include "clang/AST/ExprCXX.h"
+#include "clang/AST/TemplateBase.h"
 #include "clang/AST/Type.h"
 
 namespace clang {
@@ -125,6 +126,20 @@ TagDecl *HeuristicResolverImpl::resolveTypeToTagDecl(QualType QT) {
   if (!T)
 return nullptr;
 
+  // If T is the type of a template parameter, we can't get a useful TagDecl
+  // out of it. However, if the template parameter has a default argument,
+  // as a heuristic we can replace T with the default argument type.
+  if (const auto *TTPT = dyn_cast(T)) {
+if (const auto *TTPD = TTPT->getDecl()) {
+  if (TTPD->hasDefaultArgument()) {
+const auto &DefaultArg = TTPD->getDefaultArgument().getArgument();
+if (DefaultArg.getKind() == TemplateArgument::Type) {
+  T = DefaultArg.getAsType().getTypePtrOrNull();
+}
+  }
+}
+  }
+
   // Unwrap type sugar such as type aliases.
   T = T->getCanonicalTypeInternal().getTypePtr();
 
diff --git a/clang/unittests/Sema/HeuristicResolverTest.cpp b/clang/unittests/Sema/HeuristicResolverTest.cpp
index c7cfe7917c532..5e36108172702 100644
--- a/clang/unittests/Sema/HeuristicResolverTest.cpp
+++ b/clang/unittests/Sema/HeuristicResolverTest.cpp
@@ -410,6 +410,23 @@ TEST(HeuristicResolver, MemberExpr_HangIssue126536) {
   cxxDependentScopeMemberExpr(hasMemberName("foo")).bind("input"));
 }
 
+TEST(HeuristicResolver, MemberExpr_DefaultTemplateArgument) {
+  std::string Code = R"cpp(
+struct Default {
+  void foo();
+};
+template 
+void bar(T t) {
+  t.foo();
+}
+  )cpp";
+  // Test resolution of "foo" in "t.foo()".
+  expectResolution(
+  Code, &HeuristicResolver::resolveMemberExpr,
+  cxxDependentScopeMemberExpr(hasMemberName("foo")).bind("input"),
+  cxxMethodDecl(hasName("foo")).bind("output"));
+}
+
 TEST(HeuristicResolver, DeclRefExpr_StaticMethod) {
   std::string Code = R"cpp(
 template 


[llvm-branch-commits] [clang] [Clang][CodeGen] Do not promote if complex divisor is real (PR #131451)

2025-03-15 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Mészáros Gergely (Maetveis)


Changes

Relates-to: https://github.com/llvm/llvm-project/issues/131129
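
Background on why promotion can be skipped for a real divisor, as an illustrative sketch only (not code from the patch; the `CF` struct and the three function names are hypothetical): under `-fcomplex-arithmetic=promoted`, a `_Complex float` division by a complex value evaluates the algebraic formula in a wider type, because the squared-magnitude denominator `c*c + d*d` can overflow in `float` even when the quotient itself is representable. Dividing by a real value has no such denominator; each component is a single division, so widening to `double` and truncating back gains nothing in accuracy. This matches the updated PRMTD checks, which now emit `fdiv float` directly.

```cpp
#include <cassert>

// Hypothetical stand-in for a `_Complex float` value.
struct CF {
  float re, im;
};

// complex / complex with the textbook algebraic formula, evaluated in float.
// The squared-magnitude denominator can overflow even when the true quotient
// is representable.
CF divAlgebraicFloat(CF x, CF y) {
  float denom = y.re * y.re + y.im * y.im;
  return {(x.re * y.re + x.im * y.im) / denom,
          (x.im * y.re - x.re * y.im) / denom};
}

// Same formula evaluated in double and truncated back to float, roughly the
// idea behind the "promoted" complex range for float operands.
CF divAlgebraicPromoted(CF x, CF y) {
  double xr = x.re, xi = x.im, yr = y.re, yi = y.im;
  double denom = yr * yr + yi * yi;
  return {static_cast<float>((xr * yr + xi * yi) / denom),
          static_cast<float>((xi * yr - xr * yi) / denom)};
}

// complex / real: each component is divided independently, so there is no
// squared intermediate that could overflow.
CF divByReal(CF x, float y) { return {x.re / y, x.im / y}; }
```

For example, dividing `{1e20f, 1e20f}` by `{1e20f, 0.0f}` with `divAlgebraicFloat` typically yields NaN components on a target that evaluates `float` expressions in `float`, since both numerator and denominator overflow to infinity; `divAlgebraicPromoted` returns `{1.0f, 1.0f}`, and so does `divByReal` with the real divisor `1e20f`, without any widening.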

---
Full diff: https://github.com/llvm/llvm-project/pull/131451.diff


2 Files Affected:

- (modified) clang/lib/CodeGen/CGExprComplex.cpp (+8-9) 
- (modified) clang/test/CodeGen/cx-complex-range-real.c (+16-36) 


``diff
diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index ff7c55be246cc..a8a65a2f956f8 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -286,8 +286,7 @@ class ComplexExprEmitter
   ComplexPairTy EmitComplexBinOpLibCall(StringRef LibCallName,
 const BinOpInfo &Op);
 
-  QualType HigherPrecisionTypeForComplexArithmetic(QualType ElementType,
-   bool IsDivOpCode) {
+  QualType HigherPrecisionTypeForComplexArithmetic(QualType ElementType) {
 ASTContext &Ctx = CGF.getContext();
 const QualType HigherElementType =
 Ctx.GetHigherPrecisionFPType(ElementType);
@@ -314,7 +313,7 @@ class ComplexExprEmitter
   }
 
   QualType getPromotionType(FPOptionsOverride Features, QualType Ty,
-bool IsDivOpCode = false) {
+bool IsComplexDivisor = false) {
 if (auto *CT = Ty->getAs()) {
   QualType ElementType = CT->getElementType();
   bool IsFloatingType = ElementType->isFloatingType();
@@ -325,10 +324,9 @@ class ComplexExprEmitter
  Features.getComplexRangeOverride() ==
  CGF.getLangOpts().getComplexRange();
 
-  if (IsDivOpCode && IsFloatingType && IsComplexRangePromoted &&
+  if (IsComplexDivisor && IsFloatingType && IsComplexRangePromoted &&
   (HasNoComplexRangeOverride || HasMatchingComplexRange))
-return HigherPrecisionTypeForComplexArithmetic(ElementType,
-   IsDivOpCode);
+return HigherPrecisionTypeForComplexArithmetic(ElementType);
   if (ElementType.UseExcessPrecision(CGF.getContext()))
 return CGF.getContext().getComplexType(CGF.getContext().FloatTy);
 }
@@ -339,9 +337,10 @@ class ComplexExprEmitter
 
 #define HANDLEBINOP(OP) \
   ComplexPairTy VisitBin##OP(const BinaryOperator *E) { \
-QualType promotionTy = getPromotionType( \
-E->getStoredFPFeaturesOrDefault(), E->getType(), \
-(E->getOpcode() == BinaryOperatorKind::BO_Div) ? true : false); \
+QualType promotionTy = \
+getPromotionType(E->getStoredFPFeaturesOrDefault(), E->getType(), \
+ (E->getOpcode() == BinaryOperatorKind::BO_Div && \
+  E->getRHS()->getType()->isAnyComplexType())); \
 ComplexPairTy result = EmitBin##OP(EmitBinOps(E, promotionTy)); \
 if (!promotionTy.isNull()) \
   result = CGF.EmitUnPromotedValue(result, E->getType()); \
diff --git a/clang/test/CodeGen/cx-complex-range-real.c b/clang/test/CodeGen/cx-complex-range-real.c
index 1723075be30fd..94bc080d190bc 100644
--- a/clang/test/CodeGen/cx-complex-range-real.c
+++ b/clang/test/CodeGen/cx-complex-range-real.c
@@ -591,18 +591,13 @@ _Complex float mulbf(float a, _Complex float b) {
 // PRMTD-NEXT:[[A_REAL:%.*]] = load float, ptr [[A_REALP]], align 4
 // PRMTD-NEXT:[[A_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD-NEXT:[[A_IMAG:%.*]] = load float, ptr [[A_IMAGP]], align 4
-// PRMTD-NEXT:[[EXT:%.*]] = fpext float [[A_REAL]] to double
-// PRMTD-NEXT:[[EXT1:%.*]] = fpext float [[A_IMAG]] to double
 // PRMTD-NEXT:[[TMP0:%.*]] = load float, ptr [[B_ADDR]], align 4
-// PRMTD-NEXT:[[EXT2:%.*]] = fpext float [[TMP0]] to double
-// PRMTD-NEXT:[[TMP1:%.*]] = fdiv double [[EXT]], [[EXT2]]
-// PRMTD-NEXT:[[TMP2:%.*]] = fdiv double [[EXT1]], [[EXT2]]
-// PRMTD-NEXT:[[UNPROMOTION:%.*]] = fptrunc double [[TMP1]] to float
-// PRMTD-NEXT:[[UNPROMOTION3:%.*]] = fptrunc double [[TMP2]] to float
+// PRMTD-NEXT:[[TMP1:%.*]] = fdiv float [[A_REAL]], [[TMP0]]
+// PRMTD-NEXT:[[TMP2:%.*]] = fdiv float [[A_IMAG]], [[TMP0]]
 // PRMTD-NEXT:[[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RETVAL]], i32 0, i32 0
 // PRMTD-NEXT:[[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RETVAL]], i32 0, i32 1
-// PRMTD-NEXT:store float [[UNPROMOTION]], ptr [[RETVAL_REALP]], align 4
-// PRMTD-NEXT:store float [[UNPROMOTION3]], ptr [[RETVAL_IMAGP]], align 4
+// PRMTD-NEXT:store float [

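The patch above restricts promotion to divisions whose right-hand side is itself complex. A minimal sketch of why a real divisor does not need the wider type, assuming IEEE-754 `float` evaluated without excess precision (for example x86-64 with SSE): dividing by a real scalar is just two independent real divisions, while the textbook complex/complex formula forms `c*c + d*d`, which can overflow even when the true quotient is representable. Both helpers are hypothetical illustrations, not Clang code.

```cpp
#include <cassert>
#include <cmath>   // std::isfinite, for checking results
#include <complex>

// Hypothetical helper (not a Clang API): complex / real division.
// Each component is divided independently, so no sum of squares is
// formed and float precision is sufficient.
std::complex<float> divByReal(std::complex<float> a, float b) {
  assert(b != 0.0f && "division by zero is not modeled here");
  return {a.real() / b, a.imag() / b};
}

// Hypothetical helper: textbook complex / complex division evaluated in
// float. The denominator c*c + d*d can overflow even when the true
// quotient is representable; promoting to a wider type guards against it.
std::complex<float> naiveDivComplex(std::complex<float> a,
                                    std::complex<float> b) {
  float c = b.real(), d = b.imag();
  float denom = c * c + d * d;
  return {(a.real() * c + a.imag() * d) / denom,
          (a.imag() * c - a.real() * d) / denom};
}
```

With `a = 1e30 + 1e30i`, `divByReal(a, 1e30f)` yields exactly `1 + 1i`, whereas `naiveDivComplex(a, {1e30f, 0.0f})` overflows in `c * c` and returns a non-finite result.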
[llvm-branch-commits] [compiler-rt] [llvm] [ctxprof] Make ContextRoot an implementation detail (PR #131416)

2025-03-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/131416

>From e6d651d645a5510011f9f90e28e812e5bb46f64f Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Thu, 13 Mar 2025 20:46:45 -0700
Subject: [PATCH] [ctxprof] Make ContextRoot an implementation detail

---
 .../lib/ctx_profile/CtxInstrProfiling.cpp | 25 --
 .../lib/ctx_profile/CtxInstrProfiling.h   | 26 +-
 .../tests/CtxInstrProfilingTest.cpp   | 30 ++--
 .../Instrumentation/PGOCtxProfLowering.cpp| 49 +++
 .../PGOProfile/ctx-instrumentation.ll | 24 -
 5 files changed, 82 insertions(+), 72 deletions(-)

diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 1c2cad1ca506e..6ef7076d93e31 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -336,10 +336,28 @@ void setupContext(ContextRoot *Root, GUID Guid, uint32_t NumCounters,
   AllContextRoots.PushBack(Root);
 }
 
+ContextRoot *FunctionData::getOrAllocateContextRoot() {
+  auto *Root = CtxRoot;
+  if (!Root) {
+__sanitizer::GenericScopedLock<__sanitizer::StaticSpinMutex> L(&Mutex);
+Root = CtxRoot;
+if (!Root) {
+  Root =
+  new (__sanitizer::InternalAlloc(sizeof(ContextRoot))) ContextRoot();
+  CtxRoot = Root;
+}
+  }
+  assert(Root);
+  return Root;
+}
+
 ContextNode *__llvm_ctx_profile_start_context(
-ContextRoot *Root, GUID Guid, uint32_t Counters,
+FunctionData *FData, GUID Guid, uint32_t Counters,
 uint32_t Callsites) SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = true;
+
+  auto *Root = FData->getOrAllocateContextRoot();
+
   __sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
 __sanitizer::memory_order_relaxed);
 
@@ -356,12 +374,13 @@ ContextNode *__llvm_ctx_profile_start_context(
   return TheScratchContext;
 }
 
-void __llvm_ctx_profile_release_context(ContextRoot *Root)
+void __llvm_ctx_profile_release_context(FunctionData *FData)
 SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = false;
   if (__llvm_ctx_profile_current_context_root) {
 __llvm_ctx_profile_current_context_root = nullptr;
-Root->Taken.Unlock();
+assert(FData->CtxRoot);
+FData->CtxRoot->Taken.Unlock();
   }
 }
 
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
index 72cc60bf523e1..6bb954da950c4 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
@@ -84,7 +84,6 @@ struct ContextRoot {
   // Count the number of entries - regardless if we could take the `Taken` mutex
   ::__sanitizer::atomic_uint64_t TotalEntries = {};
 
-  // This is init-ed by the static zero initializer in LLVM.
   // Taken is used to ensure only one thread traverses the contextual graph -
   // either to read it or to write it. On server side, the same entrypoint will
   // be entered by numerous threads, but over time, the profile aggregated by
@@ -109,12 +108,7 @@ struct ContextRoot {
   // or with more concurrent collections (==more memory) and less collection
   // time. Note that concurrent collection does happen for different
   // entrypoints, regardless.
-  ::__sanitizer::StaticSpinMutex Taken;
-
-  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
-  // instrumentation lowering side because it is responsible for allocating and
-  // zero-initializing ContextRoots.
-  static_assert(sizeof(Taken) == 1);
+  ::__sanitizer::SpinMutex Taken;
 };
 
 // This is allocated and zero-initialized by the compiler, the in-place
@@ -139,8 +133,16 @@ struct FunctionData {
   FunctionData() { Mutex.Init(); }
 
   FunctionData *Next = nullptr;
+  ContextRoot *volatile CtxRoot = nullptr;
   ContextNode *volatile FlatCtx = nullptr;
+
+  ContextRoot *getOrAllocateContextRoot();
+
   ::__sanitizer::StaticSpinMutex Mutex;
+  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
+  // instrumentation lowering side because it is responsible for allocating and
+  // zero-initializing ContextRoots.
+  static_assert(sizeof(Mutex) == 1);
 };
 
 /// This API is exposed for testing. See the APIs below about the contract with
@@ -172,17 +174,17 @@ extern __thread __ctx_profile::ContextRoot
 
 /// called by LLVM in the entry BB of a "entry point" function. The returned
 /// pointer may be "tainted" - its LSB set to 1 - to indicate it's scratch.
-ContextNode *__llvm_ctx_profile_start_context(__ctx_profile::ContextRoot *Root,
-  GUID Guid, uint32_t Counters,
-  uint32_t Callsites);
+ContextNode *
+__llvm_ctx_profile_start_context(__ctx_profile::FunctionData *FData, GUID Guid,
+ uint32_t Counters, uint32_t Callsites);
 
 /// paired with __

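`FunctionData::getOrAllocateContextRoot` in the ctxprof patch above lazily allocates the `ContextRoot` using double-checked locking: an unlocked fast-path read, then a re-check under `Mutex` before allocating. A minimal sketch of the same pattern in standard C++, assuming `std::atomic` and `std::mutex` in place of the sanitizer runtime's `volatile` pointer, `StaticSpinMutex`, and `InternalAlloc`; `Root` and `FuncData` are hypothetical stand-ins.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for __ctx_profile::ContextRoot.
struct Root {
  std::atomic<unsigned long long> TotalEntries{0};
};

// Hypothetical stand-in for __ctx_profile::FunctionData.
struct FuncData {
  std::atomic<Root *> CtxRoot{nullptr};
  std::mutex Mutex;

  Root *getOrAllocateRoot() {
    // Fast path: already allocated, no lock taken.
    Root *R = CtxRoot.load(std::memory_order_acquire);
    if (!R) {
      std::lock_guard<std::mutex> Lock(Mutex);
      // Re-check under the lock: another thread may have won the race.
      R = CtxRoot.load(std::memory_order_relaxed);
      if (!R) {
        R = new Root();
        // Release pairs with the acquire load above, so readers on the
        // fast path see a fully constructed Root.
        CtxRoot.store(R, std::memory_order_release);
      }
    }
    assert(R);
    return R;
  }

  ~FuncData() { delete CtxRoot.load(); }
};
```

The re-load under the lock can be relaxed because the mutex already orders it after any store made while holding the same lock; only the unlocked fast path needs the acquire/release pairing.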
[llvm-branch-commits] [llvm] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG (PR #131308)

2025-03-15 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh updated 
https://github.com/llvm/llvm-project/pull/131308

>From e6862b4528d1ed48bbca9e742dd9a96d8777545b Mon Sep 17 00:00:00 2001
From: pvanhout 
Date: Wed, 5 Mar 2025 13:41:04 +0100
Subject: [PATCH 1/2] [AMDGPU][Legalizer] Widen i16 G_SEXT_INREG

It's better to widen them to avoid them being lowered into a G_SHL + G_ASHR pair. With this change we just extend to i32, then truncate the result.
---
 .../lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp |   3 +-
 llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll   |   7 +-
 .../AMDGPU/GlobalISel/legalize-abs.mir|   8 +-
 .../AMDGPU/GlobalISel/legalize-ashr.mir   |  20 +--
 .../AMDGPU/GlobalISel/legalize-sext-inreg.mir | 155 +++---
 .../AMDGPU/GlobalISel/legalize-sext.mir   | 101 ++--
 .../AMDGPU/GlobalISel/legalize-smax.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smin.mir   |  33 +++-
 .../AMDGPU/GlobalISel/legalize-smulh.mir  | 132 +++
 .../CodeGen/AMDGPU/GlobalISel/llvm.abs.ll |  45 ++---
 .../CodeGen/AMDGPU/GlobalISel/sext_inreg.ll   | 130 ++-
 11 files changed, 299 insertions(+), 368 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
index b3a8183beeacf..6e611ebb4b625 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2009,7 +2009,8 @@ AMDGPULegalizerInfo::AMDGPULegalizerInfo(const GCNSubtarget &ST_,
   // S64 is only legal on SALU, and needs to be broken into 32-bit elements in
   // RegBankSelect.
   auto &SextInReg = getActionDefinitionsBuilder(G_SEXT_INREG)
-.legalFor({{S32}, {S64}});
+.legalFor({{S32}, {S64}})
+.widenScalarIf(typeIs(0, S16), widenScalarOrEltToNextPow2(0, 32));
 
   if (ST.hasVOP3PInsts()) {
 SextInReg.lowerFor({{V2S16}})
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
index 493e8cef63890..f81d7f1c300b8 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ashr.ll
@@ -17,8 +17,7 @@ define i8 @v_ashr_i8(i8 %value, i8 %amount) {
 ; GFX8-LABEL: v_ashr_i8:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_1
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8:
@@ -49,8 +48,8 @@ define i8 @v_ashr_i8_7(i8 %value) {
 ; GFX8-LABEL: v_ashr_i8_7:
 ; GFX8:   ; %bb.0:
 ; GFX8-NEXT:s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:v_lshlrev_b16_e32 v0, 8, v0
-; GFX8-NEXT:v_ashrrev_i16_e32 v0, 15, v0
+; GFX8-NEXT:v_mov_b32_e32 v1, 7
+; GFX8-NEXT:v_ashrrev_i16_sdwa v0, v1, sext(v0) dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_0
 ; GFX8-NEXT:s_setpc_b64 s[30:31]
 ;
 ; GFX9-LABEL: v_ashr_i8_7:
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
index a9fe80eb47e76..2b911b2dce697 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-abs.mir
@@ -144,11 +144,9 @@ body: |
 ; VI: liveins: $vgpr0
 ; VI-NEXT: {{  $}}
 ; VI-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
-; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC]], [[C]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C]](s16)
-; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[ASHR]]
+; VI-NEXT: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY]], 8
+; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[SEXT_INREG]](s32)
+; VI-NEXT: [[ABS:%[0-9]+]]:_(s16) = G_ABS [[TRUNC]]
 ; VI-NEXT: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ABS]](s16)
 ; VI-NEXT: $vgpr0 = COPY [[ANYEXT]](s32)
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
index f4aaab745e03b..53905a2f49dd0 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-ashr.mir
@@ -319,12 +319,10 @@ body: |
 ; VI-NEXT: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[COPY1]](s32)
 ; VI-NEXT: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
 ; VI-NEXT: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
-; VI-NEXT: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[COPY]](s32)
-; VI-NEXT: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
-; VI-NEXT: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[TRUNC1]], [[C1]](s16)
-; VI-NEXT: [[ASHR:%[0-9]+]]:_(s16) = G_ASHR [[SHL]], [[C1]](s16)
-; VI-NEXT: [[ASHR1:%[0-9]+]]:_(s16) = G_ASHR [[ASHR]], [

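The commit message above argues for widening an s16 `G_SEXT_INREG` to s32 and truncating instead of lowering it to a shift pair. A minimal sketch showing that the two forms compute the same 16-bit value, modeling the generic opcodes as plain integer arithmetic; the function names are hypothetical, and the code assumes two's-complement narrowing conversions and arithmetic right shifts of negative values (guaranteed since C++20 and universal in practice before that).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an s16 G_SEXT_INREG lowered to G_SHL + G_ASHR:
// shift the low `Bits` bits to the top of the 16-bit value, then shift
// them back arithmetically so the sign bit is replicated.
int16_t sextInReg16ViaShifts(uint16_t X, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 16);
  unsigned Amt = 16 - Bits;
  // G_SHL on s16: keep only the low 16 bits of the shifted value.
  int16_t Shl = static_cast<int16_t>(static_cast<uint16_t>(X << Amt));
  // G_ASHR on s16: Shl is promoted to int with its sign preserved.
  return static_cast<int16_t>(Shl >> Amt);
}

// Hypothetical model of the widened form: extend to s32, sign-extend the
// low `Bits` bits within 32 bits, then truncate back to s16 (G_TRUNC).
int16_t sextInReg16ViaWiden(uint16_t X, unsigned Bits) {
  assert(Bits >= 1 && Bits <= 16);
  unsigned Amt = 32 - Bits;
  // Zero-extended here; any upper bits would be shifted out since Amt >= 16.
  uint32_t Wide = X;
  int32_t Sext = static_cast<int32_t>(Wide << Amt) >> Amt;
  return static_cast<int16_t>(Sext);
}
```

The widened form matches the updated VI check lines in `legalize-abs.mir`, where the `G_SHL`/`G_ASHR` pair is replaced by an s32 `G_SEXT_INREG` followed by `G_TRUNC`.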
[llvm-branch-commits] [mlir] 07f03ef - Revert "[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU S…"

2025-03-15 Thread via llvm-branch-commits

Author: Charitha Saumya
Date: 2025-03-14T10:32:05-07:00
New Revision: 07f03ef6ab2a35ad904a62a85c45c4f7644d7e75

URL: 
https://github.com/llvm/llvm-project/commit/07f03ef6ab2a35ad904a62a85c45c4f7644d7e75
DIFF: 
https://github.com/llvm/llvm-project/commit/07f03ef6ab2a35ad904a62a85c45c4f7644d7e75.diff

LOG: Revert "[mlir][xegpu] Add XeGPU subgroup map propagation analysis for XeGPU S…"

This reverts commit 5eb557774df637c9e581bd3008cfc6d156a61902.

Added: 


Modified: 
mlir/include/mlir/Dialect/XeGPU/Transforms/Passes.td
mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt

Removed: 
mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
mlir/test/Dialect/XeGPU/subgroup-map-propagation.mlir



diff  --git a/mlir/include/mlir/Dialect/XeGPU/Transforms/Passes.td b/mlir/include/mlir/Dialect/XeGPU/Transforms/Passes.td
index 3e81f2d0ed786..1ecd6ce95322b 100644
--- a/mlir/include/mlir/Dialect/XeGPU/Transforms/Passes.td
+++ b/mlir/include/mlir/Dialect/XeGPU/Transforms/Passes.td
@@ -23,19 +23,4 @@ def XeGPUFoldAliasOps : Pass<"xegpu-fold-alias-ops"> {
   ];
 }
 
-def XeGPUSubgroupDistribute : Pass<"xegpu-subgroup-distribute"> {
-  let summary = "Distribute XeGPU ops to work items";
-  let description = [{
-The pass distributes subgroup level (SIMD) XeGPU ops to work items.
-  }];
-  let dependentDialects = [
-  "memref::MemRefDialect", "xegpu::XeGPUDialect", "vector::VectorDialect"
-  ];
-  let options = [
-Option<"printOnly", "print-analysis-only", "bool",
- /*default=*/"false",
- "Print the result of the subgroup map propagation analysis and exit.">
-  ];
-}
-
 #endif // MLIR_DIALECT_XEGPU_TRANSFORMS_PASSES_TD

diff  --git a/mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt b/mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt
index 124e904edb543..7fb64d3b97b87 100644
--- a/mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/XeGPU/Transforms/CMakeLists.txt
@@ -1,6 +1,5 @@
 add_mlir_dialect_library(MLIRXeGPUTransforms
   XeGPUFoldAliasOps.cpp
-  XeGPUSubgroupDistribute.cpp
 
   ADDITIONAL_HEADER_DIRS
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/XeGPU

diff  --git a/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp b/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
deleted file mode 100644
index 86e07697f437c..0
--- a/mlir/lib/Dialect/XeGPU/Transforms/XeGPUSubgroupDistribute.cpp
+++ /dev/null
@@ -1,662 +0,0 @@
-//===- XeGPUSubgroupDistribute.cpp - XeGPU Subgroup Distribute Pass ---===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===--===//
-#include "mlir/Analysis/DataFlow/ConstantPropagationAnalysis.h"
-#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
-#include "mlir/Analysis/DataFlow/SparseAnalysis.h"
-#include "mlir/Analysis/DataFlowFramework.h"
-#include "mlir/Dialect/GPU/IR/GPUDialect.h"
-#include "mlir/Dialect/MemRef/IR/MemRef.h"
-#include "mlir/Dialect/Vector/IR/VectorOps.h"
-#include "mlir/Dialect/XeGPU/IR/XeGPU.h"
-#include "mlir/Dialect/XeGPU/Transforms/Passes.h"
-#include "mlir/Dialect/XeGPU/Transforms/Transforms.h"
-#include "mlir/IR/Builders.h"
-#include "mlir/Interfaces/FunctionInterfaces.h"
-#include "llvm/ADT/TypeSwitch.h"
-#include "llvm/Support/raw_ostream.h"
-
-namespace mlir {
-namespace xegpu {
-#define GEN_PASS_DEF_XEGPUSUBGROUPDISTRIBUTE
-#include "mlir/Dialect/XeGPU/Transforms/Passes.h.inc"
-} // namespace xegpu
-} // namespace mlir
-
-using namespace mlir;
-using namespace mlir::dataflow;
-
-/// HW dependent constants.
-/// TODO: These constants should be queried from the target information.
-constexpr unsigned subgroupSize = 16; // How many work items in a subgroup.
-/// If DPAS A or B operands have low precision element types they must be packed
-/// according to the following sizes.
-constexpr unsigned packedSizeInBitsForDefault =
-16; // Minimum packing size per register for DPAS A.
-constexpr unsigned packedSizeInBitsForDpasB =
-32; // Minimum packing size per register for DPAS B.
-
-namespace {
-
-///===--===///
-/// Layout
-///===--===///
-
-/// Helper class to store the ND layout of work items within a subgroup and data
-/// owned by each work item.
-struct Layout {
-  SmallVector layout;
-  Layout() = default;
-  Layout(const Layout &other) = default;
-  Layout(std::initializer_list list) : layout(list) {}
-  void print(llvm::raw_ostream &os) const;
-  size_t size() const { return layout.size(); }
-  int64_t operator[](size_t idx) const;
-};
-
-void Layout::print(llvm::raw_ostream &os) const {
-  os << "[";
-  llvm

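The deleted file's comment says low-precision DPAS A and B operands must be packed to 16 and 32 bits respectively. A minimal arithmetic sketch of what those constants imply, assuming packing simply groups whole elements into a unit of that many bits; the helper below is hypothetical and is not taken from the reverted code.

```cpp
// Constants as they appear in the deleted XeGPUSubgroupDistribute.cpp.
constexpr unsigned packedSizeInBitsForDefault = 16; // DPAS A
constexpr unsigned packedSizeInBitsForDpasB = 32;   // DPAS B

// Hypothetical helper (not from the reverted code): number of elements of
// `elementBits` width grouped into one packed unit. Elements at least as
// wide as the unit are left unpacked. `elementBits` must be nonzero.
constexpr unsigned elementsPerPackedUnit(unsigned packedSizeInBits,
                                         unsigned elementBits) {
  return elementBits >= packedSizeInBits ? 1
                                         : packedSizeInBits / elementBits;
}
```

Under that assumption, 8-bit A operands pair up two per 16-bit unit, while 16-bit and 8-bit B operands pack two and four per 32-bit unit respectively.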
[llvm-branch-commits] [clang] [clang] Introduce CallGraphSection option (PR #117037)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/117037

>From 6a12be2c5b60a95a06875b0b2c4f14228d1fa882 Mon Sep 17 00:00:00 2001
From: prabhukr 
Date: Wed, 12 Mar 2025 23:30:01 +
Subject: [PATCH] Fix EOF newlines.

Created using spr 1.3.6-beta.1
---
 clang/test/Driver/call-graph-section.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/clang/test/Driver/call-graph-section.c b/clang/test/Driver/call-graph-section.c
index 108446729d857..5832aa6754137 100644
--- a/clang/test/Driver/call-graph-section.c
+++ b/clang/test/Driver/call-graph-section.c
@@ -2,4 +2,4 @@
 // RUN: %clang -### -S -fcall-graph-section -fno-call-graph-section %s 2>&1 | FileCheck --check-prefix=NO-CALL-GRAPH-SECTION %s
 
 // CALL-GRAPH-SECTION: "-fcall-graph-section"
-// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"
\ No newline at end of file
+// NO-CALL-GRAPH-SECTION-NOT: "-fcall-graph-section"

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [NFC][Cloning] Move DebugInfoFinder decl closer to its place of usage (PR #129154)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129154

>From dab28ad0a6c11e2f1c2812feb688dd0c6a562de7 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 13:09:23 -0800
Subject: [PATCH] [NFC][Cloning] Move DebugInfoFinder decl closer to its place
 of usage

Summary:
This makes it clear that DIFinder is only really necessary for llvm.dbg.cu update.

Test Plan:
ninja check-llvm-unit

stack-info: PR: https://github.com/llvm/llvm-project/pull/129154, branch: users/artempyanykh/fast-coro-upstream-part2-take2/12
---
 llvm/lib/Transforms/Utils/CloneFunction.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 979cbad0d82c0..3af07594c848b 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -266,8 +266,6 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
   if (OldFunc->isDeclaration())
 return;
 
-  DebugInfoFinder DIFinder;
-
   if (Changes < CloneFunctionChangeType::DifferentModule) {
 assert((NewFunc->getParent() == nullptr ||
 NewFunc->getParent() == OldFunc->getParent()) &&
@@ -320,7 +318,8 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
 Visited.insert(Operand);
 
   // Collect and clone all the compile units referenced from the instructions in
-  // the function (e.g. as a scope).
+  // the function (e.g. as instructions' scope).
+  DebugInfoFinder DIFinder;
   collectDebugInfoFromInstructions(*OldFunc, DIFinder);
   for (auto *Unit : DIFinder.compile_units()) {
 MDNode *MappedUnit =


[llvm-branch-commits] [clang] Backport: [clang] fix matching of nested template template parameters (PR #130950)

2025-03-15 Thread Mike Lothian via llvm-branch-commits

FireBurn wrote:

The template issue I'm seeing bisected to:

```
28ad8978ee2054298d4198bf10c8cb68730af037 is the first bad commit
commit 28ad8978ee2054298d4198bf10c8cb68730af037
Author: Matheus Izvekov 
Date:   Thu Jan 23 20:37:33 2025 -0300

Reland: [clang] unified CWG2398 and P0522 changes; finishes implementation of P3310 (#124137)
```

Were you able to reproduce the issue locally?

https://github.com/llvm/llvm-project/pull/130950

[llvm-branch-commits] [llvm] [llvm] Extract and propagate indirect call type id (PR #87575)

2025-03-15 Thread via llvm-branch-commits

https://github.com/Prabhuk updated 
https://github.com/llvm/llvm-project/pull/87575

>From 1a8d810d352fbe84c0521c7614689b60ade693c8 Mon Sep 17 00:00:00 2001
From: Necip Fazil Yildiran 
Date: Tue, 19 Nov 2024 15:25:34 -0800
Subject: [PATCH 1/4] Fixed the tests and addressed most of the review
 comments.

Created using spr 1.3.6-beta.1
---
 llvm/include/llvm/CodeGen/MachineFunction.h   | 15 +++--
 .../CodeGen/AArch64/call-site-info-typeid.ll  | 28 +++--
 .../test/CodeGen/ARM/call-site-info-typeid.ll | 28 +++--
 .../CodeGen/MIR/X86/call-site-info-typeid.ll  | 58 ---
 .../CodeGen/MIR/X86/call-site-info-typeid.mir | 13 ++---
 .../CodeGen/Mips/call-site-info-typeid.ll | 28 +++--
 .../test/CodeGen/X86/call-site-info-typeid.ll | 28 +++--
 7 files changed, 71 insertions(+), 127 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/MachineFunction.h b/llvm/include/llvm/CodeGen/MachineFunction.h
index bb0b87a3a04a3..44633df38a651 100644
--- a/llvm/include/llvm/CodeGen/MachineFunction.h
+++ b/llvm/include/llvm/CodeGen/MachineFunction.h
@@ -493,7 +493,7 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
 /// Callee type id.
 ConstantInt *TypeId = nullptr;
 
-CallSiteInfo() {}
+CallSiteInfo() = default;
 
 /// Extracts the numeric type id from the CallBase's type operand bundle,
 /// and sets TypeId. This is used as type id for the indirect call in the
@@ -503,12 +503,11 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
   if (!CB.isIndirectCall())
 return;
 
-  auto Opt = CB.getOperandBundle(LLVMContext::OB_type);
-  if (!Opt.has_value()) {
-errs() << "warning: cannot find indirect call type operand bundle for "
-  "call graph section\n";
+  std::optional<OperandBundleUse> Opt =
+  CB.getOperandBundle(LLVMContext::OB_type);
+  // Return if the operand bundle for call graph section cannot be found.
+  if (!Opt.has_value())
 return;
-  }
 
   // Get generalized type id string
   auto OB = Opt.value();
@@ -520,9 +519,9 @@ class LLVM_EXTERNAL_VISIBILITY MachineFunction {
  "invalid type identifier");
 
   // Compute numeric type id from generalized type id string
-  uint64_t TypeIdVal = llvm::MD5Hash(TypeIdStr->getString());
+  uint64_t TypeIdVal = MD5Hash(TypeIdStr->getString());
   IntegerType *Int64Ty = Type::getInt64Ty(CB.getContext());
-  TypeId = llvm::ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
+  TypeId = ConstantInt::get(Int64Ty, TypeIdVal, /*IsSigned=*/false);
 }
   };
 
diff --git a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
index f0a6b44755c5c..f3b98c2c7a395 100644
--- a/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/AArch64/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that call site type ids can be extracted and set from type operand
+;; bundles.
 
-; Verify the exact typeId value to ensure it is not garbage but the value
-; computed as the type id from the type operand bundle.
-; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu %s -stop-before=finalize-isel -o - | FileCheck %s
-
-; ModuleID = 'test.c'
-source_filename = "test.c"
-target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
-target triple = "aarch64-unknown-linux-gnu"
+;; Verify the exact typeId value to ensure it is not garbage but the value
+;; computed as the type id from the type operand bundle.
+; RUN: llc --call-graph-section -mtriple aarch64-linux-gnu < %s -stop-before=finalize-isel -o - | FileCheck %s
 
 define dso_local void @foo(i8 signext %a) !type !3 {
 entry:
@@ -19,10 +14,10 @@ entry:
 define dso_local i32 @main() !type !4 {
 entry:
   %retval = alloca i32, align 4
-  %fp = alloca void (i8)*, align 8
-  store i32 0, i32* %retval, align 4
-  store void (i8)* @foo, void (i8)** %fp, align 8
-  %0 = load void (i8)*, void (i8)** %fp, align 8
+  %fp = alloca ptr, align 8
+  store i32 0, ptr %retval, align 4
+  store ptr @foo, ptr %fp, align 8
+  %0 = load ptr, ptr %fp, align 8
   ; CHECK: callSites:
   ; CHECK-NEXT: - { bb: {{.*}}, offset: {{.*}}, fwdArgRegs: [], typeId:
   ; CHECK-NEXT: 7854600665770582568 }
@@ -30,10 +25,5 @@ entry:
   ret i32 0
 }
 
-!llvm.module.flags = !{!0, !1, !2}
-
-!0 = !{i32 1, !"wchar_size", i32 4}
-!1 = !{i32 7, !"uwtable", i32 1}
-!2 = !{i32 7, !"frame-pointer", i32 2}
 !3 = !{i64 0, !"_ZTSFvcE.generalized"}
 !4 = !{i64 0, !"_ZTSFiE.generalized"}
diff --git a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
index ec7f8a425051b..9feeef9a564cc 100644
--- a/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
+++ b/llvm/test/CodeGen/ARM/call-site-info-typeid.ll
@@ -1,14 +1,9 @@
-; Tests that call site type ids can be extracted and set from type operand
-; bundles.
+;; Tests that ca

[llvm-branch-commits] [llvm] [AMDGPU][NPM] Port GCNCreateVOPD to NPM (PR #130059)

2025-03-15 Thread Akshat Oke via llvm-branch-commits

https://github.com/optimisan updated 
https://github.com/llvm/llvm-project/pull/130059

>From 78bcc3a3576cc1f0dba5c9feb5ed781a62877ffe Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 10 Mar 2025 04:31:20 +
Subject: [PATCH 1/3] [AMDGPU][NFC] Format GCNCreateVOPD.cpp

---
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index d40a1a2a10d9b..798279b279da3 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -38,15 +38,15 @@ namespace {
 
 class GCNCreateVOPD : public MachineFunctionPass {
 private:
-class VOPDCombineInfo {
-public:
-  VOPDCombineInfo() = default;
-  VOPDCombineInfo(MachineInstr *First, MachineInstr *Second)
-  : FirstMI(First), SecondMI(Second) {}
-
-  MachineInstr *FirstMI;
-  MachineInstr *SecondMI;
-};
+  class VOPDCombineInfo {
+  public:
+VOPDCombineInfo() = default;
+VOPDCombineInfo(MachineInstr *First, MachineInstr *Second)
+: FirstMI(First), SecondMI(Second) {}
+
+MachineInstr *FirstMI;
+MachineInstr *SecondMI;
+  };
 
 public:
   static char ID;

>From ab31097bd24434b6dca9eedae15acda3a50d5fbb Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Wed, 5 Mar 2025 10:52:00 +
Subject: [PATCH 2/3] [AMDGPU][NPM] Port GCNCreateVOPD to NPM

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  7 ++-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  4 +-
 llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp  | 53 ---
 4 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 57297288eecb4..f208a8bb9964b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -358,6 +358,11 @@ class SIModeRegisterPass : public PassInfoMixin<SIModeRegisterPass> {
   PreservedAnalyses run(MachineFunction &F, MachineFunctionAnalysisManager &AM);
 };
 
+class GCNCreateVOPDPass : public PassInfoMixin<GCNCreateVOPDPass> {
+public:
+  PreservedAnalyses run(MachineFunction &MF, MachineFunctionAnalysisManager &AM);
+};
+
 FunctionPass *createAMDGPUAnnotateUniformValuesLegacy();
 
 ModulePass *createAMDGPUPrintfRuntimeBinding();
@@ -443,7 +448,7 @@ extern char &SIFormMemoryClausesID;
 void initializeSIPostRABundlerLegacyPass(PassRegistry &);
 extern char &SIPostRABundlerLegacyID;
 
-void initializeGCNCreateVOPDPass(PassRegistry &);
+void initializeGCNCreateVOPDLegacyPass(PassRegistry &);
 extern char &GCNCreateVOPDID;
 
 void initializeAMDGPUUnifyDivergentExitNodesPass(PassRegistry&);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 1050855176c04..0e3dcb4267ede 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -103,6 +103,7 @@ MACHINE_FUNCTION_PASS("amdgpu-rewrite-partial-reg-uses", GCNRewritePartialRegUse
 MACHINE_FUNCTION_PASS("amdgpu-pre-ra-optimizations", GCNPreRAOptimizationsPass())
 MACHINE_FUNCTION_PASS("amdgpu-nsa-reassign", GCNNSAReassignPass())
 MACHINE_FUNCTION_PASS("gcn-dpp-combine", GCNDPPCombinePass())
+MACHINE_FUNCTION_PASS("gcn-create-vopd", GCNCreateVOPDPass())
 MACHINE_FUNCTION_PASS("si-fix-sgpr-copies", SIFixSGPRCopiesPass())
 MACHINE_FUNCTION_PASS("si-fix-vgpr-copies", SIFixVGPRCopiesPass())
 MACHINE_FUNCTION_PASS("si-fold-operands", SIFoldOperandsPass());
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index ce3dcd920bce3..73ae9135eb319 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -546,7 +546,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeAMDGPUTarget() {
   initializeSIPreAllocateWWMRegsLegacyPass(*PR);
   initializeSIFormMemoryClausesLegacyPass(*PR);
   initializeSIPostRABundlerLegacyPass(*PR);
-  initializeGCNCreateVOPDPass(*PR);
+  initializeGCNCreateVOPDLegacyPass(*PR);
   initializeAMDGPUUnifyDivergentExitNodesPass(*PR);
   initializeAMDGPUAAWrapperPassPass(*PR);
   initializeAMDGPUExternalAAWrapperPass(*PR);
@@ -2149,7 +2149,7 @@ void AMDGPUCodeGenPassBuilder::addPostRegAlloc(AddMachinePass &addPass) const {
 
 void AMDGPUCodeGenPassBuilder::addPreEmitPass(AddMachinePass &addPass) const {
   if (isPassEnabled(EnableVOPD, CodeGenOptLevel::Less)) {
-// TODO: addPass(GCNCreateVOPDPass());
+addPass(GCNCreateVOPDPass());
   }
   // TODO: addPass(SIMemoryLegalizerPass());
   // TODO: addPass(SIInsertWaitcntsPass());
diff --git a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
index 798279b279da3..32a26469d616b 100644
--- a/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNCreateVOPD.cpp
@@ -27,6 +27,7 @@
 #include "llvm/CodeGen/MachineBasicBlock

[llvm-branch-commits] [llvm] [NFC][Cloning] Move DebugInfoFinder decl closer to its place of usage (PR #129154)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129154

>From 50754266de793fb34b1bed5c0d1d71c5b1e8e828 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 13:09:23 -0800
Subject: [PATCH] [NFC][Cloning] Move DebugInfoFinder decl closer to its place
 of usage

Summary:
This makes it clear that DIFinder is only really necessary for llvm.dbg.cu update.

Test Plan:
ninja check-llvm-unit

stack-info: PR: https://github.com/llvm/llvm-project/pull/129154, branch: users/artempyanykh/fast-coro-upstream-part2-take2/12
---
 llvm/lib/Transforms/Utils/CloneFunction.cpp | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index cde1ce8b43dbd..b411d4cb87fd4 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -266,8 +266,6 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
   if (OldFunc->isDeclaration())
 return;
 
-  DebugInfoFinder DIFinder;
-
   if (Changes < CloneFunctionChangeType::DifferentModule) {
 assert((NewFunc->getParent() == nullptr ||
 NewFunc->getParent() == OldFunc->getParent()) &&
@@ -320,7 +318,8 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
 Visited.insert(Operand);
 
   // Collect and clone all the compile units referenced from the instructions in
-  // the function (e.g. as a scope).
+  // the function (e.g. as instructions' scope).
+  DebugInfoFinder DIFinder;
   collectDebugInfoFromInstructions(*OldFunc, DIFinder);
   for (auto *Unit : DIFinder.compile_units()) {
 MDNode *MappedUnit =


[llvm-branch-commits] [clang] [Clang][CodeGen] Do not promote if complex divisor is real (PR #131451)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

https://github.com/Maetveis created 
https://github.com/llvm/llvm-project/pull/131451

Relates-to: https://github.com/llvm/llvm-project/issues/131129
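The reason promotion is unnecessary when the divisor has a real type follows from the arithmetic: dividing `a + bi` by a real `d` is just `a/d + (b/d)i`, two ordinary floating-point divisions with no `c*c + d*d` intermediate that could overflow or underflow. A minimal C++ sketch of that case, using `std::complex<float>`; `div_by_real` is a hypothetical helper for illustration, not part of Clang:

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper: divide a complex float by a real float divisor.
// (a + bi) / d == (a / d) + (b / d)i, so only two real divisions are needed.
// There is no |divisor|^2 term that could overflow, so nothing is gained by
// widening the operands to double first.
std::complex<float> div_by_real(std::complex<float> z, float d) {
  assert(d != 0.0f && "divisor must be non-zero for this sketch");
  return {z.real() / d, z.imag() / d};
}
```

For example, `div_by_real({3.0f, -4.0f}, 2.0f)` yields `(1.5, -2)`, the same value `std::complex<float>(3.0f, -4.0f) / 2.0f` produces, since complex-by-scalar division in the C++ standard library divides each component.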

From 45e679eba25f309130404fe879d36bb727872b62 Mon Sep 17 00:00:00 2001
From: Gergely Meszaros 
Date: Sat, 15 Mar 2025 11:54:12 +0100
Subject: [PATCH] [Clang][CodeGen] Do not promote if complex divisor is real

Relates-to: https://github.com/llvm/llvm-project/issues/131129
---
 clang/lib/CodeGen/CGExprComplex.cpp| 17 ---
 clang/test/CodeGen/cx-complex-range-real.c | 52 +++---
 2 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index ff7c55be246cc..a8a65a2f956f8 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -286,8 +286,7 @@ class ComplexExprEmitter
   ComplexPairTy EmitComplexBinOpLibCall(StringRef LibCallName,
 const BinOpInfo &Op);
 
-  QualType HigherPrecisionTypeForComplexArithmetic(QualType ElementType,
-   bool IsDivOpCode) {
+  QualType HigherPrecisionTypeForComplexArithmetic(QualType ElementType) {
 ASTContext &Ctx = CGF.getContext();
 const QualType HigherElementType =
 Ctx.GetHigherPrecisionFPType(ElementType);
@@ -314,7 +313,7 @@ class ComplexExprEmitter
   }
 
   QualType getPromotionType(FPOptionsOverride Features, QualType Ty,
-bool IsDivOpCode = false) {
+bool IsComplexDivisor = false) {
 if (auto *CT = Ty->getAs()) {
   QualType ElementType = CT->getElementType();
   bool IsFloatingType = ElementType->isFloatingType();
@@ -325,10 +324,9 @@ class ComplexExprEmitter
  Features.getComplexRangeOverride() ==
  CGF.getLangOpts().getComplexRange();
 
-  if (IsDivOpCode && IsFloatingType && IsComplexRangePromoted &&
+  if (IsComplexDivisor && IsFloatingType && IsComplexRangePromoted &&
   (HasNoComplexRangeOverride || HasMatchingComplexRange))
-return HigherPrecisionTypeForComplexArithmetic(ElementType,
-   IsDivOpCode);
+return HigherPrecisionTypeForComplexArithmetic(ElementType);
   if (ElementType.UseExcessPrecision(CGF.getContext()))
 return CGF.getContext().getComplexType(CGF.getContext().FloatTy);
 }
@@ -339,9 +337,10 @@ class ComplexExprEmitter
 
 #define HANDLEBINOP(OP)                                                        \
   ComplexPairTy VisitBin##OP(const BinaryOperator *E) {                        \
-    QualType promotionTy = getPromotionType(                                   \
-        E->getStoredFPFeaturesOrDefault(), E->getType(),                       \
-        (E->getOpcode() == BinaryOperatorKind::BO_Div) ? true : false);        \
+    QualType promotionTy =                                                     \
+        getPromotionType(E->getStoredFPFeaturesOrDefault(), E->getType(),      \
+                         (E->getOpcode() == BinaryOperatorKind::BO_Div &&      \
+                          E->getRHS()->getType()->isAnyComplexType()));        \
     ComplexPairTy result = EmitBin##OP(EmitBinOps(E, promotionTy));            \
     if (!promotionTy.isNull())                                                 \
       result = CGF.EmitUnPromotedValue(result, E->getType());                  \
diff --git a/clang/test/CodeGen/cx-complex-range-real.c b/clang/test/CodeGen/cx-complex-range-real.c
index 1723075be30fd..94bc080d190bc 100644
--- a/clang/test/CodeGen/cx-complex-range-real.c
+++ b/clang/test/CodeGen/cx-complex-range-real.c
@@ -591,18 +591,13 @@ _Complex float mulbf(float a, _Complex float b) {
 // PRMTD-NEXT:[[A_REAL:%.*]] = load float, ptr [[A_REALP]], align 4
 // PRMTD-NEXT:[[A_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[A]], i32 0, i32 1
 // PRMTD-NEXT:[[A_IMAG:%.*]] = load float, ptr [[A_IMAGP]], align 4
-// PRMTD-NEXT:[[EXT:%.*]] = fpext float [[A_REAL]] to double
-// PRMTD-NEXT:[[EXT1:%.*]] = fpext float [[A_IMAG]] to double
 // PRMTD-NEXT:[[TMP0:%.*]] = load float, ptr [[B_ADDR]], align 4
-// PRMTD-NEXT:[[EXT2:%.*]] = fpext float [[TMP0]] to double
-// PRMTD-NEXT:[[TMP1:%.*]] = fdiv double [[EXT]], [[EXT2]]
-// PRMTD-NEXT:[[TMP2:%.*]] = fdiv double [[EXT1]], [[EXT2]]
-// PRMTD-NEXT:[[UNPROMOTION:%.*]] = fptrunc double [[TMP1]] to float
-// PRMTD-NEXT:[[UNPROMOTION3:%.*]] = fptrunc double [[TMP2]] to float
+// PRMTD-NEXT:[[TMP1:%.*]] = fdiv float [[A_REAL]], [[TMP0]]
+// PRMTD-NEXT:[[TMP2:%.*]] = fdiv float [[A_IMAG]], [[TMP0]]
 // PRMTD-NEXT:[[RETVAL_REALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[RETVAL]], i32 0, i32 0
 // PRMTD-NEXT:[[RETVAL_IMAGP:%.*]] = getelementptr inbounds nuw { float, float

[llvm-branch-commits] [clang] [Clang][CodeGen] Do not promote if complex divisor is real (PR #131451)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

Maetveis wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack
> [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131451?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#131451** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/131451?utm_source=stack-comment-view-in-graphite)
* **#131447**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/131451


[llvm-branch-commits] [compiler-rt] [llvm] [ctxprof] Make ContextRoot an implementation detail (PR #131416)

2025-03-15 Thread Mircea Trofin via llvm-branch-commits

https://github.com/mtrofin updated 
https://github.com/llvm/llvm-project/pull/131416

>From f671b9be95158ad5082a88e4a924c556f5f5e930 Mon Sep 17 00:00:00 2001
From: Mircea Trofin 
Date: Thu, 13 Mar 2025 20:46:45 -0700
Subject: [PATCH] [ctxprof] Make ContextRoot an implementation detail

---
 .../lib/ctx_profile/CtxInstrProfiling.cpp | 25 --
 .../lib/ctx_profile/CtxInstrProfiling.h   | 26 +-
 .../tests/CtxInstrProfilingTest.cpp   | 30 ++--
 .../Instrumentation/PGOCtxProfLowering.cpp| 48 +++
 .../PGOProfile/ctx-instrumentation.ll | 24 +-
 5 files changed, 82 insertions(+), 71 deletions(-)

diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
index 1c2cad1ca506e..6ef7076d93e31 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.cpp
@@ -336,10 +336,28 @@ void setupContext(ContextRoot *Root, GUID Guid, uint32_t NumCounters,
   AllContextRoots.PushBack(Root);
 }
 
+ContextRoot *FunctionData::getOrAllocateContextRoot() {
+  auto *Root = CtxRoot;
+  if (!Root) {
+__sanitizer::GenericScopedLock<__sanitizer::StaticSpinMutex> L(&Mutex);
+Root = CtxRoot;
+if (!Root) {
+  Root =
+  new (__sanitizer::InternalAlloc(sizeof(ContextRoot))) ContextRoot();
+  CtxRoot = Root;
+}
+  }
+  assert(Root);
+  return Root;
+}
+
 ContextNode *__llvm_ctx_profile_start_context(
-ContextRoot *Root, GUID Guid, uint32_t Counters,
+FunctionData *FData, GUID Guid, uint32_t Counters,
 uint32_t Callsites) SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = true;
+
+  auto *Root = FData->getOrAllocateContextRoot();
+
   __sanitizer::atomic_fetch_add(&Root->TotalEntries, 1,
 __sanitizer::memory_order_relaxed);
 
@@ -356,12 +374,13 @@ ContextNode *__llvm_ctx_profile_start_context(
   return TheScratchContext;
 }
 
-void __llvm_ctx_profile_release_context(ContextRoot *Root)
+void __llvm_ctx_profile_release_context(FunctionData *FData)
 SANITIZER_NO_THREAD_SAFETY_ANALYSIS {
   IsUnderContext = false;
   if (__llvm_ctx_profile_current_context_root) {
 __llvm_ctx_profile_current_context_root = nullptr;
-Root->Taken.Unlock();
+assert(FData->CtxRoot);
+FData->CtxRoot->Taken.Unlock();
   }
 }
 
diff --git a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
index 72cc60bf523e1..6bb954da950c4 100644
--- a/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
+++ b/compiler-rt/lib/ctx_profile/CtxInstrProfiling.h
@@ -84,7 +84,6 @@ struct ContextRoot {
   // Count the number of entries - regardless if we could take the `Taken` mutex
   ::__sanitizer::atomic_uint64_t TotalEntries = {};
 
-  // This is init-ed by the static zero initializer in LLVM.
   // Taken is used to ensure only one thread traverses the contextual graph -
   // either to read it or to write it. On server side, the same entrypoint will
   // be entered by numerous threads, but over time, the profile aggregated by
@@ -109,12 +108,7 @@ struct ContextRoot {
   // or with more concurrent collections (==more memory) and less collection
   // time. Note that concurrent collection does happen for different
   // entrypoints, regardless.
-  ::__sanitizer::StaticSpinMutex Taken;
-
-  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
-  // instrumentation lowering side because it is responsible for allocating and
-  // zero-initializing ContextRoots.
-  static_assert(sizeof(Taken) == 1);
+  ::__sanitizer::SpinMutex Taken;
 };
 
 // This is allocated and zero-initialized by the compiler, the in-place
@@ -139,8 +133,16 @@ struct FunctionData {
   FunctionData() { Mutex.Init(); }
 
   FunctionData *Next = nullptr;
+  ContextRoot *volatile CtxRoot = nullptr;
   ContextNode *volatile FlatCtx = nullptr;
+
+  ContextRoot *getOrAllocateContextRoot();
+
   ::__sanitizer::StaticSpinMutex Mutex;
+  // If (unlikely) StaticSpinMutex internals change, we need to modify the LLVM
+  // instrumentation lowering side because it is responsible for allocating and
+  // zero-initializing ContextRoots.
+  static_assert(sizeof(Mutex) == 1);
 };
 
 /// This API is exposed for testing. See the APIs below about the contract with
@@ -172,17 +174,17 @@ extern __thread __ctx_profile::ContextRoot
 
 /// called by LLVM in the entry BB of a "entry point" function. The returned
 /// pointer may be "tainted" - its LSB set to 1 - to indicate it's scratch.
-ContextNode *__llvm_ctx_profile_start_context(__ctx_profile::ContextRoot *Root,
-  GUID Guid, uint32_t Counters,
-  uint32_t Callsites);
+ContextNode *
+__llvm_ctx_profile_start_context(__ctx_profile::FunctionData *FData, GUID Guid,
+ uint32_t Counters, uint32_t Callsites);
 
 /// paired with _
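The new `FunctionData::getOrAllocateContextRoot` lazily allocates the `ContextRoot` with double-checked locking: an unlocked read of `CtxRoot`, then a second read under `Mutex` before allocating. The runtime expresses this with a `volatile` pointer, `__sanitizer::InternalAlloc`, and a sanitizer spin lock. A minimal sketch of the same pattern in portable standard C++, assuming `std::atomic` with acquire/release ordering, `std::mutex`, and plain `new` in place of those primitives; `Root` and `LazyRootHolder` are hypothetical stand-ins, not compiler-rt types:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for __ctx_profile::ContextRoot.
struct Root {
  unsigned long long TotalEntries = 0;
};

// Hypothetical stand-in for the lazily populated pointer in FunctionData.
class LazyRootHolder {
public:
  Root *getOrAllocate() {
    // Fast path: no lock once the Root exists. The acquire load pairs with
    // the release store below, so the Root's initialization is visible.
    Root *R = CtxRoot.load(std::memory_order_acquire);
    if (!R) {
      std::lock_guard<std::mutex> Lock(Mutex);
      // Re-check under the lock: another thread may have allocated it.
      R = CtxRoot.load(std::memory_order_relaxed);
      if (!R) {
        R = new Root();
        CtxRoot.store(R, std::memory_order_release);
      }
    }
    assert(R);
    return R;
  }

  ~LazyRootHolder() { delete CtxRoot.load(std::memory_order_relaxed); }

private:
  std::atomic<Root *> CtxRoot{nullptr};
  std::mutex Mutex;
};
```

Only callers that observe a null pointer take the lock; once the root is published, every later call returns it through the lock-free fast path.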

[llvm-branch-commits] [llvm] [NFC][Cloning] Clean up comments in CloneFunctionInto (PR #129153)

2025-03-15 Thread Artem Pianykh via llvm-branch-commits

https://github.com/artempyanykh updated 
https://github.com/llvm/llvm-project/pull/129153

>From 1335a81fb84b008b42fb35d2c9fa4d1ed38a3081 Mon Sep 17 00:00:00 2001
From: Artem Pianykh 
Date: Tue, 25 Feb 2025 13:07:40 -0800
Subject: [PATCH] [NFC][Cloning] Clean up comments in CloneFunctionInto

Summary:
Some comments no longer make sense or refer to any existing code path.

Test Plan:
ninja check-llvm-unit

stack-info: PR: https://github.com/llvm/llvm-project/pull/129153, branch: users/artempyanykh/fast-coro-upstream-part2-take2/11
---
 llvm/lib/Transforms/Utils/CloneFunction.cpp | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index 3ccb53236c026..cde1ce8b43dbd 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -266,24 +266,13 @@ void llvm::CloneFunctionInto(Function *NewFunc, const Function *OldFunc,
   if (OldFunc->isDeclaration())
 return;
 
-  // When we remap instructions within the same module, we want to avoid
-  // duplicating inlined DISubprograms, so record all subprograms we find as we
-  // duplicate instructions and then freeze them in the MD map. We also record
-  // information about dbg.value and dbg.declare to avoid duplicating the
-  // types.
   DebugInfoFinder DIFinder;
 
-  // Track the subprogram attachment that needs to be cloned to fine-tune the
-  // mapping within the same module.
   if (Changes < CloneFunctionChangeType::DifferentModule) {
-// Need to find subprograms, types, and compile units.
-
 assert((NewFunc->getParent() == nullptr ||
 NewFunc->getParent() == OldFunc->getParent()) &&
"Expected NewFunc to have the same parent, or no parent");
   } else {
-// Need to find all the compile units.
-
 assert((NewFunc->getParent() == nullptr ||
 NewFunc->getParent() != OldFunc->getParent()) &&
"Expected NewFunc to have different parents, or no parent");



[llvm-branch-commits] [clang] [Clang][CodeGen] Promote in complex compound divassign (PR #131453)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

https://github.com/Maetveis created 
https://github.com/llvm/llvm-project/pull/131453

When `-fcomplex-arithmetic=promoted` is set, complex divassign (`/=`) should
promote to a wider type the same way division (without assignment) does.
Prior to this change, Smith's algorithm was used for divassign instead.

Fixes: https://github.com/llvm/llvm-project/issues/131129
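For context, the two strategies contrasted in the test diff can be sketched in C++. Smith's algorithm, which the removed `fabs`/`fcmp` branches implement, stays in `float` but divides through by whichever component of the divisor is larger in magnitude. Promoted division instead widens to `double`, where `c*c + d*d` cannot overflow for any finite `float` inputs, applies the textbook formula, and narrows the result. `smith_div` and `promoted_div` are hypothetical names, and this is a simplified model rather than the IR Clang actually emits:

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Smith's algorithm for (a + bi) / (c + di), computed entirely in float.
// Dividing through by the larger-magnitude component of the divisor avoids
// forming c*c + d*d, which can overflow or underflow in float.
std::complex<float> smith_div(float a, float b, float c, float d) {
  assert((c != 0.0f || d != 0.0f) && "divisor must be non-zero");
  if (std::fabs(c) >= std::fabs(d)) {
    float r = d / c; // |r| <= 1
    float den = c + d * r;
    return {(a + b * r) / den, (b - a * r) / den};
  }
  float r = c / d; // |r| < 1
  float den = c * r + d;
  return {(a * r + b) / den, (b * r - a) / den};
}

// "Promoted" division: widen to double, use the textbook formula
// ((ac + bd) + (bc - ad)i) / (c^2 + d^2), then narrow back to float.
// double's exponent range keeps c*c + d*d finite for any finite float input.
std::complex<float> promoted_div(float a, float b, float c, float d) {
  double A = a, B = b, C = c, D = d;
  double den = C * C + D * D;
  return {static_cast<float>((A * C + B * D) / den),
          static_cast<float>((B * C - A * D) / den)};
}
```

With the divisor `1e30 + 1e30i`, a naive all-`float` evaluation of `c*c + d*d` overflows to infinity, while both sketches return `1 - 1i` for `2e30 / (1e30 + 1e30i)`. Promotion trades Smith's extra branch and division for wider arithmetic.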

From de638997789200d0ec86ca5f4c68c8e57daa0bb3 Mon Sep 17 00:00:00 2001
From: Gergely Meszaros 
Date: Sat, 15 Mar 2025 12:53:32 +0100
Subject: [PATCH] [Clang][CodeGen] Promote in complex compound divassign

When `-fcomplex-arithmetic=promoted` is set, complex divassign (`/=`) should
promote to a wider type the same way division (without assignment) does.
Prior to this change, Smith's algorithm was used for divassign instead.

Fixes: https://github.com/llvm/llvm-project/issues/131129
---
 clang/lib/CodeGen/CGExprComplex.cpp   |  18 +-
 clang/test/CodeGen/cx-complex-range.c | 534 ++
 2 files changed, 225 insertions(+), 327 deletions(-)

diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index a8a65a2f956f8..dc1a34ee82805 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -1212,19 +1212,24 @@ EmitCompoundAssignLValue(const CompoundAssignOperator *E,
   OpInfo.FPFeatures = E->getFPFeaturesInEffect(CGF.getLangOpts());
   CodeGenFunction::CGFPOptionsRAII FPOptsRAII(CGF, OpInfo.FPFeatures);
 
+  const bool IsComplexDivisor = E->getOpcode() == BO_DivAssign &&
+E->getRHS()->getType()->isAnyComplexType();
+
   // Load the RHS and LHS operands.
   // __block variables need to have the rhs evaluated first, plus this should
   // improve codegen a little.
   QualType PromotionTypeCR;
-  PromotionTypeCR = getPromotionType(E->getStoredFPFeaturesOrDefault(),
- E->getComputationResultType());
+  PromotionTypeCR =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getComputationResultType(), IsComplexDivisor);
   if (PromotionTypeCR.isNull())
 PromotionTypeCR = E->getComputationResultType();
   OpInfo.Ty = PromotionTypeCR;
   QualType ComplexElementTy =
   OpInfo.Ty->castAs()->getElementType();
-  QualType PromotionTypeRHS = getPromotionType(
-  E->getStoredFPFeaturesOrDefault(), E->getRHS()->getType());
+  QualType PromotionTypeRHS =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getRHS()->getType(), IsComplexDivisor);
 
   // The RHS should have been converted to the computation type.
   if (E->getRHS()->getType()->isRealFloatingType()) {
@@ -1252,8 +1257,9 @@ EmitCompoundAssignLValue(const CompoundAssignOperator *E,
 
   // Load from the l-value and convert it.
   SourceLocation Loc = E->getExprLoc();
-  QualType PromotionTypeLHS = getPromotionType(
-  E->getStoredFPFeaturesOrDefault(), E->getComputationLHSType());
+  QualType PromotionTypeLHS =
+  getPromotionType(E->getStoredFPFeaturesOrDefault(),
+   E->getComputationLHSType(), IsComplexDivisor);
   if (LHSTy->isAnyComplexType()) {
 ComplexPairTy LHSVal = EmitLoadOfLValue(LHS, Loc);
 if (!PromotionTypeLHS.isNull())
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 06a349fbc2a47..a724e1ca8cb6d 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -721,44 +721,32 @@ _Complex float divf(_Complex float a, _Complex float b) {
 // PRMTD-NEXT:[[B_REAL:%.*]] = load float, ptr [[B_REALP]], align 4
 // PRMTD-NEXT:[[B_IMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[B]], i32 0, i32 1
 // PRMTD-NEXT:[[B_IMAG:%.*]] = load float, ptr [[B_IMAGP]], align 4
+// PRMTD-NEXT:[[EXT:%.*]] = fpext float [[B_REAL]] to double
+// PRMTD-NEXT:[[EXT1:%.*]] = fpext float [[B_IMAG]] to double
 // PRMTD-NEXT:[[TMP0:%.*]] = load ptr, ptr [[A_ADDR]], align 8
 // PRMTD-NEXT:[[DOTREALP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[TMP0]], i32 0, i32 0
 // PRMTD-NEXT:[[DOTREAL:%.*]] = load float, ptr [[DOTREALP]], align 4
 // PRMTD-NEXT:[[DOTIMAGP:%.*]] = getelementptr inbounds nuw { float, float }, ptr [[TMP0]], i32 0, i32 1
 // PRMTD-NEXT:[[DOTIMAG:%.*]] = load float, ptr [[DOTIMAGP]], align 4
-// PRMTD-NEXT:[[TMP1:%.*]] = call float @llvm.fabs.f32(float [[B_REAL]])
-// PRMTD-NEXT:[[TMP2:%.*]] = call float @llvm.fabs.f32(float [[B_IMAG]])
-// PRMTD-NEXT:[[ABS_CMP:%.*]] = fcmp ugt float [[TMP1]], [[TMP2]]
-// PRMTD-NEXT:br i1 [[ABS_CMP]], label [[ABS_RHSR_GREATER_OR_EQUAL_ABS_RHSI:%.*]], label [[ABS_RHSR_LESS_THAN_ABS_RHSI:%.*]]
-// PRMTD:   abs_rhsr_greater_or_equal_abs_rhsi:
-// PRMTD-NEXT:[[TMP3:%.*]] = fdiv float [[B_IMAG]], [[B_REAL]]
-// PRMTD-NEXT:[[TMP4:%.*]] = fmul float [[TMP3]], [[B_IMAG]]
-// PRMTD-NEXT:[[TMP5:%.*]] = fadd float [[B_REAL]], [[TMP4]]
-// PRM

[llvm-branch-commits] [clang] [Clang][CodeGen] Promote in complex compound divassign (PR #131453)

2025-03-15 Thread Mészáros Gergely via llvm-branch-commits

Maetveis wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack
> [on Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/131453?utm_source=stack-comment-downstack-mergeability-warning).
> [Learn more](https://graphite.dev/docs/merge-pull-requests)

* **#131453** 👈 [(View in Graphite)](https://app.graphite.dev/github/pr/llvm/llvm-project/131453?utm_source=stack-comment-view-in-graphite)
* **#131451**
* **#131447**
* `main`

This stack of pull requests is managed by [Graphite](https://graphite.dev?utm-source=stack-comment). Learn more about [stacking](https://stacking.dev/?utm_source=stack-comment).


https://github.com/llvm/llvm-project/pull/131453