[llvm-branch-commits] [llvm] release/19.x: [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC (#100086) (PR #102103)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102103
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC (#100086) (PR #102103)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102103

Backport 3a226dbe27ac7c7d935bc0968e84e31798a01207

Requested by: @rorth

>From 7d36f22d285d3142213ec4c8312b2a8f0f7ea83f Mon Sep 17 00:00:00 2001
From: Rainer Orth 
Date: Tue, 6 Aug 2024 09:08:41 +0200
Subject: [PATCH] [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC
 (#100086)

As discussed in Issue #86793, the `MachOTest.UnalignedLC` test dies with
`SIGBUS` on SPARC, a strict-alignment target. It simply cannot work
there. Besides, the test invokes undefined behaviour on big-endian
targets, so this patch disables it on all of those.

Tested on `sparcv9-sun-solaris2.11` and `amd64-pc-solaris2.11`.

(cherry picked from commit 3a226dbe27ac7c7d935bc0968e84e31798a01207)
---
 llvm/unittests/BinaryFormat/MachOTest.cpp | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/llvm/unittests/BinaryFormat/MachOTest.cpp 
b/llvm/unittests/BinaryFormat/MachOTest.cpp
index 391298ff38d76..78b20c28a9549 100644
--- a/llvm/unittests/BinaryFormat/MachOTest.cpp
+++ b/llvm/unittests/BinaryFormat/MachOTest.cpp
@@ -6,6 +6,7 @@
 //
 
 //===----------------------------------------------------------------------===//
 
+#include "llvm/ADT/bit.h"
 #include "llvm/BinaryFormat/MachO.h"
 #include "llvm/TargetParser/Triple.h"
 #include "gtest/gtest.h"
@@ -13,7 +14,15 @@
 using namespace llvm;
 using namespace llvm::MachO;
 
-TEST(MachOTest, UnalignedLC) {
+#if BYTE_ORDER == BIG_ENDIAN
+// As discussed in Issue #86793, this test cannot work on a strict-alignment
+// targets like SPARC.  Besides, it's undefined behaviour on big-endian hosts.
+#define MAYBE_UnalignedLC DISABLED_UnalignedLC
+#else
+#define MAYBE_UnalignedLC UnalignedLC
+#endif
+
+TEST(MachOTest, MAYBE_UnalignedLC) {
   unsigned char Valid32BitMachO[] = {
   0xCE, 0xFA, 0xED, 0xFE, 0x07, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00,
   0x02, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00,



[llvm-branch-commits] [llvm] release/19.x: [BinaryFormat] Disable MachOTest.UnalignedLC on SPARC (#100086) (PR #102103)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:

@efriedma-quic What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102103


[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `<cstdint>` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102112

Backport bb59f04e7e75dcbe39f1bf952304a157f0035314

Requested by: @thesamesam

>From 34e2fc058a83c397f04f507ede4509f4e433df25 Mon Sep 17 00:00:00 2001
From: Sam James 
Date: Tue, 6 Aug 2024 09:58:36 +0100
Subject: [PATCH] [LLDB] Add `<cstdint>` to AddressableBits (#102110)

(cherry picked from commit bb59f04e7e75dcbe39f1bf952304a157f0035314)
---
 lldb/include/lldb/Utility/AddressableBits.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lldb/include/lldb/Utility/AddressableBits.h 
b/lldb/include/lldb/Utility/AddressableBits.h
index 0d27c3561ec27..8c7a1ec5f52c0 100644
--- a/lldb/include/lldb/Utility/AddressableBits.h
+++ b/lldb/include/lldb/Utility/AddressableBits.h
@@ -12,6 +12,8 @@
 #include "lldb/lldb-forward.h"
 #include "lldb/lldb-public.h"
 
+#include <cstdint>
+
 namespace lldb_private {
 
 /// \class AddressableBits AddressableBits.h "lldb/Core/AddressableBits.h"



[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `<cstdint>` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102112


[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `<cstdint>` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:

@labath What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102112


[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `<cstdint>` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lldb

Author: None (llvmbot)


Changes

Backport bb59f04e7e75dcbe39f1bf952304a157f0035314

Requested by: @thesamesam

---
Full diff: https://github.com/llvm/llvm-project/pull/102112.diff


1 Files Affected:

- (modified) lldb/include/lldb/Utility/AddressableBits.h (+2) 


```diff
diff --git a/lldb/include/lldb/Utility/AddressableBits.h 
b/lldb/include/lldb/Utility/AddressableBits.h
index 0d27c3561ec27..8c7a1ec5f52c0 100644
--- a/lldb/include/lldb/Utility/AddressableBits.h
+++ b/lldb/include/lldb/Utility/AddressableBits.h
@@ -12,6 +12,8 @@
 #include "lldb/lldb-forward.h"
 #include "lldb/lldb-public.h"
 
+#include <cstdint>
+
 namespace lldb_private {
 
 /// \class AddressableBits AddressableBits.h "lldb/Core/AddressableBits.h"

```




https://github.com/llvm/llvm-project/pull/102112


[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `<cstdint>` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread Pavel Labath via llvm-branch-commits

https://github.com/labath approved this pull request.


https://github.com/llvm/llvm-project/pull/102112


[llvm-branch-commits] [clang] [Driver] Temporarily probe aarch64-linux-gnu GCC installation (PR #102039)

2024-08-06 Thread Aaron Ballman via llvm-branch-commits

https://github.com/AaronBallman approved this pull request.

LGTM, thank you for the quick workaround!

https://github.com/llvm/llvm-project/pull/102039


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (PR #102130)

2024-08-06 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic created 
https://github.com/llvm/llvm-project/pull/102130

isExtractHiElt should return the new source register instead of the
instruction that defines it. Src = MI.getOperand(0).getReg() is not
correct when MI (for example, G_UNMERGE_VALUES) defines multiple registers.
Refactor the existing code to work with source registers only.

>From b6c03d1785bd713344c6b578869a0b36fc6473e3 Mon Sep 17 00:00:00 2001
From: Petar Avramovic 
Date: Tue, 6 Aug 2024 13:50:35 +0200
Subject: [PATCH] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix

isExtractHiElt should return the new source register instead of the
instruction that defines it. Src = MI.getOperand(0).getReg() is not
correct when MI (for example, G_UNMERGE_VALUES) defines multiple registers.
Refactor the existing code to work with source registers only.
---
 .../AMDGPU/AMDGPUInstructionSelector.cpp  | 164 --
 .../Target/AMDGPU/AMDGPUInstructionSelector.h |   2 +-
 .../GlobalISel/combine-fma-add-ext-fma.ll |   8 +-
 3 files changed, 74 insertions(+), 100 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 73f3921b2ff4c..f78699f88de56 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -1372,8 +1372,8 @@ bool 
AMDGPUInstructionSelector::selectIntrinsicCmp(MachineInstr &I) const {
   MachineInstrBuilder SelectedMI;
   MachineOperand &LHS = I.getOperand(2);
   MachineOperand &RHS = I.getOperand(3);
-  auto [Src0, Src0Mods] = selectVOP3ModsImpl(LHS);
-  auto [Src1, Src1Mods] = selectVOP3ModsImpl(RHS);
+  auto [Src0, Src0Mods] = selectVOP3ModsImpl(LHS.getReg());
+  auto [Src1, Src1Mods] = selectVOP3ModsImpl(RHS.getReg());
   Register Src0Reg =
   copyToVGPRIfSrcFolded(Src0, Src0Mods, LHS, &I, /*ForceVGPR*/ true);
   Register Src1Reg =
@@ -2467,14 +2467,48 @@ bool 
AMDGPUInstructionSelector::selectG_SZA_EXT(MachineInstr &I) const {
   return false;
 }
 
+static Register stripCopy(Register Reg, MachineRegisterInfo &MRI) {
+  return getDefSrcRegIgnoringCopies(Reg, MRI)->Reg;
+}
+
+static Register stripBitCast(Register Reg, MachineRegisterInfo &MRI) {
+  Register BitcastSrc;
+  if (mi_match(Reg, MRI, m_GBitcast(m_Reg(BitcastSrc))))
+Reg = BitcastSrc;
+  return Reg;
+}
+
 static bool isExtractHiElt(MachineRegisterInfo &MRI, Register In,
Register &Out) {
+  Register Trunc;
+  if (!mi_match(In, MRI, m_GTrunc(m_Reg(Trunc))))
+return false;
+
   Register LShlSrc;
-  if (mi_match(In, MRI,
-   m_GTrunc(m_GLShr(m_Reg(LShlSrc), m_SpecificICst(16))))) {
-Out = LShlSrc;
+  Register Cst;
+  if (mi_match(Trunc, MRI, m_GLShr(m_Reg(LShlSrc), m_Reg(Cst)))) {
+Cst = stripCopy(Cst, MRI);
+if (mi_match(Cst, MRI, m_SpecificICst(16))) {
+  Out = stripBitCast(LShlSrc, MRI);
+  return true;
+}
+  }
+
+  MachineInstr *Shuffle = MRI.getVRegDef(Trunc);
+  if (Shuffle->getOpcode() != AMDGPU::G_SHUFFLE_VECTOR)
+return false;
+
+  assert(MRI.getType(Shuffle->getOperand(0).getReg()) ==
+ LLT::fixed_vector(2, 16));
+
+  ArrayRef<int> Mask = Shuffle->getOperand(3).getShuffleMask();
+  assert(Mask.size() == 2);
+
+  if (Mask[0] == 1 && Mask[1] <= 1) {
+Out = Shuffle->getOperand(0).getReg();
 return true;
   }
+
   return false;
 }
 
@@ -3550,11 +3584,8 @@ AMDGPUInstructionSelector::selectVCSRC(MachineOperand 
&Root) const {
 
 }
 
-std::pair<Register, unsigned>
-AMDGPUInstructionSelector::selectVOP3ModsImpl(MachineOperand &Root,
-  bool IsCanonicalizing,
-  bool AllowAbs, bool OpSel) const 
{
-  Register Src = Root.getReg();
+std::pair<Register, unsigned> AMDGPUInstructionSelector::selectVOP3ModsImpl(
+Register Src, bool IsCanonicalizing, bool AllowAbs, bool OpSel) const {
   unsigned Mods = 0;
   MachineInstr *MI = getDefIgnoringCopies(Src, *MRI);
 
@@ -3617,7 +3648,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3Mods0(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root);
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg());
 
   return {{
   [=](MachineInstrBuilder &MIB) {
@@ -3633,7 +3664,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3BMods0(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root,
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg(),
/*IsCanonicalizing=*/true,
/*AllowAbs=*/false);
 
@@ -3660,7 +3691,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3Mods(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root);
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg());

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (PR #102130)

2024-08-06 Thread Petar Avramovic via llvm-branch-commits

petar-avramovic wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/102130
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#102130** 👈 (this PR)
* **#102129**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/

https://github.com/llvm/llvm-project/pull/102130


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (PR #102130)

2024-08-06 Thread Petar Avramovic via llvm-branch-commits

https://github.com/petar-avramovic ready_for_review 
https://github.com/llvm/llvm-project/pull/102130


[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (PR #102130)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Petar Avramovic (petar-avramovic)


Changes

isExtractHiElt should return the new source register instead of the
instruction that defines it. Src = MI.getOperand(0).getReg() is not
correct when MI (for example, G_UNMERGE_VALUES) defines multiple registers.
Refactor the existing code to work with source registers only.

---
Full diff: https://github.com/llvm/llvm-project/pull/102130.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (+69-95) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.h (+1-1) 
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-ext-fma.ll 
(+4-4) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
index 73f3921b2ff4c..f78699f88de56 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
@@ -1372,8 +1372,8 @@ bool 
AMDGPUInstructionSelector::selectIntrinsicCmp(MachineInstr &I) const {
   MachineInstrBuilder SelectedMI;
   MachineOperand &LHS = I.getOperand(2);
   MachineOperand &RHS = I.getOperand(3);
-  auto [Src0, Src0Mods] = selectVOP3ModsImpl(LHS);
-  auto [Src1, Src1Mods] = selectVOP3ModsImpl(RHS);
+  auto [Src0, Src0Mods] = selectVOP3ModsImpl(LHS.getReg());
+  auto [Src1, Src1Mods] = selectVOP3ModsImpl(RHS.getReg());
   Register Src0Reg =
   copyToVGPRIfSrcFolded(Src0, Src0Mods, LHS, &I, /*ForceVGPR*/ true);
   Register Src1Reg =
@@ -2467,14 +2467,48 @@ bool 
AMDGPUInstructionSelector::selectG_SZA_EXT(MachineInstr &I) const {
   return false;
 }
 
+static Register stripCopy(Register Reg, MachineRegisterInfo &MRI) {
+  return getDefSrcRegIgnoringCopies(Reg, MRI)->Reg;
+}
+
+static Register stripBitCast(Register Reg, MachineRegisterInfo &MRI) {
+  Register BitcastSrc;
+  if (mi_match(Reg, MRI, m_GBitcast(m_Reg(BitcastSrc))))
+Reg = BitcastSrc;
+  return Reg;
+}
+
 static bool isExtractHiElt(MachineRegisterInfo &MRI, Register In,
Register &Out) {
+  Register Trunc;
+  if (!mi_match(In, MRI, m_GTrunc(m_Reg(Trunc))))
+return false;
+
   Register LShlSrc;
-  if (mi_match(In, MRI,
-   m_GTrunc(m_GLShr(m_Reg(LShlSrc), m_SpecificICst(16))))) {
-Out = LShlSrc;
+  Register Cst;
+  if (mi_match(Trunc, MRI, m_GLShr(m_Reg(LShlSrc), m_Reg(Cst)))) {
+Cst = stripCopy(Cst, MRI);
+if (mi_match(Cst, MRI, m_SpecificICst(16))) {
+  Out = stripBitCast(LShlSrc, MRI);
+  return true;
+}
+  }
+
+  MachineInstr *Shuffle = MRI.getVRegDef(Trunc);
+  if (Shuffle->getOpcode() != AMDGPU::G_SHUFFLE_VECTOR)
+return false;
+
+  assert(MRI.getType(Shuffle->getOperand(0).getReg()) ==
+ LLT::fixed_vector(2, 16));
+
+  ArrayRef<int> Mask = Shuffle->getOperand(3).getShuffleMask();
+  assert(Mask.size() == 2);
+
+  if (Mask[0] == 1 && Mask[1] <= 1) {
+Out = Shuffle->getOperand(0).getReg();
 return true;
   }
+
   return false;
 }
 
@@ -3550,11 +3584,8 @@ AMDGPUInstructionSelector::selectVCSRC(MachineOperand 
&Root) const {
 
 }
 
-std::pair<Register, unsigned>
-AMDGPUInstructionSelector::selectVOP3ModsImpl(MachineOperand &Root,
-  bool IsCanonicalizing,
-  bool AllowAbs, bool OpSel) const 
{
-  Register Src = Root.getReg();
+std::pair<Register, unsigned> AMDGPUInstructionSelector::selectVOP3ModsImpl(
+Register Src, bool IsCanonicalizing, bool AllowAbs, bool OpSel) const {
   unsigned Mods = 0;
   MachineInstr *MI = getDefIgnoringCopies(Src, *MRI);
 
@@ -3617,7 +3648,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3Mods0(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root);
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg());
 
   return {{
   [=](MachineInstrBuilder &MIB) {
@@ -3633,7 +3664,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3BMods0(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root,
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg(),
/*IsCanonicalizing=*/true,
/*AllowAbs=*/false);
 
@@ -3660,7 +3691,7 @@ InstructionSelector::ComplexRendererFns
 AMDGPUInstructionSelector::selectVOP3Mods(MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root);
+  std::tie(Src, Mods) = selectVOP3ModsImpl(Root.getReg());
 
   return {{
   [=](MachineInstrBuilder &MIB) {
@@ -3675,7 +3706,8 @@ 
AMDGPUInstructionSelector::selectVOP3ModsNonCanonicalizing(
 MachineOperand &Root) const {
   Register Src;
   unsigned Mods;
-  std::tie(Src, Mods) = selectVOP3ModsImpl(Root, /*IsCanonicalizing=*/false);
+  std::tie(Src, Mods) =
+  selectVOP3ModsImpl(Root.getReg(), /*IsCanonicalizing=*/false);

[llvm-branch-commits] [clang] [llvm] [WIP][Offload] Add runtime support for multi-dim `num_teams` (PR #101723)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

shiltian wrote:

This will be closed for now. It will be easier to make runtime changes for 
thread block size and grid size in one PR.

https://github.com/llvm/llvm-project/pull/101723


[llvm-branch-commits] [clang] [llvm] [WIP][Offload] Add runtime support for multi-dim `num_teams` (PR #101723)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian closed 
https://github.com/llvm/llvm-project/pull/101723


[llvm-branch-commits] [GlobalISel] Don't remove from unfinalized GISelWorkList (PR #102158)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler created 
https://github.com/llvm/llvm-project/pull/102158

Remove a hack from GISelWorkList caused by the Combiner removing
instructions from an unfinalized GISelWorkList during the DCE phase.
This is in preparation for larger changes to the WorkListMaintainer.





[llvm-branch-commits] [GlobalISel] Don't remove from unfinalized GISelWorkList (PR #102158)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Tobias Stadler (tobias-stadler)


Changes

Remove a hack from GISelWorkList caused by the Combiner removing
instructions from an unfinalized GISelWorkList during the DCE phase.
This is in preparation for larger changes to the WorkListMaintainer.


---
Full diff: https://github.com/llvm/llvm-project/pull/102158.diff


3 Files Affected:

- (modified) llvm/include/llvm/CodeGen/GlobalISel/GISelChangeObserver.h (+3) 
- (modified) llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h (+1-1) 
- (modified) llvm/lib/CodeGen/GlobalISel/Combiner.cpp (+6-5) 


```diff
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/GISelChangeObserver.h 
b/llvm/include/llvm/CodeGen/GlobalISel/GISelChangeObserver.h
index cad2216db34fe..7ec5dac9a6eba 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/GISelChangeObserver.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/GISelChangeObserver.h
@@ -79,6 +79,9 @@ class GISelObserverWrapper : public MachineFunction::Delegate,
 if (It != Observers.end())
   Observers.erase(It);
   }
+  // Removes all observers
+  void clearObservers() { Observers.clear(); }
+
   // API for Observer.
   void erasingInstr(MachineInstr &MI) override {
 for (auto &O : Observers)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h 
b/llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h
index 3ec6a1da201e2..dba3a8a14480c 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/GISelWorkList.h
@@ -82,7 +82,7 @@ class GISelWorkList {
   /// Remove I from the worklist if it exists.
   void remove(const MachineInstr *I) {
 #if LLVM_ENABLE_ABI_BREAKING_CHECKS
-assert((Finalized || WorklistMap.empty()) && "Neither finalized nor 
empty");
+assert(Finalized && "GISelWorkList used without finalizing");
 #endif
 auto It = WorklistMap.find(I);
 if (It == WorklistMap.end())
diff --git a/llvm/lib/CodeGen/GlobalISel/Combiner.cpp 
b/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
index 5da9e86b20761..49842e5fd65da 100644
--- a/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
@@ -110,11 +110,6 @@ Combiner::Combiner(MachineFunction &MF, CombinerInfo 
&CInfo,
   if (CSEInfo)
 B.setCSEInfo(CSEInfo);
 
-  // Setup observer.
-  ObserverWrapper->addObserver(WLObserver.get());
-  if (CSEInfo)
-ObserverWrapper->addObserver(CSEInfo);
-
   B.setChangeObserver(*ObserverWrapper);
 }
 
@@ -147,6 +142,9 @@ bool Combiner::combineMachineInstrs() {
 LLVM_DEBUG(dbgs() << "\n\nCombiner iteration #" << Iteration << '\n');
 
 WorkList.clear();
+ObserverWrapper->clearObservers();
+if (CSEInfo)
+  ObserverWrapper->addObserver(CSEInfo);
 
 // Collect all instructions. Do a post order traversal for basic blocks and
 // insert with list bottom up, so while we pop_back_val, we'll traverse top
@@ -168,6 +166,9 @@ bool Combiner::combineMachineInstrs() {
   }
 }
 WorkList.finalize();
+
+// Only notify WLObserver during actual combines
+ObserverWrapper->addObserver(WLObserver.get());
 // Main Loop. Process the instructions here.
 while (!WorkList.empty()) {
   MachineInstr *CurrInst = WorkList.pop_back_val();

```




https://github.com/llvm/llvm-project/pull/102158


[llvm-branch-commits] [clang] release/19.x: [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) (PR #102159)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102159

Backport bd576fe34285c4dcd04837bf07a89a9c00e3cd5e

Requested by: @ChuanqiXu9

>From 97774f6fb6541314a9ee5b8257cfc9e8c3221d4e Mon Sep 17 00:00:00 2001
From: Sharadh Rajaraman 
Date: Tue, 6 Aug 2024 16:05:55 +0100
Subject: [PATCH] [clang][driver][clang-cl] Support `--precompile` and
 `-fmodule-*` options in Clang-CL (#98761)

This PR is the first step in improving the situation for `clang-cl`
detailed in [this LLVM Discourse
thread](https://discourse.llvm.org/t/clang-cl-exe-support-for-c-modules/72257/28).
There has been some work done in #89772. I believe this is somewhat
orthogonal.

This is a work-in-progress; the functionality has only been tested with
the [basic 'Hello World'
example](https://clang.llvm.org/docs/StandardCPlusPlusModules.html#quick-start),
and proper test cases need to be written. I'd like some thoughts on
this, thanks!

Partially resolves #64118.
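With these options exposed, a modules "Hello World" under clang-cl might look like the sketch below. This is illustrative and untested: the file names are invented and the exact flag spellings should be checked against `clang-cl /?`; the PR's point is only that the clang++ module options now work without a `/clang:` prefix.

```
:: Build the module interface; -fmodule-output= saves the BMI alongside it.
clang-cl /std:c++latest /c Hello.cppm -fmodule-output=Hello.pcm

:: Consume the module; -fprebuilt-module-path= tells clang-cl where to look.
clang-cl /std:c++latest use.cpp -fprebuilt-module-path=. Hello.obj
```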

(cherry picked from commit bd576fe34285c4dcd04837bf07a89a9c00e3cd5e)
---
 clang/docs/StandardCPlusPlusModules.rst | 17 ++---
 clang/docs/UsersManual.rst  |  6 ++
 clang/include/clang/Driver/Options.td   |  9 ++---
 clang/test/Driver/cl-cxx20-modules.cppm |  8 
 4 files changed, 30 insertions(+), 10 deletions(-)
 create mode 100644 clang/test/Driver/cl-cxx20-modules.cppm

diff --git a/clang/docs/StandardCPlusPlusModules.rst 
b/clang/docs/StandardCPlusPlusModules.rst
index b87491910e222..2478a77e7640c 100644
--- a/clang/docs/StandardCPlusPlusModules.rst
+++ b/clang/docs/StandardCPlusPlusModules.rst
@@ -398,6 +398,16 @@ BMIs cannot be shipped in an archive to create a module 
library. Instead, the
 BMIs(``*.pcm``) are compiled into object files(``*.o``) and those object files
 are added to the archive instead.
 
+clang-cl
+--------
+
+``clang-cl`` supports the same options as ``clang++`` for modules as detailed 
above;
+there is no need to prefix these options with ``/clang:``. Note that ``cl.exe``
+`options to emit/consume IFC files 
`
 are *not* supported.
+The resultant precompiled modules are also not compatible for use with 
``cl.exe``.
+
+We recommend that build system authors use the above-mentioned ``clang++`` 
options  with ``clang-cl`` to build modules.
+
 Consistency Requirements
 
 
@@ -1387,13 +1397,6 @@ have ``.cppm`` (or ``.ccm``, ``.cxxm``, ``.c++m``) as 
the file extension.
 However, the behavior is inconsistent with other compilers. This is tracked by
 `#57416 `_.
 
-clang-cl is not compatible with standard C++ modules
-
-
-``/clang:-fmodule-file`` and ``/clang:-fprebuilt-module-path`` cannot be used
-to specify the BMI with ``clang-cl.exe``. This is tracked by
-`#64118 `_.
-
 Incorrect ODR violation diagnostics
 ~~~
 
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index e9b95739ea2ab..64e991451bf70 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -4745,6 +4745,12 @@ Execute ``clang-cl /?`` to see a list of supported 
options:
   -flto=   Set LTO mode to either 'full' or 'thin'
   -flto   Enable LTO in 'full' mode
   -fmerge-all-constants   Allow merging of constants
+  -fmodule-file=[<name>=]<file>
+  Use the specified module file that provides the 
module <name>
+  -fmodule-header=<kind>
+  Build <header> as a C++20 header unit
+  -fmodule-output=<file>
+  Save intermediate module file results when 
compiling a standard C++ module unit.
   -fms-compatibility-version=
   Dot-separated value representing the Microsoft 
compiler version
   number to report in _MSC_VER (0 = don't define 
it; default is same value as installed cl.exe, or 1933)
diff --git a/clang/include/clang/Driver/Options.td 
b/clang/include/clang/Driver/Options.td
index 359a698ea87dd..188c933752f19 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3106,7 +3106,7 @@ def fmodules_user_build_path : Separate<["-"], 
"fmodules-user-build-path">, Grou
   HelpText<"Specify the module user build path">,
   MarshallingInfoString>;
 def fprebuilt_module_path : Joined<["-"], "fprebuilt-module-path=">, 
Group,
-  Flags<[]>, Visibility<[ClangOption, CC1Option]>,
+  Flags<[]>, Visibility<[ClangOption, CLOption, CC1Option]>,
   MetaVarName<"">,
   HelpText<"Specify the prebuilt module path">;
 defm prebuilt_implicit_modules : BoolFOption<"prebuilt-implicit-modules",
@@ -3115,11 +3115,11 @@ defm prebuilt_implicit_modules : 
BoolFOption<"prebuilt-implicit-modules",
   NegFlag, BothFlags<[], [ClangOption, CC1Opt

[llvm-branch-commits] [clang] release/19.x: [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) (PR #102159)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102159


[llvm-branch-commits] [clang] release/19.x: [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) (PR #102159)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:

@ChuanqiXu9 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102159


[llvm-branch-commits] [clang] release/19.x: [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) (PR #102159)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: None (llvmbot)


Changes

Backport bd576fe34285c4dcd04837bf07a89a9c00e3cd5e

Requested by: @ChuanqiXu9

---
Full diff: https://github.com/llvm/llvm-project/pull/102159.diff


4 Files Affected:

- (modified) clang/docs/StandardCPlusPlusModules.rst (+10-7) 
- (modified) clang/docs/UsersManual.rst (+6) 
- (modified) clang/include/clang/Driver/Options.td (+6-3) 
- (added) clang/test/Driver/cl-cxx20-modules.cppm (+8) 


```diff
diff --git a/clang/docs/StandardCPlusPlusModules.rst 
b/clang/docs/StandardCPlusPlusModules.rst
index b87491910e222..2478a77e7640c 100644
--- a/clang/docs/StandardCPlusPlusModules.rst
+++ b/clang/docs/StandardCPlusPlusModules.rst
@@ -398,6 +398,16 @@ BMIs cannot be shipped in an archive to create a module 
library. Instead, the
 BMIs(``*.pcm``) are compiled into object files(``*.o``) and those object files
 are added to the archive instead.
 
+clang-cl
+--------
+
+``clang-cl`` supports the same options as ``clang++`` for modules as detailed 
above;
+there is no need to prefix these options with ``/clang:``. Note that ``cl.exe``
+`options to emit/consume IFC files 
`
 are *not* supported.
+The resultant precompiled modules are also not compatible for use with 
``cl.exe``.
+
+We recommend that build system authors use the above-mentioned ``clang++`` 
options  with ``clang-cl`` to build modules.
+
 Consistency Requirements
 
 
@@ -1387,13 +1397,6 @@ have ``.cppm`` (or ``.ccm``, ``.cxxm``, ``.c++m``) as 
the file extension.
 However, the behavior is inconsistent with other compilers. This is tracked by
 `#57416 `_.
 
-clang-cl is not compatible with standard C++ modules
-
-
-``/clang:-fmodule-file`` and ``/clang:-fprebuilt-module-path`` cannot be used
-to specify the BMI with ``clang-cl.exe``. This is tracked by
-`#64118 `_.
-
 Incorrect ODR violation diagnostics
 ~~~
 
diff --git a/clang/docs/UsersManual.rst b/clang/docs/UsersManual.rst
index e9b95739ea2ab..64e991451bf70 100644
--- a/clang/docs/UsersManual.rst
+++ b/clang/docs/UsersManual.rst
@@ -4745,6 +4745,12 @@ Execute ``clang-cl /?`` to see a list of supported 
options:
   -flto=   Set LTO mode to either 'full' or 'thin'
   -flto   Enable LTO in 'full' mode
   -fmerge-all-constants   Allow merging of constants
+  -fmodule-file=[<name>=]<file>
+  Use the specified module file that provides the 
module <name>
+  -fmodule-header=<kind>
+  Build <header> as a C++20 header unit
+  -fmodule-output=<file>
+  Save intermediate module file results when 
compiling a standard C++ module unit.
  -fms-compatibility-version=
                          Dot-separated value representing the Microsoft compiler version
                          number to report in _MSC_VER (0 = don't define it; default is same value as installed cl.exe, or 1933)
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 359a698ea87dd..188c933752f19 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -3106,7 +3106,7 @@ def fmodules_user_build_path : Separate<["-"], "fmodules-user-build-path">, Grou
   HelpText<"Specify the module user build path">,
   MarshallingInfoString>;
def fprebuilt_module_path : Joined<["-"], "fprebuilt-module-path=">, Group<i_Group>,
-  Flags<[]>, Visibility<[ClangOption, CC1Option]>,
+  Flags<[]>, Visibility<[ClangOption, CLOption, CC1Option]>,
   MetaVarName<"">,
   HelpText<"Specify the prebuilt module path">;
 defm prebuilt_implicit_modules : BoolFOption<"prebuilt-implicit-modules",
@@ -3115,11 +3115,11 @@ defm prebuilt_implicit_modules : BoolFOption<"prebuilt-implicit-modules",
   NegFlag, BothFlags<[], [ClangOption, CC1Option]>>;
 
 def fmodule_output_EQ : Joined<["-"], "fmodule-output=">,
-  Flags<[NoXarchOption]>, Visibility<[ClangOption, CC1Option]>,
+  Flags<[NoXarchOption]>, Visibility<[ClangOption, CLOption, CC1Option]>,
   MarshallingInfoString>,
  HelpText<"Save intermediate module file results when compiling a standard C++ module unit.">;
 def fmodule_output : Flag<["-"], "fmodule-output">, Flags<[NoXarchOption]>,
-  Visibility<[ClangOption, CC1Option]>,
+  Visibility<[ClangOption, CLOption, CC1Option]>,
  HelpText<"Save intermediate module file results when compiling a standard C++ module unit.">;
 
 defm skip_odr_check_in_gmf : BoolOption<"f", "skip-odr-check-in-gmf",
@@ -3299,8 +3299,10 @@ def fretain_comments_from_system_headers : Flag<["-"], "fretain-comments-from-sy
   Visibility<[ClangOption, CC1Option]>,
   MarshallingInfoFlag>;
 def fmodule

[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler created https://github.com/llvm/llvm-project/pull/102163

Continues the work towards disabling fixed-point iteration in the Combiner
(#94291).

This introduces improved Observer-based heuristics in the
GISel Combiner that retry combining the defs/uses of modified instructions
and perform sparse dead code elimination.

I have experimented a lot with the heuristics and this seems to be the
minimal set of heuristics that allows disabling fixed-point iteration
for AArch64 CTMark O2 without regressions.
Enabling this globally would pass all regression tests for all official
targets (apart from small benign diffs), but I have made this fully
opt-in for now, because I can't quantify the impact for other targets.

For performance numbers see my follow-up patch for AArch64.
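The heuristics can be pictured as a single worklist pass: when a combine changes an instruction, its defs/uses are re-enqueued, and instructions whose results become unused are erased eagerly instead of waiting for another full iteration over the function. A toy Python model of that retry/DCE behaviour (a sketch only — the names and structure here are invented for illustration, not the actual C++ implementation):

```python
from collections import deque

class Inst:
    def __init__(self, name, operands=(), root=False):
        self.name = name
        self.root = root                  # e.g. stores/terminators: never dead
        self.operands = list(operands)    # instructions this one reads
        self.users = []
        for op in self.operands:
            op.users.append(self)

def run(insts, try_combine=lambda inst: False):
    """One pass over `insts` with retry + sparse DCE, no fixed-point loop."""
    worklist = deque(insts)
    queued = set(insts)                   # deduplicating worklist, like GISelWorkList

    def enqueue(i):
        if i not in queued and i in insts:
            worklist.append(i)
            queued.add(i)

    while worklist:
        inst = worklist.popleft()
        queued.discard(inst)
        # Sparse DCE: erase instructions whose results are unused ...
        if not inst.users and not inst.root:
            insts.remove(inst)
            for op in inst.operands:
                op.users.remove(inst)
                enqueue(op)               # ... and retry their operands
            continue
        if try_combine(inst):
            # Heuristic: retry the defs/uses of an instruction a combine touched.
            for neighbour in inst.operands + inst.users:
                enqueue(neighbour)
    return [inst.name for inst in insts]
```

Running this on a dead chain removes the whole chain in one pass, which is the compile-time win over re-scanning the function until a fixed point is reached.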



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Tobias Stadler (tobias-stadler)


Changes

Continues the work towards disabling fixed-point iteration in the Combiner
(#94291).

This introduces improved Observer-based heuristics in the
GISel Combiner that retry combining the defs/uses of modified instructions
and perform sparse dead code elimination.

I have experimented a lot with the heuristics and this seems to be the
minimal set of heuristics that allows disabling fixed-point iteration
for AArch64 CTMark O2 without regressions.
Enabling this globally would pass all regression tests for all official
targets (apart from small benign diffs), but I have made this fully
opt-in for now, because I can't quantify the impact for other targets.

For performance numbers see my follow-up patch for AArch64.


---
Full diff: https://github.com/llvm/llvm-project/pull/102163.diff


3 Files Affected:

- (modified) llvm/include/llvm/CodeGen/GlobalISel/Combiner.h (+8-2) 
- (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h (+25) 
- (modified) llvm/lib/CodeGen/GlobalISel/Combiner.cpp (+180-38) 


``diff
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h b/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
index f826601544932..fa6a7be6cf6c3 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/Combiner.h
@@ -15,13 +15,13 @@
 #ifndef LLVM_CODEGEN_GLOBALISEL_COMBINER_H
 #define LLVM_CODEGEN_GLOBALISEL_COMBINER_H
 
+#include "llvm/CodeGen/GlobalISel/CombinerInfo.h"
 #include "llvm/CodeGen/GlobalISel/GIMatchTableExecutor.h"
 #include "llvm/CodeGen/GlobalISel/GISelChangeObserver.h"
 #include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
 
 namespace llvm {
 class MachineRegisterInfo;
-struct CombinerInfo;
 class GISelCSEInfo;
 class TargetPassConfig;
 class MachineFunction;
@@ -33,8 +33,12 @@ class MachineIRBuilder;
 /// TODO: Is it worth making this module-wide?
 class Combiner : public GIMatchTableExecutor {
 private:
+  using WorkListTy = GISelWorkList<512>;
+
   class WorkListMaintainer;
-  GISelWorkList<512> WorkList;
+  template  class WorkListMaintainerImpl;
+
+  WorkListTy WorkList;
 
   // We have a little hack here where keep the owned pointers private, and only
   // expose a reference. This has two purposes:
@@ -48,6 +52,8 @@ class Combiner : public GIMatchTableExecutor {
 
   bool HasSetupMF = false;
 
+  static bool tryDCE(MachineInstr &MI, MachineRegisterInfo &MRI);
+
 public:
  /// If CSEInfo is not null, then the Combiner will use CSEInfo as the observer
   /// and also create a CSEMIRBuilder. Pass nullptr if CSE is not needed.
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h
index 2b0eb71f88082..67f95c962c582 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerInfo.h
@@ -53,6 +53,31 @@ struct CombinerInfo {
   /// The maximum number of times the Combiner will iterate over the
   /// MachineFunction. Setting this to 0 enables fixed-point iteration.
   unsigned MaxIterations = 0;
+
+  enum class ObserverLevel {
+/// Only retry combining created/changed instructions.
+/// This replicates the legacy default Observer behavior for use with
+/// fixed-point iteration.
+Basic,
+/// Enables Observer-based detection of dead instructions. This can save
+/// some compile-time if full disabling of fixed-point iteration is not
+/// desired. If the input IR doesn't contain dead instructions, consider
+/// disabling \p EnableFullDCE.
+DCE,
+/// Enables Observer-based DCE and additional heuristics that retry
+/// combining defined and used instructions of modified instructions.
+/// This provides a good balance between compile-time and completeness of
+/// combining without needing fixed-point iteration.
+SinglePass,
+  };
+
+  /// Select how the Combiner acts on MIR changes.
+  ObserverLevel ObserverLvl = ObserverLevel::Basic;
+
+  /// Whether dead code elimination is performed before each Combiner iteration.
+  /// If Observer-based DCE is enabled, this controls if a full DCE pass is
+  /// performed before the first Combiner iteration.
+  bool EnableFullDCE = true;
 };
 } // namespace llvm
 
diff --git a/llvm/lib/CodeGen/GlobalISel/Combiner.cpp b/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
index 49842e5fd65da..c5ec73cd5c65d 100644
--- a/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Combiner.cpp
@@ -45,61 +45,189 @@ cl::OptionCategory GICombinerOptionCategory(
 );
 } // end namespace llvm
 
-/// This class acts as the glue the joins the CombinerHelper to the overall
+/// This class acts as the glue that joins the CombinerHelper to the overall
 /// Combine algorithm. The CombinerHelper is intended to report the
 /// modifications it makes to the MIR to the GISelChangeObserver and the
-/// observer subclass will act on t

[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler created https://github.com/llvm/llvm-project/pull/102167

Disable fixed-point iteration in all AArch64 Combiners after #102163.
See inline comments for justification of test changes.



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-globalisel

Author: Tobias Stadler (tobias-stadler)


Changes

Disable fixed-point iteration in all AArch64 Combiners after #102163.
See inline comments for justification of test changes.


---
Full diff: https://github.com/llvm/llvm-project/pull/102167.diff


10 Files Affected:

- (modified) llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp (+5)
- (modified) llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp (+5)
- (modified) llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp (+6)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir (+2-2)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-overflow.mir (+4-4)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll (+4-4)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/fold-global-offsets.mir (+3-3)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-combiner-sameopcode-hands-crash.mir (+2)
- (modified) llvm/test/CodeGen/AArch64/GlobalISel/postlegalizer-lowering-swap-compare-operands.mir (+12-4)
- (modified) llvm/test/CodeGen/AArch64/setcc_knownbits.ll (+1-2)


``diff
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
index f71fe323a6d35..28d9f4f50f388 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerCombiner.cpp
@@ -566,6 +566,11 @@ bool AArch64PostLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
  /*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
  F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // Legalizer performs DCE, so a full DCE pass is unnecessary.
+  CInfo.EnableFullDCE = false;
   AArch64PostLegalizerCombinerImpl Impl(MF, CInfo, TPC, *KB, CSEInfo,
 RuleConfig, ST, MDT, LI);
   bool Changed = Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
index 4a1977ba1a00f..56e9e4d76eeed 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PostLegalizerLowering.cpp
@@ -1290,6 +1290,11 @@ bool AArch64PostLegalizerLowering::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
  /*LegalizerInfo*/ nullptr, /*OptEnabled=*/true,
  F.hasOptSize(), F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // PostLegalizerCombiner performs DCE, so a full DCE pass is unnecessary.
+  CInfo.EnableFullDCE = false;
   AArch64PostLegalizerLoweringImpl Impl(MF, CInfo, TPC, /*CSEInfo*/ nullptr,
 RuleConfig, ST);
   return Impl.combineMachineInstrs();
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
index 8a50cb26b2c2f..6e689d743804a 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64PreLegalizerCombiner.cpp
@@ -861,6 +861,12 @@ bool AArch64PreLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
   CombinerInfo CInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
  /*LegalizerInfo*/ nullptr, EnableOpt, F.hasOptSize(),
  F.hasMinSize());
+  // Disable fixed-point iteration to reduce compile-time
+  CInfo.MaxIterations = 1;
+  CInfo.ObserverLvl = CombinerInfo::ObserverLevel::SinglePass;
+  // This is the first Combiner, so the input IR might contain dead
+  // instructions.
+  CInfo.EnableFullDCE = true;
   AArch64PreLegalizerCombinerImpl Impl(MF, CInfo, &TPC, *KB, CSEInfo,
RuleConfig, ST, MDT, LI);
   return Impl.combineMachineInstrs();
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
index 1eb445c03efcd..d5356711585fa 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-logic-of-compare.mir
@@ -203,8 +203,8 @@ body: |
 ; CHECK-LABEL: name: test_icmp_and_icmp_9_2
 ; CHECK: liveins: $x0, $x1
 ; CHECK-NEXT: {{  $}}
-; CHECK-NEXT: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
-; CHECK-NEXT: $x0 = COPY [[C]](s64)
+; CHECK-NEXT: %zext:_(s64) = G_CON

[llvm-branch-commits] [llvm] TTI: Check legalization cost of mulfix ISD nodes (PR #100520)

2024-08-06 Thread Simon Pilgrim via llvm-branch-commits

https://github.com/RKSimon approved this pull request.

LGTM - although I don't think we have any legal/custom cost test coverage (only 
x86 which exapands)

https://github.com/llvm/llvm-project/pull/100520
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [LLDB] Add `` to AddressableBits (#102110) (PR #102112)

2024-08-06 Thread Jonas Devlieghere via llvm-branch-commits

https://github.com/JDevlieghere approved this pull request.


https://github.com/llvm/llvm-project/pull/102112
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] TTI: Check legalization cost of mul overflow ISD nodes (PR #100519)

2024-08-06 Thread Simon Pilgrim via llvm-branch-commits

https://github.com/RKSimon approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/100519
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

tobias-stadler wrote:

CTMark O0:
```
Program                            compile_instructions               size..text
                                   base-O0          patch-O0           diff    base-O0     patch-O0    diff
7zip/7zip-benchmark                141676688007.00  141409424579.00   -0.19%   1004756.00  1004756.00  0.00%
Bullet/bullet                       61212623785.00   61017628199.00   -0.32%    873356.00   873356.00  0.00%
SPASS/SPASS                         14400540825.00   14348447793.00   -0.36%    645136.00   645136.00  0.00%
tramp3d-v4/tramp3d-v4               18450535356.00   18380583241.00   -0.38%    894656.00   894656.00  0.00%
kimwitu++/kc                        22538775759.00   22426150589.00   -0.50%    813492.00   813492.00  0.00%
ClamAV/clamscan                     14536543546.00   14378298132.00   -1.09%    702832.00   702832.00  0.00%
lencod/lencod                       12679661465.00   12525031047.00   -1.22%    820564.00   820564.00  0.00%
mafft/pairlocalalign                 6940116013.00    6851372697.00   -1.28%    423000.00   423000.00  0.00%
consumer-typeset/consumer-typeset   12193354451.00   12003246725.00   -1.56%    908844.00   908844.00  0.00%
sqlite3/sqlite3                      4596588654.00    4512822506.00   -1.82%    459904.00   459904.00  0.00%
                                   Geomean difference                -0.87%                            0.00%

```
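The "Geomean difference" rows above are geometric means of the per-program ratios, not arithmetic means of the percentage column. A quick sketch of how the -0.87% figure for O0 compile_instructions falls out of the numbers in the table (values copied verbatim from it):

```python
from math import prod

# base-O0 and patch-O0 compile_instructions, in table order.
base  = [141676688007, 61212623785, 14400540825, 18450535356, 22538775759,
         14536543546, 12679661465, 6940116013, 12193354451, 4596588654]
patch = [141409424579, 61017628199, 14348447793, 18380583241, 22426150589,
         14378298132, 12525031047, 6851372697, 12003246725, 4512822506]

# Geometric mean of the patch/base ratios, expressed as a percentage change.
ratios = [p / b for p, b in zip(patch, base)]
geomean_pct = (prod(ratios) ** (1 / len(ratios)) - 1) * 100
assert round(geomean_pct, 2) == -0.87   # matches the table's geomean row
```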

CTMark O2:
```
Program                            compile_instructions               size..text
                                   base-O2          patch-O2           diff    base-O2    patch-O2   diff
7zip/7zip-benchmark                207008981549.00  205988150072.00   -0.49%   824088.00  824088.00  0.00%
Bullet/bullet                       99712646588.00   99179976524.00   -0.53%   515864.00  515864.00  0.00%
SPASS/SPASS                        43192888774.00   42858161952.00    -0.77%   443056.00  443056.00  0.00%
kimwitu++/kc                       41563609655.00   41217223921.00    -0.83%   461284.00  461284.00  0.00%
tramp3d-v4/tramp3d-v4              65778761906.00       6520869.00    -0.87%   570560.00  570560.00  0.00%
lencod/lencod                      57111009799.00   56497616800.00    -1.07%   541960.00  541960.00  0.00%
ClamAV/clamscan                    53935475581.00   53350116553.00    -1.09%   456352.00  456352.00  0.00%
mafft/pairlocalalign               32799063008.00   32439200934.00    -1.10%   320888.00  320888.00  0.00%
consumer-typeset/consumer-typeset  35357798970.00   34865643450.00    -1.39%   419436.00  419436.00  0.00%
sqlite3/sqlite3                    35219936375.00   34678565893.00    -1.54%   436160.00  436160.00  0.00%
                                   Geomean difference                -0.97%                         0.00%

```

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/102168

Backport a98a0dc

Requested by: @hazzlim

>From e572643c71072b6038298b25b09a6e9cab71f9b3 Mon Sep 17 00:00:00 2001
From: Hari Limaye 
Date: Tue, 6 Aug 2024 11:39:01 +0100
Subject: [PATCH] [AArch64] Add streaming-mode stack hazard optimization
 remarks (#101695)

Emit an optimization remark when objects in the stack frame may cause
hazards in a streaming mode function. The analysis requires either the
`aarch64-stack-hazard-size` or `aarch64-stack-hazard-remark-size` flag
to be set by the user, with the former flag taking precedence.

(cherry picked from commit a98a0dcf63f54c54c5601a34c9f8c10cde0162d6)
---
 .../llvm/CodeGen/TargetFrameLowering.h|   6 +
 llvm/lib/CodeGen/PrologEpilogInserter.cpp |   3 +
 .../Target/AArch64/AArch64FrameLowering.cpp   | 204 +-
 .../lib/Target/AArch64/AArch64FrameLowering.h |   6 +-
 .../AArch64/ssve-stack-hazard-remarks.ll  | 152 +
 .../CodeGen/AArch64/sve-stack-frame-layout.ll |   4 +-
 6 files changed, 364 insertions(+), 11 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll

diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0656c0d739fdf..d8c9d0a432ad8 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -15,6 +15,7 @@
 
 #include "llvm/ADT/BitVector.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
 #include "llvm/Support/TypeSize.h"
 #include 
 
@@ -473,6 +474,11 @@ class TargetFrameLowering {
   /// Return the frame base information to be encoded in the DWARF subprogram
   /// debug info.
   virtual DwarfFrameBase getDwarfFrameBase(const MachineFunction &MF) const;
+
+  /// This method is called at the end of prolog/epilog code insertion, so
+  /// targets can emit remarks based on the final frame layout.
+  virtual void emitRemarks(const MachineFunction &MF,
+   MachineOptimizationRemarkEmitter *ORE) const {};
 };
 
 } // End llvm namespace
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index cd5d877e53d82..f4490873cfdcd 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -341,6 +341,9 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
<< ore::NV("Function", MF.getFunction().getName()) << "'";
   });
 
+  // Emit any remarks implemented for the target, based on final frame layout.
+  TFI->emitRemarks(MF, ORE);
+
   delete RS;
   SaveBlocks.clear();
   RestoreBlocks.clear();
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index bd530903bb664..ba46ededc63a8 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -240,6 +240,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
@@ -275,6 +276,10 @@ cl::opt EnableHomogeneousPrologEpilog(
 // Stack hazard padding size. 0 = disabled.
static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
  cl::init(0), cl::Hidden);
+// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
+static cl::opt<unsigned>
+StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
+  cl::Hidden);
 // Whether to insert padding into non-streaming functions (for testing).
static cl::opt<bool>
 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
@@ -2615,9 +2620,16 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
   const auto &MFI = MF.getFrameInfo();
 
   int64_t ObjectOffset = MFI.getObjectOffset(FI);
+  StackOffset SVEStackSize = getSVEStackSize(MF);
+
+  // For VLA-area objects, just emit an offset at the end of the stack frame.
+  // Whilst not quite correct, these objects do live at the end of the frame and
+  // so it is more useful for analysis for the offset to reflect this.
+  if (MFI.isVariableSizedObjectIndex(FI)) {
+    return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
+  }
 
   // This is correct in the absence of any SVE stack objects.
-  StackOffset SVEStackSize = getSVEStackSize(MF);
   if (!SVEStackSize)
 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
 
@@ -3528,13 +3540,9 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
   return true;
 }
 
-// Return the FrameID for a Load/Store instruction by looking at the MMO.
-static std::optional<int> getLdStFrameID(const MachineInstr &MI,
-  

[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/102168
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:

@davemgreen What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102168
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-aarch64

Author: None (llvmbot)


Changes

Backport a98a0dc

Requested by: @hazzlim

---

Patch is 21.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/102168.diff


6 Files Affected:

- (modified) llvm/include/llvm/CodeGen/TargetFrameLowering.h (+6) 
- (modified) llvm/lib/CodeGen/PrologEpilogInserter.cpp (+3) 
- (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (+196-8) 
- (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.h (+5-1) 
- (added) llvm/test/CodeGen/AArch64/ssve-stack-hazard-remarks.ll (+152) 
- (modified) llvm/test/CodeGen/AArch64/sve-stack-frame-layout.ll (+2-2) 


``diff
diff --git a/llvm/include/llvm/CodeGen/TargetFrameLowering.h b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
index 0656c0d739fdf..d8c9d0a432ad8 100644
--- a/llvm/include/llvm/CodeGen/TargetFrameLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetFrameLowering.h
@@ -15,6 +15,7 @@
 
 #include "llvm/ADT/BitVector.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineOptimizationRemarkEmitter.h"
 #include "llvm/Support/TypeSize.h"
 #include 
 
@@ -473,6 +474,11 @@ class TargetFrameLowering {
   /// Return the frame base information to be encoded in the DWARF subprogram
   /// debug info.
   virtual DwarfFrameBase getDwarfFrameBase(const MachineFunction &MF) const;
+
+  /// This method is called at the end of prolog/epilog code insertion, so
+  /// targets can emit remarks based on the final frame layout.
+  virtual void emitRemarks(const MachineFunction &MF,
+   MachineOptimizationRemarkEmitter *ORE) const {};
 };
 
 } // End llvm namespace
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index cd5d877e53d82..f4490873cfdcd 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -341,6 +341,9 @@ bool PEI::runOnMachineFunction(MachineFunction &MF) {
<< ore::NV("Function", MF.getFunction().getName()) << "'";
   });
 
+  // Emit any remarks implemented for the target, based on final frame layout.
+  TFI->emitRemarks(MF, ORE);
+
   delete RS;
   SaveBlocks.clear();
   RestoreBlocks.clear();
diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
index bd530903bb664..ba46ededc63a8 100644
--- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
@@ -240,6 +240,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetMachine.h"
@@ -275,6 +276,10 @@ cl::opt EnableHomogeneousPrologEpilog(
 // Stack hazard padding size. 0 = disabled.
static cl::opt<unsigned> StackHazardSize("aarch64-stack-hazard-size",
  cl::init(0), cl::Hidden);
+// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
+static cl::opt<unsigned>
+StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
+  cl::Hidden);
 // Whether to insert padding into non-streaming functions (for testing).
static cl::opt<bool>
 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
@@ -2615,9 +2620,16 @@ AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
   const auto &MFI = MF.getFrameInfo();
 
   int64_t ObjectOffset = MFI.getObjectOffset(FI);
+  StackOffset SVEStackSize = getSVEStackSize(MF);
+
+  // For VLA-area objects, just emit an offset at the end of the stack frame.
+  // Whilst not quite correct, these objects do live at the end of the frame and
+  // so it is more useful for analysis for the offset to reflect this.
+  if (MFI.isVariableSizedObjectIndex(FI)) {
+    return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
+  }
 
   // This is correct in the absence of any SVE stack objects.
-  StackOffset SVEStackSize = getSVEStackSize(MF);
   if (!SVEStackSize)
 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
 
@@ -3528,13 +3540,9 @@ bool AArch64FrameLowering::restoreCalleeSavedRegisters(
   return true;
 }
 
-// Return the FrameID for a Load/Store instruction by looking at the MMO.
-static std::optional<int> getLdStFrameID(const MachineInstr &MI,
- const MachineFrameInfo &MFI) {
-  if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
-return std::nullopt;
-
-  MachineMemOperand *MMO = *MI.memoperands_begin();
+// Return the FrameID for a MMO.
+static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
+const MachineFrameInfo &MFI) {
   auto *PSV =
      dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
   if (PSV)
@@ -3552,6 +3560,15 @@ static std::opti

[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread Hari Limaye via llvm-branch-commits

hazzlim wrote:

This is useful for users in conjunction with https://github.com/llvm/llvm-project/commit/4b9bcabdf05346fd72db0d1ad88faa9b969a8f13 but requires them to be using the same compiler (so that spills occur in the same places).

It should be safe as it is very much opt-in via `-aarch64-stack-hazard-size` / `-aarch64-stack-hazard-remark-size` flags.
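For reference, since both options are `cl::opt` flags rather than driver-visible options, they would typically be passed to `llc` directly (or via `-mllvm` through the clang driver). The invocation below is a hypothetical sketch — the input file name is invented, and the remark-pass name is an assumption, not taken from this thread:

```shell
# Hypothetical: emit hazard remarks for a 64-byte hazard window without
# changing codegen (the remark-size flag does not insert padding).
llc -mtriple=aarch64 -aarch64-stack-hazard-remark-size=64 \
    -pass-remarks-analysis=sme -o /dev/null input.ll
```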

https://github.com/llvm/llvm-project/pull/102168
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits


@@ -257,10 +257,10 @@ define i32 @udiv_div_by_180(i32 %x)
 ;
 ; GISEL-LABEL: udiv_div_by_180:
 ; GISEL:   // %bb.0:
-; GISEL-NEXT:uxtb w8, w0
-; GISEL-NEXT:mov w9, #5826 // =0x16c2
-; GISEL-NEXT:movk w9, #364, lsl #16
-; GISEL-NEXT:umull x8, w8, w9
+; GISEL-NEXT:mov w8, #5826 // =0x16c2
+; GISEL-NEXT:and w9, w0, #0xff
+; GISEL-NEXT:movk w8, #364, lsl #16
+; GISEL-NEXT:umull x8, w9, w8

tobias-stadler wrote:

Expanding the div generates a bunch of instructions, and the order in which they are recombined results in different output.
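For context on the sequence in the diff above: `mov w8, #5826` plus `movk w8, #364, lsl #16` materialize the constant (364 << 16) | 5826 = 23860930 = ceil(2^32 / 180), so the `umull` (together with a shift that falls outside the excerpt — an assumption here) computes the unsigned division by 180 as a multiply-high; only the scheduling changed, not the algorithm. A quick arithmetic check of that magic constant (illustrative, not from the thread):

```python
# Constant assembled by: mov w8, #5826 ; movk w8, #364, lsl #16
MAGIC = (364 << 16) | 5826
assert MAGIC == 23860930 == -(-2**32 // 180)   # ceil(2**32 / 180)

# For the zero-extended 8-bit input produced by `uxtb` / `and w9, w0, #0xff`,
# taking the high 32 bits of the 64-bit product matches integer division by 180.
for x in range(256):
    assert (x * MAGIC) >> 32 == x // 180
```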

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits


@@ -9,6 +9,8 @@ body: |
   bb.1:
 ; CHECK-LABEL: name: crash_fn
 ; CHECK: [[C:%[0-9]+]]:_(p0) = G_CONSTANT i64 0
+; CHECK-NEXT: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
+; CHECK-NEXT: [[C2:%[0-9]+]]:_(s1) = G_CONSTANT i1 false

tobias-stadler wrote:

These are dead in the input IR, which doesn't happen inside the CodeGen pipeline.

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits


@@ -88,7 +95,8 @@ body: |
 %cmp2:_(s32) = G_ICMP intpred(sge), %cmp_lhs(s64), %add
 
 $w0 = COPY %cmp2(s32)
-RET_ReallyLR implicit $w0
+$w1 = COPY %cmp1(s32)

tobias-stadler wrote:

cmp1 was dead in the input IR, but according to the comment this seems unintended.

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits


@@ -33,8 +33,7 @@ define noundef i1 @logger(i32 noundef %logLevel, ptr %ea, ptr %pll) {
 ; CHECK-GI-NEXT:b.hi .LBB1_2
 ; CHECK-GI-NEXT:  // %bb.1: // %land.rhs
 ; CHECK-GI-NEXT:ldr x8, [x1]
-; CHECK-GI-NEXT:ldrb w8, [x8]
-; CHECK-GI-NEXT:and w0, w8, #0x1
+; CHECK-GI-NEXT:ldrb w0, [x8]

tobias-stadler wrote:

The immediate DCEing and better ordering of combines prevent the
PreLegalizerCombiner from generating a bunch of useless artifacts that it can't
combine away again. These artifacts are converted into G_AND by the
ArtifactCombiner and can't be combined away by the redundant_and combine,
because KnownBits can't look through the implicit anyext of the G_LOAD. We are
probably missing some combines that convert the load into sext/zext versions.

https://github.com/llvm/llvm-project/pull/102167


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler edited 
https://github.com/llvm/llvm-project/pull/102167


[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler edited 
https://github.com/llvm/llvm-project/pull/102163


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler edited 
https://github.com/llvm/llvm-project/pull/102167


[llvm-branch-commits] [llvm] [mlir] [OpenMP]Update use_device_clause lowering (PR #101707)

2024-08-06 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/101707

>From 9db19516d50cd4f6a597fbd811419af98859315a Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Fri, 2 Aug 2024 17:11:21 +0100
Subject: [PATCH] [OpenMP]Update use_device_clause lowering

This patch updates the use_device_ptr and use_device_addr clauses to use the 
mapInfoOps for lowering. This allows all the types handled by the map clauses, 
such as derived types, to also be supported by the use_device clauses.

This is patch 2/2 in a series of patches.
---
 llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp |   2 +-
 .../OpenMP/OpenMPToLLVMIRTranslation.cpp  | 284 ++
 mlir/test/Target/LLVMIR/omptarget-llvm.mlir   |  16 +-
 .../openmp-target-use-device-nested.mlir  |  27 ++
 4 files changed, 194 insertions(+), 135 deletions(-)
 create mode 100644 mlir/test/Target/LLVMIR/openmp-target-use-device-nested.mlir

diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp 
b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index afbb9f9cc16430..4793711a986e97 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -6351,7 +6351,7 @@ OpenMPIRBuilder::InsertPointTy 
OpenMPIRBuilder::createTargetData(
   // Disable TargetData CodeGen on Device pass.
   if (Config.IsTargetDevice.value_or(false)) {
 if (BodyGenCB)
-  Builder.restoreIP(BodyGenCB(Builder.saveIP(), BodyGenTy::NoPriv));
+  Builder.restoreIP(BodyGenCB(CodeGenIP, BodyGenTy::NoPriv));
 return Builder.saveIP();
   }
 
diff --git 
a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp 
b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 458d05d5059db7..78c460c50cbe5e 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -2110,6 +2110,8 @@ getRefPtrIfDeclareTarget(mlir::Value value,
 struct MapInfoData : llvm::OpenMPIRBuilder::MapInfosTy {
   llvm::SmallVector IsDeclareTarget;
   llvm::SmallVector IsAMember;
+  // Identify if mapping was added by mapClause or use_device clauses.
+  llvm::SmallVector IsAMapping;
   llvm::SmallVector MapClause;
   llvm::SmallVector OriginalValue;
   // Stripped off array/pointer to get the underlying
@@ -2193,62 +2195,125 @@ llvm::Value *getSizeInBytes(DataLayout &dl, const 
mlir::Type &type,
   return builder.getInt64(dl.getTypeSizeInBits(type) / 8);
 }
 
-void collectMapDataFromMapVars(MapInfoData &mapData,
-   llvm::SmallVectorImpl &mapVars,
-   LLVM::ModuleTranslation &moduleTranslation,
-   DataLayout &dl, llvm::IRBuilderBase &builder) {
+void collectMapDataFromMapOperands(
+MapInfoData &mapData, llvm::SmallVectorImpl &mapVars,
+LLVM::ModuleTranslation &moduleTranslation, DataLayout &dl,
+llvm::IRBuilderBase &builder,
+const llvm::ArrayRef &useDevPtrOperands = {},
+const llvm::ArrayRef &useDevAddrOperands = {}) {
+  // Process MapOperands
   for (mlir::Value mapValue : mapVars) {
-if (auto mapOp = mlir::dyn_cast_if_present(
-mapValue.getDefiningOp())) {
-  mlir::Value offloadPtr =
-  mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
-  mapData.OriginalValue.push_back(
-  moduleTranslation.lookupValue(offloadPtr));
-  mapData.Pointers.push_back(mapData.OriginalValue.back());
-
-  if (llvm::Value *refPtr =
-  getRefPtrIfDeclareTarget(offloadPtr,
-   moduleTranslation)) { // declare target
-mapData.IsDeclareTarget.push_back(true);
-mapData.BasePointers.push_back(refPtr);
-  } else { // regular mapped variable
-mapData.IsDeclareTarget.push_back(false);
-mapData.BasePointers.push_back(mapData.OriginalValue.back());
-  }
+auto mapOp = mlir::cast(mapValue.getDefiningOp());
+mlir::Value offloadPtr =
+mapOp.getVarPtrPtr() ? mapOp.getVarPtrPtr() : mapOp.getVarPtr();
+mapData.OriginalValue.push_back(moduleTranslation.lookupValue(offloadPtr));
+mapData.Pointers.push_back(mapData.OriginalValue.back());
+
+if (llvm::Value *refPtr =
+getRefPtrIfDeclareTarget(offloadPtr,
+ moduleTranslation)) { // declare target
+  mapData.IsDeclareTarget.push_back(true);
+  mapData.BasePointers.push_back(refPtr);
+} else { // regular mapped variable
+  mapData.IsDeclareTarget.push_back(false);
+  mapData.BasePointers.push_back(mapData.OriginalValue.back());
+}
 
-  mapData.BaseType.push_back(
-  moduleTranslation.convertType(mapOp.getVarType()));
-  mapData.Sizes.push_back(
-  getSizeInBytes(dl, mapOp.getVarType(), mapOp, 
mapData.Pointers.back(),
- mapData.BaseType.back(), builder, moduleTranslation));
-  mapData.MapClause.push_back(mapOp.getOpera

[llvm-branch-commits] [llvm] release/19.x: [Hexagon] Do not optimize address of another function's block (#101209) (PR #102179)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102179


[llvm-branch-commits] [llvm] release/19.x: [Hexagon] Do not optimize address of another function's block (#101209) (PR #102179)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102179

Backport 68df06a0b2998765cb0a41353fcf0919bbf57ddb

Requested by: @yandalur

>From 56ed15517a94f797a0a71029280c9cf0c10e4bf3 Mon Sep 17 00:00:00 2001
From: yandalur 
Date: Thu, 1 Aug 2024 21:37:23 +0530
Subject: [PATCH] [Hexagon] Do not optimize address of another function's block
 (#101209)

When the constant extender optimization pass encounters an instruction
that uses an extended address pointing to another function's block,
avoid adding the instruction to the extender list for the current
machine function.

Fixes https://github.com/llvm/llvm-project/issues/99714

(cherry picked from commit 68df06a0b2998765cb0a41353fcf0919bbf57ddb)
---
 .../Target/Hexagon/HexagonConstExtenders.cpp  |   4 +
 .../CodeGen/Hexagon/cext-opt-block-addr.mir   | 173 ++
 2 files changed, 177 insertions(+)
 create mode 100644 llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir

diff --git a/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp 
b/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
index f0933765bbcbd..86ce6b4e05ed2 100644
--- a/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
@@ -1223,6 +1223,10 @@ void HCE::recordExtender(MachineInstr &MI, unsigned 
OpNum) {
   if (ER.Kind == MachineOperand::MO_GlobalAddress)
 if (ER.V.GV->getName().empty())
   return;
+  // Ignore block address that points to block in another function
+  if (ER.Kind == MachineOperand::MO_BlockAddress)
+if (ER.V.BA->getFunction() != &(MI.getMF()->getFunction()))
+  return;
   Extenders.push_back(ED);
 }
 
diff --git a/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir 
b/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir
new file mode 100644
index 0..9f140132dcd6c
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir
@@ -0,0 +1,173 @@
+# REQUIRES: asserts
+# RUN: llc -march=hexagon -run-pass hexagon-cext-opt %s -o - | FileCheck %s
+
+# Check that the HexagonConstantExtenders pass does not assert when block
+# addresses from different functions are used
+# CHECK-LABEL: name: wibble
+# CHECK: A2_tfrsi blockaddress(@baz
+# CHECK: A2_tfrsi blockaddress(@wibble
+
+--- |
+  target triple = "hexagon"
+
+  define dso_local void @baz() {
+  bb:
+br label %bb1
+
+  bb1:  ; preds = %bb
+%call = tail call fastcc i32 @wibble(i32 poison)
+ret void
+  }
+
+  define internal fastcc i32 @wibble(i32 %arg) {
+  bb:
+%call = tail call i32 @eggs(i32 noundef ptrtoint (ptr blockaddress(@baz, 
%bb1) to i32))
+br label %bb1
+
+  bb1:  ; preds = %bb
+tail call void @baz.1(i32 noundef ptrtoint (ptr blockaddress(@wibble, 
%bb1) to i32))
+ret i32 %call
+  }
+
+  declare i32 @eggs(i32 noundef) local_unnamed_addr
+
+  declare void @baz.1(i32 noundef) local_unnamed_addr
+
+...
+---
+name:baz
+alignment:   16
+exposesReturnsTwice: false
+legalized:   false
+regBankSelected: false
+selected:false
+failedISel:  false
+tracksRegLiveness: true
+hasWinCFI:   false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes: false
+hasEHFunclets:   false
+isOutlined:  false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: intregs, preferred-register: '' }
+liveins: []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap: false
+  hasPatchPoint:   false
+  stackSize:   0
+  offsetAdjustment: 0
+  maxAlignment:1
+  adjustsStack:false
+  hasCalls:false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:  false
+  hasMustTailInVarArgFunc: false
+  hasTailCall: true
+  isCalleeSavedInfoValid: false
+  localFrameSize:  0
+  savePoint:   ''
+  restorePoint:''
+fixedStack:  []
+stack:   []
+entry_values:[]
+callSites:   []
+debugValueSubstitutions: []
+constants:   []
+machineFunctionInfo: {}
+body: |
+  bb.0.bb:
+successors: %bb.1(0x8000)
+
+  bb.1.bb1 (ir-block-address-taken %ir-block.bb1):
+%0:intregs = IMPLICIT_DEF
+$r0 = COPY %0
+PS_tailcall_i @wibble, hexagoncsr, implicit $r0
+
+...
+---
+name:wibble
+alignment:   16
+exposesReturnsTwice: false
+legalized:   false
+regBankSelected: false
+selected:false
+failedISel:  false
+tracksRegLiveness: true
+hasWinCFI:   false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes: false
+hasEHFunclets:   false
+isOutlined:  false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: intregs, preferred-register: '' }
+  - { id: 1, class: 

[llvm-branch-commits] [llvm] release/19.x: [Hexagon] Do not optimize address of another function's block (#101209) (PR #102179)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:

@SundeepKushwaha What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102179


[llvm-branch-commits] [llvm] release/19.x: [Hexagon] Do not optimize address of another function's block (#101209) (PR #102179)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-hexagon

Author: None (llvmbot)


Changes

Backport 68df06a0b2998765cb0a41353fcf0919bbf57ddb

Requested by: @yandalur

---
Full diff: https://github.com/llvm/llvm-project/pull/102179.diff


2 Files Affected:

- (modified) llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp (+4) 
- (added) llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir (+173) 


``diff
diff --git a/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp 
b/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
index f0933765bbcbd..86ce6b4e05ed2 100644
--- a/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonConstExtenders.cpp
@@ -1223,6 +1223,10 @@ void HCE::recordExtender(MachineInstr &MI, unsigned 
OpNum) {
   if (ER.Kind == MachineOperand::MO_GlobalAddress)
 if (ER.V.GV->getName().empty())
   return;
+  // Ignore block address that points to block in another function
+  if (ER.Kind == MachineOperand::MO_BlockAddress)
+if (ER.V.BA->getFunction() != &(MI.getMF()->getFunction()))
+  return;
   Extenders.push_back(ED);
 }
 
diff --git a/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir 
b/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir
new file mode 100644
index 0..9f140132dcd6c
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/cext-opt-block-addr.mir
@@ -0,0 +1,173 @@
+# REQUIRES: asserts
+# RUN: llc -march=hexagon -run-pass hexagon-cext-opt %s -o - | FileCheck %s
+
+# Check that the HexagonConstantExtenders pass does not assert when block
+# addresses from different functions are used
+# CHECK-LABEL: name: wibble
+# CHECK: A2_tfrsi blockaddress(@baz
+# CHECK: A2_tfrsi blockaddress(@wibble
+
+--- |
+  target triple = "hexagon"
+
+  define dso_local void @baz() {
+  bb:
+br label %bb1
+
+  bb1:  ; preds = %bb
+%call = tail call fastcc i32 @wibble(i32 poison)
+ret void
+  }
+
+  define internal fastcc i32 @wibble(i32 %arg) {
+  bb:
+%call = tail call i32 @eggs(i32 noundef ptrtoint (ptr blockaddress(@baz, 
%bb1) to i32))
+br label %bb1
+
+  bb1:  ; preds = %bb
+tail call void @baz.1(i32 noundef ptrtoint (ptr blockaddress(@wibble, 
%bb1) to i32))
+ret i32 %call
+  }
+
+  declare i32 @eggs(i32 noundef) local_unnamed_addr
+
+  declare void @baz.1(i32 noundef) local_unnamed_addr
+
+...
+---
+name:baz
+alignment:   16
+exposesReturnsTwice: false
+legalized:   false
+regBankSelected: false
+selected:false
+failedISel:  false
+tracksRegLiveness: true
+hasWinCFI:   false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes: false
+hasEHFunclets:   false
+isOutlined:  false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: intregs, preferred-register: '' }
+liveins: []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap: false
+  hasPatchPoint:   false
+  stackSize:   0
+  offsetAdjustment: 0
+  maxAlignment:1
+  adjustsStack:false
+  hasCalls:false
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCalleeSavedRegisters: 0
+  hasOpaqueSPAdjustment: false
+  hasVAStart:  false
+  hasMustTailInVarArgFunc: false
+  hasTailCall: true
+  isCalleeSavedInfoValid: false
+  localFrameSize:  0
+  savePoint:   ''
+  restorePoint:''
+fixedStack:  []
+stack:   []
+entry_values:[]
+callSites:   []
+debugValueSubstitutions: []
+constants:   []
+machineFunctionInfo: {}
+body: |
+  bb.0.bb:
+successors: %bb.1(0x8000)
+
+  bb.1.bb1 (ir-block-address-taken %ir-block.bb1):
+%0:intregs = IMPLICIT_DEF
+$r0 = COPY %0
+PS_tailcall_i @wibble, hexagoncsr, implicit $r0
+
+...
+---
+name:wibble
+alignment:   16
+exposesReturnsTwice: false
+legalized:   false
+regBankSelected: false
+selected:false
+failedISel:  false
+tracksRegLiveness: true
+hasWinCFI:   false
+callsEHReturn:   false
+callsUnwindInit: false
+hasEHCatchret:   false
+hasEHScopes: false
+hasEHFunclets:   false
+isOutlined:  false
+debugInstrRef:   false
+failsVerification: false
+tracksDebugUserValues: false
+registers:
+  - { id: 0, class: intregs, preferred-register: '' }
+  - { id: 1, class: intregs, preferred-register: '' }
+  - { id: 2, class: intregs, preferred-register: '' }
+  - { id: 3, class: intregs, preferred-register: '' }
+  - { id: 4, class: intregs, preferred-register: '' }
+liveins: []
+frameInfo:
+  isFrameAddressTaken: false
+  isReturnAddressTaken: false
+  hasStackMap: false
+  hasPatchPoint:   false
+  stackSize:   0
+  offsetAdjustment: 0
+  maxAlignment:1
+  adjustsStack:true
+  hasCalls:true
+  stackProtector:  ''
+  functionContext: ''
+  maxCallFrameSize: 4294967295
+  cvBytesOfCal

[llvm-branch-commits] [llvm] AMDGPU/GlobalISel: Fix isExtractHiElt when selecting fma_mix (PR #102130)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/102130


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Thorsten Schütt via llvm-branch-commits

tschuett wrote:

Is this a fundamental issue of the combiner, or do we have to revisit the 
situation as the combiner slowly becomes more powerful?

https://github.com/llvm/llvm-project/pull/102167


[llvm-branch-commits] [GlobalISel] Don't remove from unfinalized GISelWorkList (PR #102158)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/102158


[llvm-branch-commits] [llvm] TTI: Check legalization cost of mul overflow ISD nodes (PR #100519)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100519

>From fd7e3162be6e57d62942759b10e9a3e192af2d18 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 25 Jul 2024 10:27:54 +0400
Subject: [PATCH] TTI: Check legalization cost of mul overflow ISD nodes

---
 llvm/include/llvm/CodeGen/BasicTTIImpl.h | 67 +---
 1 file changed, 36 insertions(+), 31 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h 
b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index ac3968de5d672..bdf53abe9533d 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2196,37 +2196,11 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   ISD = ISD::USUBO;
   break;
 case Intrinsic::smul_with_overflow:
-case Intrinsic::umul_with_overflow: {
-  Type *MulTy = RetTy->getContainedType(0);
-  Type *OverflowTy = RetTy->getContainedType(1);
-  unsigned ExtSize = MulTy->getScalarSizeInBits() * 2;
-  Type *ExtTy = MulTy->getWithNewBitWidth(ExtSize);
-  bool IsSigned = IID == Intrinsic::smul_with_overflow;
-
-  unsigned ExtOp = IsSigned ? Instruction::SExt : Instruction::ZExt;
-  TTI::CastContextHint CCH = TTI::CastContextHint::None;
-
-  InstructionCost Cost = 0;
-  Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, MulTy, CCH, 
CostKind);
-  Cost +=
-  thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
-  Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, MulTy, ExtTy,
-CCH, CostKind);
-  Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, ExtTy,
-  CostKind,
-  {TTI::OK_AnyValue, TTI::OP_None},
-  {TTI::OK_UniformConstantValue, 
TTI::OP_None});
-
-  if (IsSigned)
-Cost += thisT()->getArithmeticInstrCost(Instruction::AShr, MulTy,
-CostKind,
-{TTI::OK_AnyValue, 
TTI::OP_None},
-{TTI::OK_UniformConstantValue, 
TTI::OP_None});
-
-  Cost += thisT()->getCmpSelInstrCost(
-  BinaryOperator::ICmp, MulTy, OverflowTy, CmpInst::ICMP_NE, CostKind);
-  return Cost;
-}
+  ISD = ISD::SMULO;
+  break;
+case Intrinsic::umul_with_overflow:
+  ISD = ISD::UMULO;
+  break;
 case Intrinsic::fptosi_sat:
 case Intrinsic::fptoui_sat: {
   if (Tys.empty())
@@ -2371,6 +2345,37 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   OverflowTy, Pred, CostKind);
   return Cost;
 }
+case Intrinsic::smul_with_overflow:
+case Intrinsic::umul_with_overflow: {
+  Type *MulTy = RetTy->getContainedType(0);
+  Type *OverflowTy = RetTy->getContainedType(1);
+  unsigned ExtSize = MulTy->getScalarSizeInBits() * 2;
+  Type *ExtTy = MulTy->getWithNewBitWidth(ExtSize);
+  bool IsSigned = IID == Intrinsic::smul_with_overflow;
+
+  unsigned ExtOp = IsSigned ? Instruction::SExt : Instruction::ZExt;
+  TTI::CastContextHint CCH = TTI::CastContextHint::None;
+
+  InstructionCost Cost = 0;
+  Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, MulTy, CCH, 
CostKind);
+  Cost +=
+  thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
+  Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, MulTy, ExtTy,
+CCH, CostKind);
+  Cost += thisT()->getArithmeticInstrCost(
+  Instruction::LShr, ExtTy, CostKind, {TTI::OK_AnyValue, TTI::OP_None},
+  {TTI::OK_UniformConstantValue, TTI::OP_None});
+
+  if (IsSigned)
+Cost += thisT()->getArithmeticInstrCost(
+Instruction::AShr, MulTy, CostKind,
+{TTI::OK_AnyValue, TTI::OP_None},
+{TTI::OK_UniformConstantValue, TTI::OP_None});
+
+  Cost += thisT()->getCmpSelInstrCost(
+  BinaryOperator::ICmp, MulTy, OverflowTy, CmpInst::ICMP_NE, CostKind);
+  return Cost;
+}
 case Intrinsic::sadd_sat:
 case Intrinsic::ssub_sat: {
   // Assume a default expansion.



[llvm-branch-commits] [llvm] TTI: Check legalization cost of mulfix ISD nodes (PR #100520)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100520

>From 05c9703c68a33729e598862f5b2de37e2d6a453f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 25 Jul 2024 10:31:04 +0400
Subject: [PATCH] TTI: Check legalization cost of mulfix ISD nodes

---
 llvm/include/llvm/CodeGen/BasicTTIImpl.h | 53 +---
 1 file changed, 29 insertions(+), 24 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h 
b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index bdf53abe9533d..890c2b8ca36e1 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2159,30 +2159,11 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   ISD = ISD::USUBSAT;
   break;
 case Intrinsic::smul_fix:
-case Intrinsic::umul_fix: {
-  unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;
-  Type *ExtTy = RetTy->getWithNewBitWidth(ExtSize);
-
-  unsigned ExtOp =
-  IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;
-  TTI::CastContextHint CCH = TTI::CastContextHint::None;
-
-  InstructionCost Cost = 0;
-  Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, RetTy, CCH, 
CostKind);
-  Cost +=
-  thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
-  Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, RetTy, ExtTy,
-CCH, CostKind);
-  Cost += thisT()->getArithmeticInstrCost(Instruction::LShr, RetTy,
-  CostKind,
-  {TTI::OK_AnyValue, TTI::OP_None},
-  {TTI::OK_UniformConstantValue, 
TTI::OP_None});
-  Cost += thisT()->getArithmeticInstrCost(Instruction::Shl, RetTy, 
CostKind,
-  {TTI::OK_AnyValue, TTI::OP_None},
-  {TTI::OK_UniformConstantValue, 
TTI::OP_None});
-  Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, 
CostKind);
-  return Cost;
-}
+  ISD = ISD::SMULFIX;
+  break;
+case Intrinsic::umul_fix:
+  ISD = ISD::UMULFIX;
+  break;
 case Intrinsic::sadd_with_overflow:
   ISD = ISD::SADDO;
   break;
@@ -2417,6 +2398,30 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   CmpInst::BAD_ICMP_PREDICATE, CostKind);
   return Cost;
 }
+case Intrinsic::smul_fix:
+case Intrinsic::umul_fix: {
+  unsigned ExtSize = RetTy->getScalarSizeInBits() * 2;
+  Type *ExtTy = RetTy->getWithNewBitWidth(ExtSize);
+
+  unsigned ExtOp =
+  IID == Intrinsic::smul_fix ? Instruction::SExt : Instruction::ZExt;
+  TTI::CastContextHint CCH = TTI::CastContextHint::None;
+
+  InstructionCost Cost = 0;
+  Cost += 2 * thisT()->getCastInstrCost(ExtOp, ExtTy, RetTy, CCH, 
CostKind);
+  Cost +=
+  thisT()->getArithmeticInstrCost(Instruction::Mul, ExtTy, CostKind);
+  Cost += 2 * thisT()->getCastInstrCost(Instruction::Trunc, RetTy, ExtTy,
+CCH, CostKind);
+  Cost += thisT()->getArithmeticInstrCost(
+  Instruction::LShr, RetTy, CostKind, {TTI::OK_AnyValue, TTI::OP_None},
+  {TTI::OK_UniformConstantValue, TTI::OP_None});
+  Cost += thisT()->getArithmeticInstrCost(
+  Instruction::Shl, RetTy, CostKind, {TTI::OK_AnyValue, TTI::OP_None},
+  {TTI::OK_UniformConstantValue, TTI::OP_None});
+  Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, 
CostKind);
+  return Cost;
+}
 default:
   break;
 }



[llvm-branch-commits] [llvm] TTI: Check legalization cost of fptosi_sat/fptoui_sat nodes (PR #100521)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100521

>From 88fe51a9f4144094036da1899f5946ebfa609971 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 25 Jul 2024 10:33:23 +0400
Subject: [PATCH] TTI: Check legalization cost of fptosi_sat/fptoui_sat nodes

---
 llvm/include/llvm/CodeGen/BasicTTIImpl.h  |  56 +--
 llvm/test/Analysis/CostModel/X86/fptoi_sat.ll | 400 +-
 .../AggressiveInstCombine/ARM/fptosisat.ll|  49 ++-
 3 files changed, 266 insertions(+), 239 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h 
b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 890c2b8ca36e1..8a14c8a37577a 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2183,31 +2183,11 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   ISD = ISD::UMULO;
   break;
 case Intrinsic::fptosi_sat:
-case Intrinsic::fptoui_sat: {
-  if (Tys.empty())
-break;
-  Type *FromTy = Tys[0];
-  bool IsSigned = IID == Intrinsic::fptosi_sat;
-
-  InstructionCost Cost = 0;
-  IntrinsicCostAttributes Attrs1(Intrinsic::minnum, FromTy,
- {FromTy, FromTy});
-  Cost += thisT()->getIntrinsicInstrCost(Attrs1, CostKind);
-  IntrinsicCostAttributes Attrs2(Intrinsic::maxnum, FromTy,
- {FromTy, FromTy});
-  Cost += thisT()->getIntrinsicInstrCost(Attrs2, CostKind);
-  Cost += thisT()->getCastInstrCost(
-  IsSigned ? Instruction::FPToSI : Instruction::FPToUI, RetTy, FromTy,
-  TTI::CastContextHint::None, CostKind);
-  if (IsSigned) {
-Type *CondTy = RetTy->getWithNewBitWidth(1);
-Cost += thisT()->getCmpSelInstrCost(
-BinaryOperator::FCmp, FromTy, CondTy, CmpInst::FCMP_UNO, CostKind);
-Cost += thisT()->getCmpSelInstrCost(
-BinaryOperator::Select, RetTy, CondTy, CmpInst::FCMP_UNO, 
CostKind);
-  }
-  return Cost;
-}
+  ISD = ISD::FP_TO_SINT_SAT;
+  break;
+case Intrinsic::fptoui_sat:
+  ISD = ISD::FP_TO_UINT_SAT;
+  break;
 case Intrinsic::ctpop:
   ISD = ISD::CTPOP;
   // In case of legalization use TCC_Expensive. This is cheaper than a
@@ -2422,6 +2402,32 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, 
CostKind);
   return Cost;
 }
+case Intrinsic::fptosi_sat:
+case Intrinsic::fptoui_sat: {
+  if (Tys.empty())
+break;
+  Type *FromTy = Tys[0];
+  bool IsSigned = IID == Intrinsic::fptosi_sat;
+
+  InstructionCost Cost = 0;
+  IntrinsicCostAttributes Attrs1(Intrinsic::minnum, FromTy,
+ {FromTy, FromTy});
+  Cost += thisT()->getIntrinsicInstrCost(Attrs1, CostKind);
+  IntrinsicCostAttributes Attrs2(Intrinsic::maxnum, FromTy,
+ {FromTy, FromTy});
+  Cost += thisT()->getIntrinsicInstrCost(Attrs2, CostKind);
+  Cost += thisT()->getCastInstrCost(
+  IsSigned ? Instruction::FPToSI : Instruction::FPToUI, RetTy, FromTy,
+  TTI::CastContextHint::None, CostKind);
+  if (IsSigned) {
+Type *CondTy = RetTy->getWithNewBitWidth(1);
+Cost += thisT()->getCmpSelInstrCost(
+BinaryOperator::FCmp, FromTy, CondTy, CmpInst::FCMP_UNO, CostKind);
+Cost += thisT()->getCmpSelInstrCost(
+BinaryOperator::Select, RetTy, CondTy, CmpInst::FCMP_UNO, 
CostKind);
+  }
+  return Cost;
+}
 default:
   break;
 }
diff --git a/llvm/test/Analysis/CostModel/X86/fptoi_sat.ll 
b/llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
index 3f5c79f2d59c6..55b80350f595e 100644
--- a/llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
+++ b/llvm/test/Analysis/CostModel/X86/fptoi_sat.ll
@@ -12,26 +12,26 @@
 
 define void @casts() {
 ; SSE2-LABEL: 'casts'
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: 
%f32s1 = call i1 @llvm.fptosi.sat.i1.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: 
%f32u1 = call i1 @llvm.fptoui.sat.i1.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: 
%f32s8 = call i8 @llvm.fptosi.sat.i8.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: 
%f32u8 = call i8 @llvm.fptoui.sat.i8.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: 
%f32s16 = call i16 @llvm.fptosi.sat.i16.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: 
%f32u16 = call i16 @llvm.fptoui.sat.i16.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: 
%f32s32 = call i32 @llvm.fptosi.sat.i32.f32(float undef)
-; SSE2-NEXT:  Cost Model: Found an estimated cost of 14 for instruction: 
%f32u32 =

[llvm-branch-commits] [llvm] TTI: Check legalization cost of abs nodes (PR #100523)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/100523

>From bbd9d3db15809593b5d4c8d41f5990702843bf2e Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 25 Jul 2024 10:38:11 +0400
Subject: [PATCH] TTI: Check legalization cost of abs nodes

Also adjust the AMDGPU cost.
---
 llvm/include/llvm/CodeGen/BasicTTIImpl.h  |  32 +-
 .../AMDGPU/AMDGPUTargetTransformInfo.cpp  |   9 +-
 llvm/test/Analysis/CostModel/AMDGPU/abs.ll| 368 +-
 .../Analysis/CostModel/AMDGPU/arith-ssat.ll   |  32 +-
 .../Analysis/CostModel/AMDGPU/arith-usat.ll   |  32 +-
 5 files changed, 242 insertions(+), 231 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h 
b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 8a14c8a37577ad..c2bc1353ee8838 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -2120,20 +2120,9 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
 case Intrinsic::vector_reduce_fminimum:
   return 
thisT()->getMinMaxReductionCost(getMinMaxReductionIntrinsicOp(IID),
  VecOpTy, ICA.getFlags(), 
CostKind);
-case Intrinsic::abs: {
-  // abs(X) = select(icmp(X,0),X,sub(0,X))
-  Type *CondTy = RetTy->getWithNewBitWidth(1);
-  CmpInst::Predicate Pred = CmpInst::ICMP_SGT;
-  InstructionCost Cost = 0;
-  Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
-  Pred, CostKind);
-  Cost += thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, 
CondTy,
-  Pred, CostKind);
-  // TODO: Should we add an OperandValueProperties::OP_Zero property?
-  Cost += thisT()->getArithmeticInstrCost(
- BinaryOperator::Sub, RetTy, CostKind, {TTI::OK_UniformConstantValue, 
TTI::OP_None});
-  return Cost;
-}
+case Intrinsic::abs:
+  ISD = ISD::ABS;
+  break;
 case Intrinsic::smax:
   ISD = ISD::SMAX;
   break;
@@ -2402,6 +2391,21 @@ class BasicTTIImplBase : public 
TargetTransformInfoImplCRTPBase {
   Cost += thisT()->getArithmeticInstrCost(Instruction::Or, RetTy, 
CostKind);
   return Cost;
 }
+case Intrinsic::abs: {
+  // abs(X) = select(icmp(X,0),X,sub(0,X))
+  Type *CondTy = RetTy->getWithNewBitWidth(1);
+  CmpInst::Predicate Pred = CmpInst::ICMP_SGT;
+  InstructionCost Cost = 0;
+  Cost += thisT()->getCmpSelInstrCost(BinaryOperator::ICmp, RetTy, CondTy,
+  Pred, CostKind);
+  Cost += thisT()->getCmpSelInstrCost(BinaryOperator::Select, RetTy, 
CondTy,
+  Pred, CostKind);
+  // TODO: Should we add an OperandValueProperties::OP_Zero property?
+  Cost += thisT()->getArithmeticInstrCost(
+  BinaryOperator::Sub, RetTy, CostKind,
+  {TTI::OK_UniformConstantValue, TTI::OP_None});
+  return Cost;
+}
 case Intrinsic::fptosi_sat:
 case Intrinsic::fptoui_sat: {
   if (Tys.empty())
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
index 8d4ff64ac5adcf..c6aadf0b503012 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
@@ -696,6 +696,7 @@ static bool intrinsicHasPackedVectorBenefit(Intrinsic::ID 
ID) {
   case Intrinsic::usub_sat:
   case Intrinsic::sadd_sat:
   case Intrinsic::ssub_sat:
+  case Intrinsic::abs:
 return true;
   default:
 return false;
@@ -724,7 +725,7 @@ GCNTTIImpl::getIntrinsicInstrCost(const 
IntrinsicCostAttributes &ICA,
   if (SLT == MVT::f64)
 return LT.first * NElts * get64BitInstrCost(CostKind);
 
-  if ((ST->has16BitInsts() && SLT == MVT::f16) ||
+  if ((ST->has16BitInsts() && (SLT == MVT::f16 || SLT == MVT::i16)) ||
   (ST->hasPackedFP32Ops() && SLT == MVT::f32))
 NElts = (NElts + 1) / 2;
 
@@ -752,11 +753,17 @@ GCNTTIImpl::getIntrinsicInstrCost(const 
IntrinsicCostAttributes &ICA,
   case Intrinsic::usub_sat:
   case Intrinsic::sadd_sat:
   case Intrinsic::ssub_sat: {
+// TODO: Full rate for i32/i16
 static const auto ValidSatTys = {MVT::v2i16, MVT::v4i16};
    if (any_of(ValidSatTys, [&LT](MVT M) { return M == LT.second; }))
   NElts = 1;
 break;
   }
+  case Intrinsic::abs:
+// Expansion takes 2 instructions for VALU
+if (SLT == MVT::i16 || SLT == MVT::i32)
+  InstRate = 2 * getFullRateInstrCost();
+break;
   default:
 break;
   }
diff --git a/llvm/test/Analysis/CostModel/AMDGPU/abs.ll 
b/llvm/test/Analysis/CostModel/AMDGPU/abs.ll
index f65615b07abc0f..b86e99558377bb 100644
--- a/llvm/test/Analysis/CostModel/AMDGPU/abs.ll
+++ b/llvm/test/Analysis/CostModel/AMDGPU/abs.ll
@@ -14,116 +14,116 @@ define void @abs_nonpoison() {
 ; FAST-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %I64 = 
call 

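For orientation, the generic expansion that this patch relocates (and prices as one compare, one select, and one subtract-from-zero) behaves like the following C sketch. The function name is illustrative only, not LLVM code:

```c
/* abs(X) = select(icmp sgt (X, 0), X, sub(0, X))
 * The cost model charges one icmp, one select and one sub for this
 * pattern, with the zero operand treated as a uniform constant. */
static int abs_expanded(int x) {
    int neg = 0 - x;          /* sub(0, X) */
    return (x > 0) ? x : neg; /* select(icmp(X, 0), X, neg) */
}
```

On AMDGPU the patch instead prices the two-instruction VALU expansion directly for i16/i32, rather than falling back to this generic compare/select/sub costing.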
[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

tobias-stadler wrote:

> Is this a fundamental issue of the combiner or do we have to revisit the 
> situation when the combiner becomes slowly more powerful?

This is a fundamental issue of combiner-style algorithms. Fixed-point iteration 
just burns too much compile-time for no good reason. Both the InstCombiner and 
the DAGCombiner don't use fixed-point iteration anymore for the same 
compile-time reasons. The heuristics I have implemented should be mostly on-par 
with how thoroughly the other combiner implementations currently handle 
retrying combines. The main difference is that we can rely on the Observer to 
detect changes for retrying combines and the other combiner implementations 
need to use WorkList-aware functions for performing changes on the IR. If this 
approach is good enough for SelectionDAG, I don't see this approach becoming a 
problem for GlobalISel.

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Tobias Stadler via llvm-branch-commits

https://github.com/tobias-stadler edited 
https://github.com/llvm/llvm-project/pull/102163


[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)

2024-08-06 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn milestoned 
https://github.com/llvm/llvm-project/pull/102201


[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)

2024-08-06 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn created 
https://github.com/llvm/llvm-project/pull/102201

…577)

Update getDependenceDistanceStrideAndSize to reason about different 
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either the source or the sink is neither strided by a constant (i.e. a 
non-wrapping AddRec) nor invariant, the accesses may overlap with earlier or 
later iterations, and we cannot generate runtime checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we can 
generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces additional 
checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577
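The distinction the description draws can be illustrated with a hedged C sketch (the names are invented for illustration): `dst[i]` is strided by a constant and `*src` is loop-invariant, so a single runtime check that `src` lies outside the range written through `dst` is enough to disambiguate them. If either address were neither invariant nor constant-strided, no such check could be formed:

```c
/* dst is accessed with constant stride 1 (a non-wrapping AddRec in SCEV
 * terms); src is loop-invariant (stride 0 after this change). Runtime
 * disambiguation only needs to prove: src not in [dst, dst + n). */
void scale_by_invariant(int *dst, const int *src, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = *src * 2;
}
```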

>From 83098f4513567a054663b30380e4f2039ee8a6d0 Mon Sep 17 00:00:00 2001
From: Florian Hahn 
Date: Fri, 26 Jul 2024 13:10:16 +0100
Subject: [PATCH] [LAA] Refine stride checks for SCEVs during dependence
 analysis. (#99577)

Update getDependenceDistanceStrideAndSize to reason about different
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either the source or the sink is neither strided by a constant
(i.e. a non-wrapping AddRec) nor invariant, the accesses may overlap
with earlier or later iterations and we cannot generate runtime
checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we
can generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces
additional checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577
---
 .../llvm/Analysis/LoopAccessAnalysis.h|  23 ++--
 llvm/lib/Analysis/LoopAccessAnalysis.cpp  | 121 --
 .../load-store-index-loaded-in-loop.ll|  26 ++--
 .../pointer-with-unknown-bounds.ll|   4 +-
 .../LoopAccessAnalysis/print-order.ll |   6 +-
 .../LoopAccessAnalysis/select-dependence.ll   |   4 +-
 .../LoopAccessAnalysis/symbolic-stride.ll |   4 +-
 7 files changed, 87 insertions(+), 101 deletions(-)

diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h 
b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index afafb74bdcb0ac..95a74b91f7acbf 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -199,9 +199,8 @@ class MemoryDepChecker {
   /// Check whether the dependencies between the accesses are safe.
   ///
   /// Only checks sets with elements in \p CheckDeps.
-  bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
-   const DenseMap>
-   &UnderlyingObjects);
+  bool areDepsSafe(const DepCandidates &AccessSets,
+   const MemAccessInfoList &CheckDeps);
 
   /// No memory dependence was encountered that would inhibit
   /// vectorization.
@@ -351,11 +350,8 @@ class MemoryDepChecker {
   /// element access it records this distance in \p MinDepDistBytes (if this
   /// distance is smaller than any other distance encountered so far).
   /// Otherwise, this function returns true signaling a possible dependence.
-  Dependence::DepType
-  isDependent(const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B,
-  unsigned BIdx,
-  const DenseMap>
-  &UnderlyingObjects);
+  Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx,
+  const MemAccessInfo &B, unsigned BIdx);
 
   /// Check whether the data dependence could prevent store-load
   /// forwarding.
@@ -392,11 +388,9 @@ class MemoryDepChecker {
   /// determined, or a struct containing (Distance, Stride, TypeSize, AIsWrite,
   /// BIsWrite).
  std::variant<Dependence::DepType, DepDistanceStrideAndSizeInfo>
-  getDependenceDistanceStrideAndSize(
-  const MemAccessInfo &A, Instruction *AInst, const MemAccessInfo &B,
-  Instruction *BInst,
-  const DenseMap>
-  &UnderlyingObjects);
+  getDependenceDistanceStrideAndSize(const MemAccessInfo &A, Instruction 
*AInst,
+ const MemAccessInfo &B,
+ Instruction *BInst);
 };
 
 class RuntimePointerChecking;
@@ -797,7 +791,8 @@ replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,
   Value *Ptr);
 
 /// If the pointer has a constant stride return it in units of the access type
-/// size.  Otherwise return std::nullopt.
+/// size. If the pointer is loop-invariant, return 0. Otherwise return
+/// std::nullopt.
 //

[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)

2024-08-06 Thread Florian Hahn via llvm-branch-commits

https://github.com/fhahn edited https://github.com/llvm/llvm-project/pull/102201


[llvm-branch-commits] [llvm] [LAA] Refine stride checks for SCEVs during dependence analysis. (#99… (PR #102201)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)


Changes

…577)

Update getDependenceDistanceStrideAndSize to reason about different 
combinations of strides directly and explicitly.

Update getPtrStride to return 0 for invariant pointers.

Then proceed by checking the strides.

If either the source or the sink is neither strided by a constant (i.e. a 
non-wrapping AddRec) nor invariant, the accesses may overlap with earlier or 
later iterations, and we cannot generate runtime checks to disambiguate them.

Otherwise they are either loop invariant or strided. In that case, we can 
generate a runtime check to disambiguate them.

If both are strided by constants, we proceed as previously.

This is an alternative to
https://github.com/llvm/llvm-project/pull/99239 and also replaces additional 
checks if the underlying object is loop-invariant.

Fixes https://github.com/llvm/llvm-project/issues/87189.

PR: https://github.com/llvm/llvm-project/pull/99577

---

Patch is 20.07 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/102201.diff


7 Files Affected:

- (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+9-14) 
- (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+56-65) 
- (modified) 
llvm/test/Analysis/LoopAccessAnalysis/load-store-index-loaded-in-loop.ll 
(+12-14) 
- (modified) 
llvm/test/Analysis/LoopAccessAnalysis/pointer-with-unknown-bounds.ll (+2-2) 
- (modified) llvm/test/Analysis/LoopAccessAnalysis/print-order.ll (+4-2) 
- (modified) llvm/test/Analysis/LoopAccessAnalysis/select-dependence.ll (+2-2) 
- (modified) llvm/test/Analysis/LoopAccessAnalysis/symbolic-stride.ll (+2-2) 


``diff
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h 
b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index afafb74bdcb0ac..95a74b91f7acbf 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -199,9 +199,8 @@ class MemoryDepChecker {
   /// Check whether the dependencies between the accesses are safe.
   ///
   /// Only checks sets with elements in \p CheckDeps.
-  bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
-   const DenseMap>
-   &UnderlyingObjects);
+  bool areDepsSafe(const DepCandidates &AccessSets,
+   const MemAccessInfoList &CheckDeps);
 
   /// No memory dependence was encountered that would inhibit
   /// vectorization.
@@ -351,11 +350,8 @@ class MemoryDepChecker {
   /// element access it records this distance in \p MinDepDistBytes (if this
   /// distance is smaller than any other distance encountered so far).
   /// Otherwise, this function returns true signaling a possible dependence.
-  Dependence::DepType
-  isDependent(const MemAccessInfo &A, unsigned AIdx, const MemAccessInfo &B,
-  unsigned BIdx,
-  const DenseMap>
-  &UnderlyingObjects);
+  Dependence::DepType isDependent(const MemAccessInfo &A, unsigned AIdx,
+  const MemAccessInfo &B, unsigned BIdx);
 
   /// Check whether the data dependence could prevent store-load
   /// forwarding.
@@ -392,11 +388,9 @@ class MemoryDepChecker {
   /// determined, or a struct containing (Distance, Stride, TypeSize, AIsWrite,
   /// BIsWrite).
  std::variant<Dependence::DepType, DepDistanceStrideAndSizeInfo>
-  getDependenceDistanceStrideAndSize(
-  const MemAccessInfo &A, Instruction *AInst, const MemAccessInfo &B,
-  Instruction *BInst,
-  const DenseMap>
-  &UnderlyingObjects);
+  getDependenceDistanceStrideAndSize(const MemAccessInfo &A, Instruction 
*AInst,
+ const MemAccessInfo &B,
+ Instruction *BInst);
 };
 
 class RuntimePointerChecking;
@@ -797,7 +791,8 @@ replaceSymbolicStrideSCEV(PredicatedScalarEvolution &PSE,
   Value *Ptr);
 
 /// If the pointer has a constant stride return it in units of the access type
-/// size.  Otherwise return std::nullopt.
+/// size. If the pointer is loop-invariant, return 0. Otherwise return
+/// std::nullopt.
 ///
 /// Ensure that it does not wrap in the address space, assuming the predicate
 /// associated with \p PSE is true.
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp 
b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 84214c47a10e11..f3fc69c86cd1e6 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -728,11 +728,6 @@ class AccessAnalysis {
 
   MemAccessInfoList &getDependenciesToCheck() { return CheckDeps; }
 
-  const DenseMap> &
-  getUnderlyingObjects() {
-return UnderlyingObjects;
-  }
-
 private:
   typedef MapVector> PtrAccessMap;
 
@@ -1459,22 +1454,23 @@ static bool isNoWrapAddRec(Value *Ptr, const 
SCEVAddRecExpr *AR,
 }
 
 /// Check whether the access through \p Ptr has a constant stride.
-std::optional llvm::getPtrStride(PredicatedScalarEvolution &PSE,
-

[llvm-branch-commits] [llvm] release/19.x: [CalcSpillWeights] Avoid x87 excess precision influencing weight result (PR #102207)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102207

Backport c80c09f3e380a0a2b00b36bebf72f43271a564c1

Requested by: @DimitryAndric

>From c472fe3ab850b114a4349c602f375ae0294f57c4 Mon Sep 17 00:00:00 2001
From: Dimitry Andric 
Date: Tue, 23 Jul 2024 19:02:36 +0200
Subject: [PATCH] [CalcSpillWeights] Avoid x87 excess precision influencing
 weight result

Fixes #99396

The result of `VirtRegAuxInfo::weightCalcHelper` can be influenced by
x87 excess precision, which can result in slightly different register
choices when the compiler is hosted on x86_64 or i386. This leads to
different object file output when cross-compiling to i386, or native.

Similar to 7af3432e22b0, we need to add a `volatile` qualifier to the
local `Weight` variable to force it onto the stack, and avoid the excess
precision. Define `stack_float_t` in `MathExtras.h` for this purpose,
and use it.

(cherry picked from commit c80c09f3e380a0a2b00b36bebf72f43271a564c1)
---
 llvm/include/llvm/Support/MathExtras.h |  8 
 llvm/lib/CodeGen/CalcSpillWeights.cpp  | 11 ++---
 llvm/test/CodeGen/X86/pr99396.ll   | 56 ++
 3 files changed, 70 insertions(+), 5 deletions(-)
 create mode 100644 llvm/test/CodeGen/X86/pr99396.ll

diff --git a/llvm/include/llvm/Support/MathExtras.h 
b/llvm/include/llvm/Support/MathExtras.h
index 0d0fa826f7bba..e568e42afcf4d 100644
--- a/llvm/include/llvm/Support/MathExtras.h
+++ b/llvm/include/llvm/Support/MathExtras.h
@@ -770,6 +770,14 @@ std::enable_if_t, T> MulOverflow(T X, 
T Y, T &Result) {
 #endif
 }
 
+/// Type to force float point values onto the stack, so that x86 doesn't add
+/// hidden precision, avoiding rounding differences on various platforms.
+#if defined(__i386__) || defined(_M_IX86)
+using stack_float_t = volatile float;
+#else
+using stack_float_t = float;
+#endif
+
 } // namespace llvm
 
 #endif
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp 
b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 1d767a3484bca..9d8c9119f7719 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -22,6 +22,7 @@
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 #include 
@@ -257,7 +258,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
   return -1.0f;
 }
 
-float Weight = 1.0f;
+// Force Weight onto the stack so that x86 doesn't add hidden precision,
+// similar to HWeight below.
+stack_float_t Weight = 1.0f;
 if (IsSpillable) {
   // Get loop info for mi.
   if (MI->getParent() != MBB) {
@@ -284,11 +287,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
 Register HintReg = copyHint(MI, LI.reg(), TRI, MRI);
 if (!HintReg)
   continue;
-// Force hweight onto the stack so that x86 doesn't add hidden precision,
+// Force HWeight onto the stack so that x86 doesn't add hidden precision,
 // making the comparison incorrectly pass (i.e., 1 > 1 == true??).
-//
-// FIXME: we probably shouldn't use floats at all.
-volatile float HWeight = Hint[HintReg] += Weight;
+stack_float_t HWeight = Hint[HintReg] += Weight;
 if (HintReg.isVirtual() || MRI.isAllocatable(HintReg))
   CopyHints.insert(CopyHint(HintReg, HWeight));
   }
diff --git a/llvm/test/CodeGen/X86/pr99396.ll b/llvm/test/CodeGen/X86/pr99396.ll
new file mode 100644
index 0..f534d32038c22
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr99396.ll
@@ -0,0 +1,56 @@
+; RUN: llc < %s -mtriple=i386-unknown-freebsd -enable-misched 
-relocation-model=pic | FileCheck %s
+
+@c = external local_unnamed_addr global ptr
+
+declare i32 @fn2() local_unnamed_addr
+
+declare i32 @fn3() local_unnamed_addr
+
+define noundef i32 @fn4() #0 {
+entry:
+  %tmp0 = load i32, ptr @fn4, align 4
+; CHECK: movl fn4@GOT(%ebx), %edi
+; CHECK-NEXT: movl (%edi), %edx
+  %tmp1 = load ptr, ptr @c, align 4
+; CHECK: movl c@GOT(%ebx), %eax
+; CHECK-NEXT: movl (%eax), %esi
+; CHECK-NEXT: testl %esi, %esi
+  %cmp.g = icmp eq ptr %tmp1, null
+  br i1 %cmp.g, label %if.then.g, label %if.end3.g
+
+if.then.g:; preds = %entry
+  %tmp2 = load i32, ptr inttoptr (i32 1 to ptr), align 4
+  %cmp1.g = icmp slt i32 %tmp2, 0
+  br i1 %cmp1.g, label %if.then2.g, label %if.end3.g
+
+if.then2.g:   ; preds = %if.then.g
+  %.g = load volatile i32, ptr null, align 2147483648
+  br label %f.exit
+
+if.end3.g:; preds = %if.then.g, %entry
+  %h.i.g = icmp eq i32 %tmp0, 0
+  br i1 %h.i.g, label %f.exit, label %while.body.g
+
+while.body.g: ; preds = %if.end3.g, 
%if.end8.g
+  %buff.addr.019.g = phi ptr [ %incdec.ptr.g, %if.end8.g ], [ @fn4, %if.end3.g 
]
+  %g.addr.018.g = phi i32 [ %dec.g

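As a hedged illustration of the mechanism this backport guards against (not the LLVM code itself): on 32-bit x86 the x87 unit may keep `float` intermediates at 80-bit precision until they are stored, so hosts can disagree on comparisons of freshly computed weights. Routing the value through a `volatile` slot, as `stack_float_t` does on i386 builds, forces a 32-bit store and rounds away the excess precision:

```c
/* Mimics what stack_float_t achieves on i386: the volatile store forces
 * the x87 value through a stack slot, rounding it to true 32-bit float
 * precision before any later comparison. On SSE-based hosts (x86_64)
 * both variants already round identically. */
static float rounded_sum(float a, float b) {
    volatile float r = a + b; /* forced store, no hidden 80-bit precision */
    return r;
}
```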
[llvm-branch-commits] [llvm] release/19.x: [CalcSpillWeights] Avoid x87 excess precision influencing weight result (PR #102207)

2024-08-06 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102207


[llvm-branch-commits] [llvm] release/19.x: [CalcSpillWeights] Avoid x87 excess precision influencing weight result (PR #102207)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-x86

Author: None (llvmbot)


Changes

Backport c80c09f3e380a0a2b00b36bebf72f43271a564c1

Requested by: @DimitryAndric

---
Full diff: https://github.com/llvm/llvm-project/pull/102207.diff


3 Files Affected:

- (modified) llvm/include/llvm/Support/MathExtras.h (+8) 
- (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+6-5) 
- (added) llvm/test/CodeGen/X86/pr99396.ll (+56) 


``diff
diff --git a/llvm/include/llvm/Support/MathExtras.h 
b/llvm/include/llvm/Support/MathExtras.h
index 0d0fa826f7bba..e568e42afcf4d 100644
--- a/llvm/include/llvm/Support/MathExtras.h
+++ b/llvm/include/llvm/Support/MathExtras.h
@@ -770,6 +770,14 @@ std::enable_if_t, T> MulOverflow(T X, 
T Y, T &Result) {
 #endif
 }
 
+/// Type to force float point values onto the stack, so that x86 doesn't add
+/// hidden precision, avoiding rounding differences on various platforms.
+#if defined(__i386__) || defined(_M_IX86)
+using stack_float_t = volatile float;
+#else
+using stack_float_t = float;
+#endif
+
 } // namespace llvm
 
 #endif
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp 
b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 1d767a3484bca..9d8c9119f7719 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -22,6 +22,7 @@
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 #include 
@@ -257,7 +258,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
   return -1.0f;
 }
 
-float Weight = 1.0f;
+// Force Weight onto the stack so that x86 doesn't add hidden precision,
+// similar to HWeight below.
+stack_float_t Weight = 1.0f;
 if (IsSpillable) {
   // Get loop info for mi.
   if (MI->getParent() != MBB) {
@@ -284,11 +287,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
 Register HintReg = copyHint(MI, LI.reg(), TRI, MRI);
 if (!HintReg)
   continue;
-// Force hweight onto the stack so that x86 doesn't add hidden precision,
+// Force HWeight onto the stack so that x86 doesn't add hidden precision,
 // making the comparison incorrectly pass (i.e., 1 > 1 == true??).
-//
-// FIXME: we probably shouldn't use floats at all.
-volatile float HWeight = Hint[HintReg] += Weight;
+stack_float_t HWeight = Hint[HintReg] += Weight;
 if (HintReg.isVirtual() || MRI.isAllocatable(HintReg))
   CopyHints.insert(CopyHint(HintReg, HWeight));
   }
diff --git a/llvm/test/CodeGen/X86/pr99396.ll b/llvm/test/CodeGen/X86/pr99396.ll
new file mode 100644
index 0..f534d32038c22
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr99396.ll
@@ -0,0 +1,56 @@
+; RUN: llc < %s -mtriple=i386-unknown-freebsd -enable-misched 
-relocation-model=pic | FileCheck %s
+
+@c = external local_unnamed_addr global ptr
+
+declare i32 @fn2() local_unnamed_addr
+
+declare i32 @fn3() local_unnamed_addr
+
+define noundef i32 @fn4() #0 {
+entry:
+  %tmp0 = load i32, ptr @fn4, align 4
+; CHECK: movl fn4@GOT(%ebx), %edi
+; CHECK-NEXT: movl (%edi), %edx
+  %tmp1 = load ptr, ptr @c, align 4
+; CHECK: movl c@GOT(%ebx), %eax
+; CHECK-NEXT: movl (%eax), %esi
+; CHECK-NEXT: testl %esi, %esi
+  %cmp.g = icmp eq ptr %tmp1, null
+  br i1 %cmp.g, label %if.then.g, label %if.end3.g
+
+if.then.g:; preds = %entry
+  %tmp2 = load i32, ptr inttoptr (i32 1 to ptr), align 4
+  %cmp1.g = icmp slt i32 %tmp2, 0
+  br i1 %cmp1.g, label %if.then2.g, label %if.end3.g
+
+if.then2.g:   ; preds = %if.then.g
+  %.g = load volatile i32, ptr null, align 2147483648
+  br label %f.exit
+
+if.end3.g:; preds = %if.then.g, %entry
+  %h.i.g = icmp eq i32 %tmp0, 0
+  br i1 %h.i.g, label %f.exit, label %while.body.g
+
+while.body.g: ; preds = %if.end3.g, 
%if.end8.g
+  %buff.addr.019.g = phi ptr [ %incdec.ptr.g, %if.end8.g ], [ @fn4, %if.end3.g 
]
+  %g.addr.018.g = phi i32 [ %dec.g, %if.end8.g ], [ %tmp0, %if.end3.g ]
+  %call4.g = tail call i32 @fn3(ptr %tmp1, ptr %buff.addr.019.g, i32 
%g.addr.018.g)
+  %cmp5.g = icmp slt i32 %call4.g, 0
+  br i1 %cmp5.g, label %if.then6.g, label %if.end8.g
+
+if.then6.g:   ; preds = %while.body.g
+  %call7.g = tail call i32 @fn2(ptr null)
+  br label %f.exit
+
+if.end8.g:; preds = %while.body.g
+  %dec.g = add i32 %g.addr.018.g, 1
+  %incdec.ptr.g = getelementptr i32, ptr %buff.addr.019.g, i32 1
+  store i64 0, ptr %tmp1, align 4
+  %h.not.g = icmp eq i32 %dec.g, 0
+  br i1 %h.not.g, label %f.exit, label %while.body.g
+
+f.exit:   ; preds = %if.end8.g, 
%if.then6.g, %if.end3.g, %if.then2.g
+  ret i32 0
+}
+
+attributes #0 = { "frame-p

[llvm-branch-commits] [llvm] release/19.x: [CalcSpillWeights] Avoid x87 excess precision influencing weight result (PR #102207)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-regalloc

Author: None (llvmbot)


Changes

Backport c80c09f3e380a0a2b00b36bebf72f43271a564c1

Requested by: @DimitryAndric

---
Full diff: https://github.com/llvm/llvm-project/pull/102207.diff


3 Files Affected:

- (modified) llvm/include/llvm/Support/MathExtras.h (+8) 
- (modified) llvm/lib/CodeGen/CalcSpillWeights.cpp (+6-5) 
- (added) llvm/test/CodeGen/X86/pr99396.ll (+56) 


``diff
diff --git a/llvm/include/llvm/Support/MathExtras.h 
b/llvm/include/llvm/Support/MathExtras.h
index 0d0fa826f7bba..e568e42afcf4d 100644
--- a/llvm/include/llvm/Support/MathExtras.h
+++ b/llvm/include/llvm/Support/MathExtras.h
@@ -770,6 +770,14 @@ std::enable_if_t, T> MulOverflow(T X, 
T Y, T &Result) {
 #endif
 }
 
+/// Type to force float point values onto the stack, so that x86 doesn't add
+/// hidden precision, avoiding rounding differences on various platforms.
+#if defined(__i386__) || defined(_M_IX86)
+using stack_float_t = volatile float;
+#else
+using stack_float_t = float;
+#endif
+
 } // namespace llvm
 
 #endif
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp 
b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 1d767a3484bca..9d8c9119f7719 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -22,6 +22,7 @@
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/CodeGen/VirtRegMap.h"
 #include "llvm/Support/Debug.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 #include 
@@ -257,7 +258,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
   return -1.0f;
 }
 
-float Weight = 1.0f;
+// Force Weight onto the stack so that x86 doesn't add hidden precision,
+// similar to HWeight below.
+stack_float_t Weight = 1.0f;
 if (IsSpillable) {
   // Get loop info for mi.
   if (MI->getParent() != MBB) {
@@ -284,11 +287,9 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, 
SlotIndex *Start,
 Register HintReg = copyHint(MI, LI.reg(), TRI, MRI);
 if (!HintReg)
   continue;
-// Force hweight onto the stack so that x86 doesn't add hidden precision,
+// Force HWeight onto the stack so that x86 doesn't add hidden precision,
 // making the comparison incorrectly pass (i.e., 1 > 1 == true??).
-//
-// FIXME: we probably shouldn't use floats at all.
-volatile float HWeight = Hint[HintReg] += Weight;
+stack_float_t HWeight = Hint[HintReg] += Weight;
 if (HintReg.isVirtual() || MRI.isAllocatable(HintReg))
   CopyHints.insert(CopyHint(HintReg, HWeight));
   }
diff --git a/llvm/test/CodeGen/X86/pr99396.ll b/llvm/test/CodeGen/X86/pr99396.ll
new file mode 100644
index 0..f534d32038c22
--- /dev/null
+++ b/llvm/test/CodeGen/X86/pr99396.ll
@@ -0,0 +1,56 @@
+; RUN: llc < %s -mtriple=i386-unknown-freebsd -enable-misched 
-relocation-model=pic | FileCheck %s
+
+@c = external local_unnamed_addr global ptr
+
+declare i32 @fn2() local_unnamed_addr
+
+declare i32 @fn3() local_unnamed_addr
+
+define noundef i32 @fn4() #0 {
+entry:
+  %tmp0 = load i32, ptr @fn4, align 4
+; CHECK: movl fn4@GOT(%ebx), %edi
+; CHECK-NEXT: movl (%edi), %edx
+  %tmp1 = load ptr, ptr @c, align 4
+; CHECK: movl c@GOT(%ebx), %eax
+; CHECK-NEXT: movl (%eax), %esi
+; CHECK-NEXT: testl %esi, %esi
+  %cmp.g = icmp eq ptr %tmp1, null
+  br i1 %cmp.g, label %if.then.g, label %if.end3.g
+
+if.then.g:; preds = %entry
+  %tmp2 = load i32, ptr inttoptr (i32 1 to ptr), align 4
+  %cmp1.g = icmp slt i32 %tmp2, 0
+  br i1 %cmp1.g, label %if.then2.g, label %if.end3.g
+
+if.then2.g:   ; preds = %if.then.g
+  %.g = load volatile i32, ptr null, align 2147483648
+  br label %f.exit
+
+if.end3.g:; preds = %if.then.g, %entry
+  %h.i.g = icmp eq i32 %tmp0, 0
+  br i1 %h.i.g, label %f.exit, label %while.body.g
+
+while.body.g: ; preds = %if.end3.g, 
%if.end8.g
+  %buff.addr.019.g = phi ptr [ %incdec.ptr.g, %if.end8.g ], [ @fn4, %if.end3.g 
]
+  %g.addr.018.g = phi i32 [ %dec.g, %if.end8.g ], [ %tmp0, %if.end3.g ]
+  %call4.g = tail call i32 @fn3(ptr %tmp1, ptr %buff.addr.019.g, i32 
%g.addr.018.g)
+  %cmp5.g = icmp slt i32 %call4.g, 0
+  br i1 %cmp5.g, label %if.then6.g, label %if.end8.g
+
+if.then6.g:   ; preds = %while.body.g
+  %call7.g = tail call i32 @fn2(ptr null)
+  br label %f.exit
+
+if.end8.g:; preds = %while.body.g
+  %dec.g = add i32 %g.addr.018.g, 1
+  %incdec.ptr.g = getelementptr i32, ptr %buff.addr.019.g, i32 1
+  store i64 0, ptr %tmp1, align 4
+  %h.not.g = icmp eq i32 %dec.g, 0
+  br i1 %h.not.g, label %f.exit, label %while.body.g
+
+f.exit:   ; preds = %if.end8.g, 
%if.then6.g, %if.end3.g, %if.then2.g
+  ret i32 0
+}
+
+attributes #0 = { "frame

[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

LGTM with nit.

https://github.com/llvm/llvm-project/pull/102163


[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits


@@ -45,61 +45,189 @@ cl::OptionCategory GICombinerOptionCategory(
 );
 } // end namespace llvm
 
-/// This class acts as the glue the joins the CombinerHelper to the overall
+/// This class acts as the glue that joins the CombinerHelper to the overall
 /// Combine algorithm. The CombinerHelper is intended to report the
 /// modifications it makes to the MIR to the GISelChangeObserver and the
-/// observer subclass will act on these events. In this case, instruction
-/// erasure will cancel any future visits to the erased instruction and
-/// instruction creation will schedule that instruction for a future visit.
-/// Other Combiner implementations may require more complex behaviour from
-/// their GISelChangeObserver subclass.
+/// observer subclass will act on these events.
 class Combiner::WorkListMaintainer : public GISelChangeObserver {
-  using WorkListTy = GISelWorkList<512>;
-  WorkListTy &WorkList;
+protected:
+#ifndef NDEBUG
   /// The instructions that have been created but we want to report once they
   /// have their operands. This is only maintained if debug output is 
requested.
-#ifndef NDEBUG
-  SetVector CreatedInstrs;
+  SmallSetVector CreatedInstrs;
 #endif
+  using Level = CombinerInfo::ObserverLevel;
 
 public:
-  WorkListMaintainer(WorkListTy &WorkList) : WorkList(WorkList) {}
+  static std::unique_ptr
+  create(Level Lvl, WorkListTy &WorkList, MachineRegisterInfo &MRI);
+
   virtual ~WorkListMaintainer() = default;
 
+  void reportFullyCreatedInstrs() {
+LLVM_DEBUG({
+  for (auto *MI : CreatedInstrs) {
+dbgs() << "Created: " << *MI;
+  }
+  CreatedInstrs.clear();
+});
+  }
+
+  virtual void reset() = 0;
+  virtual void appliedCombine() = 0;
+};
+
+/// A configurable WorkListMaintainer implementation.
+/// The ObserverLevel determines how the WorkListMaintainer reacts to MIR
+/// changes.
+template 
+class Combiner::WorkListMaintainerImpl : public Combiner::WorkListMaintainer {
+  WorkListTy &WorkList;
+  MachineRegisterInfo &MRI;
+
+  // Defer handling these instructions until the combine finishes.
+  SmallSetVector DeferList;
+
+  // Track VRegs that (might) have lost a use.
+  SmallSetVector LostUses;
+
+public:
+  WorkListMaintainerImpl(WorkListTy &WorkList, MachineRegisterInfo &MRI)
+  : WorkList(WorkList), MRI(MRI) {}
+
+  virtual ~WorkListMaintainerImpl() = default;
+
+  void reset() override {
+DeferList.clear();
+LostUses.clear();
+  }
+
   void erasingInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Erasing: " << MI << "\n");
+// MI will become dangling, remove it from all lists.
+LLVM_DEBUG(dbgs() << "Erasing: " << MI; CreatedInstrs.remove(&MI));
 WorkList.remove(&MI);
+if constexpr (Lvl != Level::Basic) {
+  DeferList.remove(&MI);
+  noteLostUses(MI);
+}
   }
+
   void createdInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Creating: " << MI << "\n");
-WorkList.insert(&MI);
-LLVM_DEBUG(CreatedInstrs.insert(&MI));
+LLVM_DEBUG(dbgs() << "Creating: " << MI; CreatedInstrs.insert(&MI));
+if constexpr (Lvl == Level::Basic)
+  WorkList.insert(&MI);
+else
+  // Defer handling newly created instructions, because they don't have
+  // operands yet. We also insert them into the WorkList in reverse
+  // order so that they will be combined top down.
+  DeferList.insert(&MI);
   }
+
   void changingInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Changing: " << MI << "\n");
-WorkList.insert(&MI);
+LLVM_DEBUG(dbgs() << "Changing: " << MI);
+// Some uses might get dropped when MI is changed.
+// For now, overapproximate by assuming all uses will be dropped.
+// TODO: Is a more precise heuristic or manual tracking of use count
+// decrements worth it?
+if constexpr (Lvl != Level::Basic)
+  noteLostUses(MI);
   }
+
   void changedInstr(MachineInstr &MI) override {
-LLVM_DEBUG(dbgs() << "Changed: " << MI << "\n");
-WorkList.insert(&MI);
+LLVM_DEBUG(dbgs() << "Changed: " << MI);
+if constexpr (Lvl == Level::Basic)
+  WorkList.insert(&MI);
+else
+  // Defer this for DCE
+  DeferList.insert(&MI);
   }
 
-  void reportFullyCreatedInstrs() {
-LLVM_DEBUG(for (const auto *MI
-: CreatedInstrs) {
-  dbgs() << "Created: ";
-  MI->print(dbgs());
-});
-LLVM_DEBUG(CreatedInstrs.clear());
+  // Only track changes during the combine and then walk the def/use-chains 
once
+  // the combine is finished, because:
+  // - instructions might have multiple defs during the combine.
+  // - use counts aren't accurate during the combine.
+  void appliedCombine() override {
+if constexpr (Lvl == Level::Basic)
+  return;
+
+// DCE deferred instructions and add them to the WorkList bottom up.
+while (!DeferList.empty()) {
+  MachineInstr &MI = *DeferList.pop_back_val();
+  if (tryDCE(MI, MRI))
+continue;
+
+  if const
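The deferred-DCE scheme in the patch above can be sketched in isolation. This is a hypothetical toy model, not the LLVM API: "instructions" are plain ints, and a use-count map stands in for `MachineRegisterInfo`. The real implementation defers in a `SmallSetVector`, re-enqueues bottom-up, and actually erases dead instructions; this sketch only shows the defer-then-filter shape.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy model of the observer-based deferred-DCE worklist (illustrative names).
struct ToyWorklist {
  std::vector<int> Work;       // instructions scheduled for combining
  std::set<int> Defer;         // created/changed instrs, handled post-combine
  std::map<int, int> UseCount; // "vreg" -> remaining uses (stand-in for MRI)

  // Newly created instructions have no operands yet, so defer them.
  void createdInstr(int MI) { Defer.insert(MI); }

  // An erased instruction must leave every list, or it dangles.
  void erasingInstr(int MI) {
    Work.erase(std::remove(Work.begin(), Work.end(), MI), Work.end());
    Defer.erase(MI);
  }

  // After the combine finishes: drop dead deferred instrs (DCE), enqueue rest.
  void appliedCombine() {
    for (int MI : Defer)
      if (UseCount[MI] > 0)
        Work.push_back(MI);
    Defer.clear();
  }
};
```

Use counts are only consulted once the combine has finished, mirroring the comment in the patch that counts are inaccurate mid-combine.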

[llvm-branch-commits] [GlobalISel] Combiner: Observer-based DCE and retrying of combines (PR #102163)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson edited 
https://github.com/llvm/llvm-project/pull/102163
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [AArch64][GlobalISel] Disable fixed-point iteration in all Combiners (PR #102167)

2024-08-06 Thread Amara Emerson via llvm-branch-commits

https://github.com/aemerson approved this pull request.

These are some very nice improvements, thanks for working on this. None of the 
test output changes appears to expose problems with this patch, so LGTM.

https://github.com/llvm/llvm-project/pull/102167
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [mlir] b102575 - Revert "[mlir][linalg] Relax tensor.extract vectorization (#99299)"

2024-08-06 Thread via llvm-branch-commits

Author: Han-Chung Wang
Date: 2024-08-06T14:28:37-07:00
New Revision: b102575a6cf350124a8967a4e0714718008f72c1

URL: 
https://github.com/llvm/llvm-project/commit/b102575a6cf350124a8967a4e0714718008f72c1
DIFF: 
https://github.com/llvm/llvm-project/commit/b102575a6cf350124a8967a4e0714718008f72c1.diff

LOG: Revert "[mlir][linalg] Relax tensor.extract vectorization (#99299)"

This reverts commit 8868c02cda875d1efe1646affa01656ef268ffed.

Added: 


Modified: 
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir

Removed: 




diff  --git a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp 
b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
index 6da886f5ec19e..3d0d6abf702d7 100644
--- a/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
+++ b/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
@@ -946,22 +946,27 @@ getTensorExtractMemoryAccessPattern(tensor::ExtractOp 
extractOp,
   if (linalgOp.hasDynamicShape())
 return VectorMemoryAccessKind::Gather;
 
-  // True for vectors that are effectively 1D, e.g. `vector<1x4x1xi32>`, false
-  // otherwise.
-  bool isOutput1DVector = (llvm::count_if(targetShape, [](int64_t dimSize) {
- return dimSize > 1;
-   }) == 1);
-
-  // 1. Assume that it's a gather load when reading non-1D vector.
-  if (!isOutput1DVector)
+  // 1. Assume that it's a gather load when reading _into_:
+  //* an n-D "vector", like `tensor<1x2x4xi32` or `tensor<2x1x4xi32>`, or
+  //* a 1-D "vector" with the trailing dim equal 1, e.g. 
`tensor<1x4x1xi32`.
+  // TODO: Relax these conditions.
+  // FIXME: This condition assumes non-dynamic sizes.
+  if ((llvm::count_if(targetShape,
+  [](int64_t dimSize) { return dimSize > 1; }) != 1) ||
+  targetShape.back() == 1)
+return VectorMemoryAccessKind::Gather;
+
+  // 2. Assume that it's a gather load when reading _from_ a tensor for which
+  // the trailing dimension is 1, e.g. `tensor<1x4x1xi32>`.
+  // TODO: Relax this condition.
+  if (inputShape.getShape().back() == 1)
 return VectorMemoryAccessKind::Gather;
 
   bool leadingIdxsLoopInvariant = true;
 
-  // 2. Analyze the leading indices of `extractOp`.
+  // 3. Analyze the leading indices of `extractOp`.
   // Look at the way each index is calculated and decide whether it is suitable
-  // for a contiguous load, i.e. whether it's loop invariant. If not, it's a
-  // gather load.
+  // for a contiguous load, i.e. whether it's loop invariant.
   auto indices = extractOp.getIndices();
   auto leadIndices = indices.drop_back(1);
 
@@ -977,13 +982,13 @@ getTensorExtractMemoryAccessPattern(tensor::ExtractOp 
extractOp,
 return VectorMemoryAccessKind::Gather;
   }
 
-  // 3. Analyze the trailing index for `extractOp`.
+  // 4. Analyze the trailing index for `extractOp`.
   // At this point we know that the leading indices are loop invariant. This
   // means that is potentially a scalar or a contiguous load. We can decide
   // based on the trailing idx.
   auto extractOpTrailingIdx = indices.back();
 
-  // 3a. Scalar broadcast load
+  // 4a. Scalar broadcast load
   // If the trailing index is loop invariant then this is a scalar load.
   if (leadingIdxsLoopInvariant &&
   isLoopInvariantIdx(linalgOp, extractOpTrailingIdx)) {
@@ -992,7 +997,7 @@ getTensorExtractMemoryAccessPattern(tensor::ExtractOp 
extractOp,
 return VectorMemoryAccessKind::ScalarBroadcast;
   }
 
-  // 3b. Contiguous loads
+  // 4b. Contiguous loads
   // The trailing `extractOp` index should increment with every loop iteration.
   // This effectively means that it must be based on the trailing loop index.
   // This is what the following bool captures.
@@ -1006,7 +1011,7 @@ getTensorExtractMemoryAccessPattern(tensor::ExtractOp 
extractOp,
 return VectorMemoryAccessKind::Contiguous;
   }
 
-  // 4. Fallback case - gather load.
+  // 5. Fallback case - gather load.
   LDBG("Found gather load: " << extractOp);
   return VectorMemoryAccessKind::Gather;
 }

diff  --git a/mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir 
b/mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir
index ac75a19cbeb28..85e1c56dd45a0 100644
--- a/mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir
+++ b/mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir
@@ -595,59 +595,3 @@ module attributes {transform.with_named_sequence} {
  transform.yield
}
 }
-
-
-// -----
-
-func.func @vectorize_scalar_broadcast_column_tensor(%in: tensor<1x1x4xi32>) -> 
tensor<1x1x4xi32> {
-  %c4 = arith.constant 4 : index
-  %c0 = arith.constant 0 : index
-  %cst = arith.constant dense<[[0], [1], [2], [3], [4], [5], [6], [7], [8], 
[9], [10], [11], [12], [13], [14]]> : tensor<15x1xi32>
-
-  %out = linalg.generic {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, 
d2)>], iterator_types = ["parallel",

[llvm-branch-commits] [NFC] [sanitizers] leave BufferedStackTrace uninit in tests (PR #102251)

2024-08-06 Thread Florian Mayer via llvm-branch-commits

https://github.com/fmayer created 
https://github.com/llvm/llvm-project/pull/102251

This is for consistency with the production code



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [DFSan] [compiler-rt] leave BufferedStackTrace uninit (PR #102252)

2024-08-06 Thread Florian Mayer via llvm-branch-commits

https://github.com/fmayer created 
https://github.com/llvm/llvm-project/pull/102252

Otherwise we have to memset 2040 bytes (255 * 8) for each call
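The 2040-byte figure is the stack-trace buffer itself: 255 PC slots of 8 bytes each. A minimal sketch of the two layouts (a toy stand-in, not the actual `BufferedStackTrace` definition from `sanitizer_common`):

```cpp
#include <cassert>
#include <cstdint>

constexpr int kStackTraceMax = 255; // matches the 255 * 8 in the PR text

// Zero-initializing the buffer on construction costs a ~2040-byte memset
// on every call that declares one of these on the stack.
struct ZeroedTrace {
  uint64_t trace_buffer[kStackTraceMax] = {};
  unsigned size = 0;
};

// Leaving it uninitialized (what the UNINITIALIZED annotation opts into)
// skips that memset; only the slots actually written are touched.
struct UninitTrace {
  uint64_t trace_buffer[kStackTraceMax];
  unsigned size;
};
```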



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [compiler-rt] [UBSan] leave BufferedStackTrace uninit (PR #102253)

2024-08-06 Thread Florian Mayer via llvm-branch-commits

https://github.com/fmayer created 
https://github.com/llvm/llvm-project/pull/102253

Otherwise we have to memset 2040 bytes (255 * 8) for each call



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [NFC] [sanitizers] leave BufferedStackTrace uninit in tests (PR #102251)

2024-08-06 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Florian Mayer (fmayer)


Changes

This is for consistency with the production code


---
Full diff: https://github.com/llvm/llvm-project/pull/102251.diff


2 Files Affected:

- (modified) compiler-rt/lib/asan/tests/asan_noinst_test.cpp (+4-4) 
- (modified) 
compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp (+1-1) 


```diff
diff --git a/compiler-rt/lib/asan/tests/asan_noinst_test.cpp 
b/compiler-rt/lib/asan/tests/asan_noinst_test.cpp
index df7de2d7d15ed..ba1eab1a90f44 100644
--- a/compiler-rt/lib/asan/tests/asan_noinst_test.cpp
+++ b/compiler-rt/lib/asan/tests/asan_noinst_test.cpp
@@ -137,7 +137,7 @@ TEST(AddressSanitizer, DISABLED_InternalPrintShadow) {
 }
 
 TEST(AddressSanitizer, QuarantineTest) {
-  BufferedStackTrace stack;
+  UNINITIALIZED BufferedStackTrace stack;
   stack.trace_buffer[0] = 0x890;
   stack.size = 1;
 
@@ -159,7 +159,7 @@ TEST(AddressSanitizer, QuarantineTest) {
 void *ThreadedQuarantineTestWorker(void *unused) {
   (void)unused;
   u32 seed = my_rand();
-  BufferedStackTrace stack;
+  UNINITIALIZED BufferedStackTrace stack;
   stack.trace_buffer[0] = 0x890;
   stack.size = 1;
 
@@ -194,7 +194,7 @@ TEST(AddressSanitizer, ThreadedQuarantineTest) {
 
 void *ThreadedOneSizeMallocStress(void *unused) {
   (void)unused;
-  BufferedStackTrace stack;
+  UNINITIALIZED BufferedStackTrace stack;
   stack.trace_buffer[0] = 0x890;
   stack.size = 1;
   const size_t kNumMallocs = 1000;
@@ -238,7 +238,7 @@ static void TestLoadStoreCallbacks(CB cb[2][5]) {
   uptr buggy_ptr;
 
   __asan_test_only_reported_buggy_pointer = &buggy_ptr;
-  BufferedStackTrace stack;
+  UNINITIALIZED BufferedStackTrace stack;
   stack.trace_buffer[0] = 0x890;
   stack.size = 1;
 
diff --git 
a/compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp 
b/compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp
index 11ca1fd7f0517..cf42294a4b0c3 100644
--- a/compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp
+++ b/compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp
@@ -259,7 +259,7 @@ TEST_F(StackPrintDeathTest, 
SKIP_ON_SPARC(RequiresNonNullBuffer)) {
 #endif // SANITIZER_CAN_FAST_UNWIND
 
 TEST(SlowUnwindTest, ShortStackTrace) {
-  BufferedStackTrace stack;
+  UNINITIALIZED BufferedStackTrace stack;
   uptr pc = StackTrace::GetCurrentPc();
   uptr bp = GET_CURRENT_FRAME();
   stack.Unwind(pc, bp, nullptr, false, /*max_depth=*/0);

```




https://github.com/llvm/llvm-project/pull/102251
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 30b2408 - Revert "[Attributor] Fix an issue that an access is skipped by mistake (#101862)"

2024-08-06 Thread via llvm-branch-commits

Author: Shilei Tian
Date: 2024-08-06T21:30:14-04:00
New Revision: 30b2408fe12faf6ac90a1a709750425ffd752a8c

URL: 
https://github.com/llvm/llvm-project/commit/30b2408fe12faf6ac90a1a709750425ffd752a8c
DIFF: 
https://github.com/llvm/llvm-project/commit/30b2408fe12faf6ac90a1a709750425ffd752a8c.diff

LOG: Revert "[Attributor] Fix an issue that an access is skipped by mistake 
(#101862)"

This reverts commit 53d33d3ba5eedac8fccb9d36576cd667800a4a38.

Added: 


Modified: 
llvm/lib/Transforms/IPO/AttributorAttributes.cpp

Removed: 
llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll



diff  --git a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp 
b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
index db5e94806e9a16..77026c6aa5b272 100644
--- a/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
+++ b/llvm/lib/Transforms/IPO/AttributorAttributes.cpp
@@ -1325,20 +1325,20 @@ struct AAPointerInfoImpl
 
 const auto *FnReachabilityAA = A.getAAFor(
 QueryingAA, IRPosition::function(Scope), DepClassTy::OPTIONAL);
-if (FnReachabilityAA) {
-  // Without going backwards in the call tree, can we reach the access
-  // from the least dominating write. Do not allow to pass the
-  // instruction itself either.
-  bool Inserted = ExclusionSet.insert(&I).second;
-
-  if (!FnReachabilityAA->instructionCanReach(
-  A, *LeastDominatingWriteInst,
-  *Acc.getRemoteInst()->getFunction(), &ExclusionSet))
-WriteChecked = true;
-
-  if (Inserted)
-ExclusionSet.erase(&I);
-}
+
+// Without going backwards in the call tree, can we reach the access
+// from the least dominating write. Do not allow to pass the 
instruction
+// itself either.
+bool Inserted = ExclusionSet.insert(&I).second;
+
+if (!FnReachabilityAA ||
+!FnReachabilityAA->instructionCanReach(
+A, *LeastDominatingWriteInst,
+*Acc.getRemoteInst()->getFunction(), &ExclusionSet))
+  WriteChecked = true;
+
+if (Inserted)
+  ExclusionSet.erase(&I);
   }
 
   if (ReadChecked && WriteChecked)

diff  --git a/llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll 
b/llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll
deleted file mode 100644
index f419d89a7f0a44..00
--- a/llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll
+++ /dev/null
@@ -1,73 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --function-signature --check-globals
-; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -passes=amdgpu-attributor %s -o - | 
FileCheck %s
-
-@g_fn = addrspace(1) global ptr null
-
-;.
-; CHECK: @g_fn = addrspace(1) global ptr null
-;.
-define void @set_fn(ptr %fn) {
-; CHECK-LABEL: define {{[^@]+}}@set_fn
-; CHECK-SAME: (ptr [[FN:%.*]]) #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:store ptr [[FN]], ptr addrspace(1) @g_fn, align 8
-; CHECK-NEXT:ret void
-;
-entry:
-  store ptr %fn, ptr addrspace(1) @g_fn
-  ret void
-}
-
-define void @get_fn(ptr %fn) {
-; CHECK-LABEL: define {{[^@]+}}@get_fn
-; CHECK-SAME: (ptr [[FN:%.*]]) #[[ATTR0]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:[[LOAD:%.*]] = load ptr, ptr addrspace(1) @g_fn, align 8
-; CHECK-NEXT:store ptr [[LOAD]], ptr [[FN]], align 8
-; CHECK-NEXT:ret void
-;
-entry:
-  %load = load ptr, ptr addrspace(1) @g_fn
-  store ptr %load, ptr %fn
-  ret void
-}
-
-define void @foo() {
-; CHECK-LABEL: define {{[^@]+}}@foo
-; CHECK-SAME: () #[[ATTR1:[0-9]+]] {
-; CHECK-NEXT:  entry:
-; CHECK-NEXT:[[FN:%.*]] = alloca ptr, align 8, addrspace(5)
-; CHECK-NEXT:store ptr null, ptr addrspace(5) [[FN]], align 8
-; CHECK-NEXT:[[FN_CAST:%.*]] = addrspacecast ptr addrspace(5) [[FN]] to ptr
-; CHECK-NEXT:call void @get_fn(ptr [[FN_CAST]])
-; CHECK-NEXT:[[LOAD:%.*]] = load ptr, ptr addrspace(5) [[FN]], align 8
-; CHECK-NEXT:[[TOBOOL:%.*]] = icmp ne ptr [[LOAD]], null
-; CHECK-NEXT:br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
-; CHECK:   if.then:
-; CHECK-NEXT:[[LOAD_1:%.*]] = load ptr, ptr addrspace(5) [[FN]], align 8
-; CHECK-NEXT:call void [[LOAD_1]]()
-; CHECK-NEXT:br label [[IF_END]]
-; CHECK:   if.end:
-; CHECK-NEXT:ret void
-;
-entry:
-  %fn = alloca ptr, addrspace(5)
-  store ptr null, ptr addrspace(5) %fn
-  %fn.cast = addrspacecast ptr addrspace(5) %fn to ptr
-  call void @get_fn(ptr %fn.cast)
-  %load = load ptr, ptr addrspace(5) %fn
-  %tobool = icmp ne ptr %load, null
-  br i1 %tobool, label %if.then, label %if.end
-
-if.then:
-  %load.1 = load ptr, ptr addrspace(5) %fn
-  call void %load.1()
-  br label %if.end
-
-if.end:
-  ret void
-}
-;.
-; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-agpr" 
"amdgpu-no-completion-action"

[llvm-branch-commits] [clang] release/19.x: [clang][driver][clang-cl] Support `--precompile` and `-fmodule-*` options in Clang-CL (#98761) (PR #102159)

2024-08-06 Thread Chuanqi Xu via llvm-branch-commits

ChuanqiXu9 wrote:

I feel this is good, and there's no risk in bringing it into 19.x.

https://github.com/llvm/llvm-project/pull/102159
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-06 Thread Shilei Tian via llvm-branch-commits

https://github.com/shiltian updated 
https://github.com/llvm/llvm-project/pull/101760

>From f9e990a43908efc2e155c95f3cd4ddadefc4d6a1 Mon Sep 17 00:00:00 2001
From: Shilei Tian 
Date: Fri, 2 Aug 2024 18:05:44 -0400
Subject: [PATCH] [AMDGPU][Attributor] Add a pass parameter `closed-world` for
 AMDGPUAttributor pass

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   | 11 +--
 llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp   | 11 ---
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def | 12 +++-
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 29 +--
 .../CodeGen/AMDGPU/simple-indirect-call-2.ll  |  2 +-
 .../Other/amdgpu-pass-pipeline-parsing.ll | 12 
 6 files changed, 63 insertions(+), 14 deletions(-)
 create mode 100644 llvm/test/Other/amdgpu-pass-pipeline-parsing.ll

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 50aef36724f705..d8ed1d9db00e59 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -283,17 +283,22 @@ class AMDGPULowerKernelArgumentsPass
   PreservedAnalyses run(Function &, FunctionAnalysisManager &);
 };
 
+struct AMDGPUAttributorOptions {
+  bool IsClosedWorld = false;
+};
+
 class AMDGPUAttributorPass : public PassInfoMixin {
 private:
   TargetMachine &TM;
 
+  AMDGPUAttributorOptions Options;
+
   /// Asserts whether we can assume whole program visibility.
   bool HasWholeProgramVisibility = false;
 
 public:
-  AMDGPUAttributorPass(TargetMachine &TM,
-   bool HasWholeProgramVisibility = false)
-  : TM(TM), HasWholeProgramVisibility(HasWholeProgramVisibility) {};
+  AMDGPUAttributorPass(TargetMachine &TM, AMDGPUAttributorOptions Options = {})
+  : TM(TM), Options(Options) {};
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 9557005721cb15..d65e0ae92308e6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1025,7 +1025,7 @@ static void addPreloadKernArgHint(Function &F, 
TargetMachine &TM) {
 }
 
 static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
-bool HasWholeProgramVisibility) {
+AMDGPUAttributorOptions Options) {
   SetVector Functions;
   for (Function &F : M) {
 if (!F.isIntrinsic())
@@ -1044,7 +1044,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, 
TargetMachine &TM,
&AAInstanceInfo::ID});
 
   AttributorConfig AC(CGUpdater);
-  AC.IsClosedWorldModule = HasWholeProgramVisibility;
+  AC.IsClosedWorldModule = Options.IsClosedWorld;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
@@ -1114,7 +1114,7 @@ class AMDGPUAttributorLegacy : public ModulePass {
 
   bool runOnModule(Module &M) override {
 AnalysisGetter AG(this);
-return runImpl(M, AG, *TM, /*HasWholeProgramVisibility=*/false);
+return runImpl(M, AG, *TM, /*Options=*/{});
   }
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
@@ -1135,9 +1135,8 @@ PreservedAnalyses llvm::AMDGPUAttributorPass::run(Module 
&M,
   AnalysisGetter AG(FAM);
 
   // TODO: Probably preserves CFG
-  return runImpl(M, AG, TM, HasWholeProgramVisibility)
- ? PreservedAnalyses::none()
- : PreservedAnalyses::all();
+  return runImpl(M, AG, TM, Options) ? PreservedAnalyses::none()
+ : PreservedAnalyses::all();
 }
 
 char AMDGPUAttributorLegacy::ID = 0;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 57fc3314dd9709..0adf11d27a2f54 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -17,7 +17,6 @@
 #define MODULE_PASS(NAME, CREATE_PASS)
 #endif
 MODULE_PASS("amdgpu-always-inline", AMDGPUAlwaysInlinePass())
-MODULE_PASS("amdgpu-attributor", AMDGPUAttributorPass(*this))
 MODULE_PASS("amdgpu-lower-buffer-fat-pointers",
 AMDGPULowerBufferFatPointersPass(*this))
 MODULE_PASS("amdgpu-lower-ctor-dtor", AMDGPUCtorDtorLoweringPass())
@@ -26,6 +25,17 @@ MODULE_PASS("amdgpu-printf-runtime-binding", 
AMDGPUPrintfRuntimeBindingPass())
 MODULE_PASS("amdgpu-unify-metadata", AMDGPUUnifyMetadataPass())
 #undef MODULE_PASS
 
+#ifndef MODULE_PASS_WITH_PARAMS
+#define MODULE_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
+#endif
+MODULE_PASS_WITH_PARAMS(
+"amdgpu-attributor", "AMDGPUAttributorPass",
+[=](AMDGPUAttributorOptions Options) {
+  return AMDGPUAttributorPass(*this, Options);
+},
+parseAMDGPUAttributorPassOptions, "closed-world")
+#undef MODULE_PASS_WITH_PARAMS
+
 #ifndef FUNCTION_PASS
 #define FUNCTION_PASS(NAME, CREATE_PASS)
 #endif
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 50cc2d871d4ece..700408cd55e6
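The pass-parameter plumbing in the patch above (registering `amdgpu-attributor<closed-world>` via `MODULE_PASS_WITH_PARAMS`) can be sketched with a standalone parser. This is a hypothetical simplification: names and the `;`-separated token convention mirror LLVM's pass-parameter style, but the function below is not the actual `parseAMDGPUAttributorPassOptions`.

```cpp
#include <cassert>
#include <string>

struct AttributorOptions {
  bool IsClosedWorld = false; // set by the "closed-world" pass parameter
};

// Hypothetical parser for the text between the pass's angle brackets,
// e.g. "amdgpu-attributor<closed-world>". Returns false on unknown tokens.
bool parseAttributorOptions(const std::string &Params,
                            AttributorOptions &Opts) {
  std::size_t Pos = 0;
  while (Pos < Params.size()) {
    std::size_t End = Params.find(';', Pos);
    std::string Tok = Params.substr(
        Pos, End == std::string::npos ? std::string::npos : End - Pos);
    if (Tok == "closed-world")
      Opts.IsClosedWorld = true;
    else if (!Tok.empty())
      return false; // unknown parameter
    if (End == std::string::npos)
      break;
    Pos = End + 1;
  }
  return true;
}
```

With this shape, the legacy pass simply passes a default-constructed options struct (as `runOnModule` does with `/*Options=*/{}` in the patch), while the new-PM registration parses the parameter string once at pipeline-construction time.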

[llvm-branch-commits] [llvm] [AMDGPU][Attributor] Add a pass parameter `closed-world` for AMDGPUAttributor pass (PR #101760)

2024-08-06 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/101760
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-06 Thread David Green via llvm-branch-commits

davemgreen wrote:

The changes look OK to me - they should only alter the (doubly) opt-in 
optimization remarks, so they should be pretty safe. @tru does that sound OK to 
you, for it to go onto the branch?

https://github.com/llvm/llvm-project/pull/102168
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits