[llvm-branch-commits] [libcxx] release/19.x: [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338) (PR #102466)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/102466

>From f6381989f454d3f819847272b0d3bd84d6785aef Mon Sep 17 00:00:00 2001
From: Xing Xue 
Date: Thu, 8 Aug 2024 09:15:51 -0400
Subject: [PATCH] [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp
 (#102338)

Remove `XFAIL: LIBCXX-AIX-FIXME` from the lit test `transform.pass.cpp` now
that the AIX system calls `wcsxfrm`/`wcsxfrm_l` are fixed in AIX 7.2.5.8 and
7.3.2.2 and the buildbot machines have been upgraded.

Backported from commit cb5912a71061c6558bd4293596dcacc1ce0ca2f6
---
 libcxx/test/std/re/re.traits/transform.pass.cpp | 1 -
 1 file changed, 1 deletion(-)

diff --git a/libcxx/test/std/re/re.traits/transform.pass.cpp 
b/libcxx/test/std/re/re.traits/transform.pass.cpp
index 369dbdf7053ba1..80cd3f01faff2f 100644
--- a/libcxx/test/std/re/re.traits/transform.pass.cpp
+++ b/libcxx/test/std/re/re.traits/transform.pass.cpp
@@ -8,7 +8,6 @@
 //
 // NetBSD does not support LC_COLLATE at the moment
 // XFAIL: netbsd
-// XFAIL: LIBCXX-AIX-FIXME
 
 // REQUIRES: locale.cs_CZ.ISO8859-2
 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] f638198 - [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

Author: Xing Xue
Date: 2024-08-12T10:39:47+02:00
New Revision: f6381989f454d3f819847272b0d3bd84d6785aef

URL: 
https://github.com/llvm/llvm-project/commit/f6381989f454d3f819847272b0d3bd84d6785aef
DIFF: 
https://github.com/llvm/llvm-project/commit/f6381989f454d3f819847272b0d3bd84d6785aef.diff

LOG: [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338)

Remove `XFAIL: LIBCXX-AIX-FIXME` from the lit test `transform.pass.cpp` now
that the AIX system calls `wcsxfrm`/`wcsxfrm_l` are fixed in AIX 7.2.5.8 and
7.3.2.2 and the buildbot machines have been upgraded.

Backported from commit cb5912a71061c6558bd4293596dcacc1ce0ca2f6

Added: 


Modified: 
libcxx/test/std/re/re.traits/transform.pass.cpp

Removed: 




diff  --git a/libcxx/test/std/re/re.traits/transform.pass.cpp 
b/libcxx/test/std/re/re.traits/transform.pass.cpp
index 369dbdf7053ba1..80cd3f01faff2f 100644
--- a/libcxx/test/std/re/re.traits/transform.pass.cpp
+++ b/libcxx/test/std/re/re.traits/transform.pass.cpp
@@ -8,7 +8,6 @@
 //
 // NetBSD does not support LC_COLLATE at the moment
 // XFAIL: netbsd
-// XFAIL: LIBCXX-AIX-FIXME
 
 // REQUIRES: locale.cs_CZ.ISO8859-2
 



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] release/19.x: [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338) (PR #102466)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/102466
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755) (PR #102771)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru updated https://github.com/llvm/llvm-project/pull/102771

>From 4fd6b324ea2d44418832e7f90b2e5ffc1972c900 Mon Sep 17 00:00:00 2001
From: Rainer Orth 
Date: Sat, 10 Aug 2024 22:54:07 +0200
Subject: [PATCH] [llvm-exegesis][unittests] Also disable SubprocessMemoryTest
 on SPARC (#102755)

Three `llvm-exegesis` tests
```
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition
```
`FAIL` on Linux/sparc64 like
```
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure
Expected equality of these values:
  SharedMemoryMapping[I]
Which is: '\0'
  ExpectedValue[I]
Which is: '\xAA' (170)
```
It seems that this test only works on little-endian hosts: three
sub-tests are already disabled on powerpc and s390x (both big-endian),
and the fourth is additionally guarded against big-endian hosts (making
the other guards unnecessary).

However, since it has not been analyzed whether this is really an endianness
issue, this patch disables the whole test on powerpc and s390x as before,
adding sparc to the mix.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit a417083e27b155dc92b7f7271c0093aee0d7231c)
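
For illustration only (not part of the patch): a minimal, self-contained sketch
of the guard style used here, i.e. disabling a byte-order-sensitive googletest
case on big-endian hosts via the predefined `__BYTE_ORDER__` macro. The test
name and the byte-reassembly check are hypothetical.

```cpp
#include <cstring>
#include <gtest/gtest.h>

// On big-endian hosts the low byte of the reassembled value is 0xDD rather
// than 0xAA, so the expectation only holds on little-endian targets and the
// test is registered with the DISABLED_ prefix otherwise.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
TEST(EndianSensitiveExample, DISABLED_ByteLayout) {
#else
TEST(EndianSensitiveExample, ByteLayout) {
#endif
  const unsigned char Bytes[4] = {0xAA, 0xBB, 0xCC, 0xDD};
  unsigned int Value = 0;
  std::memcpy(&Value, Bytes, sizeof(Value));
  EXPECT_EQ(Value & 0xFFu, 0xAAu); // Bytes[0] is the low byte on little-endian.
}
```
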
---
 .../X86/SubprocessMemoryTest.cpp  | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp 
b/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
index 7c23e7b7e9c5a5..f61254ac74e140 100644
--- a/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
+++ b/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
@@ -24,7 +24,8 @@
 namespace llvm {
 namespace exegesis {
 
-#if defined(__linux__) && !defined(__ANDROID__)
+#if defined(__linux__) && !defined(__ANDROID__) && 
\
+!(defined(__powerpc__) || defined(__s390x__) || defined(__sparc__))
 
 // This needs to be updated anytime a test is added or removed from the test
 // suite.
@@ -77,20 +78,12 @@ class SubprocessMemoryTest : public X86TestBase {
 // memory calls not working in some cases, so they have been disabled.
 // TODO(boomanaiden154): Investigate and fix this issue on PPC.
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_OneDefinition) {
-#else
 TEST_F(SubprocessMemoryTest, OneDefinition) {
-#endif
   testCommon({{"test1", {APInt(8, 0xff), 4096, 0}}}, 0);
   checkSharedMemoryDefinition(getSharedMemoryName(0, 0), 4096, {0xff});
 }
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_MultipleDefinitions) {
-#else
 TEST_F(SubprocessMemoryTest, MultipleDefinitions) {
-#endif
   testCommon({{"test1", {APInt(8, 0xaa), 4096, 0}},
   {"test2", {APInt(8, 0xbb), 4096, 1}},
   {"test3", {APInt(8, 0xcc), 4096, 2}}},
@@ -100,11 +93,7 @@ TEST_F(SubprocessMemoryTest, MultipleDefinitions) {
   checkSharedMemoryDefinition(getSharedMemoryName(1, 2), 4096, {0xcc});
 }
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_DefinitionFillsCompletely) {
-#else
 TEST_F(SubprocessMemoryTest, DefinitionFillsCompletely) {
-#endif
   testCommon({{"test1", {APInt(8, 0xaa), 4096, 0}},
   {"test2", {APInt(16, 0x), 4096, 1}},
   {"test3", {APInt(24, 0xcc), 4096, 2}}},
@@ -118,7 +107,7 @@ TEST_F(SubprocessMemoryTest, DefinitionFillsCompletely) {
 }
 
 // The following test is only supported on little endian systems.
-#if defined(__powerpc__) || defined(__s390x__) || __BYTE_ORDER__ == 
__ORDER_BIG_ENDIAN__
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 TEST_F(SubprocessMemoryTest, DISABLED_DefinitionEndTruncation) {
 #else
 TEST_F(SubprocessMemoryTest, DefinitionEndTruncation) {
@@ -150,7 +139,7 @@ TEST_F(SubprocessMemoryTest, DefinitionEndTruncation) {
   checkSharedMemoryDefinition(getSharedMemoryName(3, 0), 4096, Test1Expected);
 }
 
-#endif // defined(__linux__) && !defined(__ANDROID__)
+#endif // __linux__ && !__ANDROID__ && !(__powerpc__ || __s390x__ || __sparc__)
 
 } // namespace exegesis
 } // namespace llvm

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] 4fd6b32 - [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

Author: Rainer Orth
Date: 2024-08-12T10:40:07+02:00
New Revision: 4fd6b324ea2d44418832e7f90b2e5ffc1972c900

URL: 
https://github.com/llvm/llvm-project/commit/4fd6b324ea2d44418832e7f90b2e5ffc1972c900
DIFF: 
https://github.com/llvm/llvm-project/commit/4fd6b324ea2d44418832e7f90b2e5ffc1972c900.diff

LOG: [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC 
(#102755)

Three `llvm-exegesis` tests
```
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/DefinitionFillsCompletely
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/MultipleDefinitions
  LLVM-Unit :: 
tools/llvm-exegesis/./LLVMExegesisTests/SubprocessMemoryTest/OneDefinition
```
`FAIL` on Linux/sparc64 like
```
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp:68: Failure
Expected equality of these values:
  SharedMemoryMapping[I]
Which is: '\0'
  ExpectedValue[I]
Which is: '\xAA' (170)
```
It seems that this test only works on little-endian hosts: three
sub-tests are already disabled on powerpc and s390x (both big-endian),
and the fourth is additionally guarded against big-endian hosts (making
the other guards unnecessary).

However, since it has not been analyzed whether this is really an endianness
issue, this patch disables the whole test on powerpc and s390x as before,
adding sparc to the mix.

Tested on `sparc64-unknown-linux-gnu` and `x86_64-pc-linux-gnu`.

(cherry picked from commit a417083e27b155dc92b7f7271c0093aee0d7231c)

Added: 


Modified: 
llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp

Removed: 




diff  --git a/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp 
b/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
index 7c23e7b7e9c5a5..f61254ac74e140 100644
--- a/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
+++ b/llvm/unittests/tools/llvm-exegesis/X86/SubprocessMemoryTest.cpp
@@ -24,7 +24,8 @@
 namespace llvm {
 namespace exegesis {
 
-#if defined(__linux__) && !defined(__ANDROID__)
+#if defined(__linux__) && !defined(__ANDROID__) && 
\
+!(defined(__powerpc__) || defined(__s390x__) || defined(__sparc__))
 
 // This needs to be updated anytime a test is added or removed from the test
 // suite.
@@ -77,20 +78,12 @@ class SubprocessMemoryTest : public X86TestBase {
 // memory calls not working in some cases, so they have been disabled.
 // TODO(boomanaiden154): Investigate and fix this issue on PPC.
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_OneDefinition) {
-#else
 TEST_F(SubprocessMemoryTest, OneDefinition) {
-#endif
   testCommon({{"test1", {APInt(8, 0xff), 4096, 0}}}, 0);
   checkSharedMemoryDefinition(getSharedMemoryName(0, 0), 4096, {0xff});
 }
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_MultipleDefinitions) {
-#else
 TEST_F(SubprocessMemoryTest, MultipleDefinitions) {
-#endif
   testCommon({{"test1", {APInt(8, 0xaa), 4096, 0}},
   {"test2", {APInt(8, 0xbb), 4096, 1}},
   {"test3", {APInt(8, 0xcc), 4096, 2}}},
@@ -100,11 +93,7 @@ TEST_F(SubprocessMemoryTest, MultipleDefinitions) {
   checkSharedMemoryDefinition(getSharedMemoryName(1, 2), 4096, {0xcc});
 }
 
-#if defined(__powerpc__) || defined(__s390x__)
-TEST_F(SubprocessMemoryTest, DISABLED_DefinitionFillsCompletely) {
-#else
 TEST_F(SubprocessMemoryTest, DefinitionFillsCompletely) {
-#endif
   testCommon({{"test1", {APInt(8, 0xaa), 4096, 0}},
   {"test2", {APInt(16, 0x), 4096, 1}},
   {"test3", {APInt(24, 0xcc), 4096, 2}}},
@@ -118,7 +107,7 @@ TEST_F(SubprocessMemoryTest, DefinitionFillsCompletely) {
 }
 
 // The following test is only supported on little endian systems.
-#if defined(__powerpc__) || defined(__s390x__) || __BYTE_ORDER__ == 
__ORDER_BIG_ENDIAN__
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
 TEST_F(SubprocessMemoryTest, DISABLED_DefinitionEndTruncation) {
 #else
 TEST_F(SubprocessMemoryTest, DefinitionEndTruncation) {
@@ -150,7 +139,7 @@ TEST_F(SubprocessMemoryTest, DefinitionEndTruncation) {
   checkSharedMemoryDefinition(getSharedMemoryName(3, 0), 4096, Test1Expected);
 }
 
-#endif // defined(__linux__) && !defined(__ANDROID__)
+#endif // __linux__ && !__ANDROID__ && !(__powerpc__ || __s390x__ || __sparc__)
 
 } // namespace exegesis
 } // namespace llvm



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755) (PR #102771)

2024-08-12 Thread Tobias Hieta via llvm-branch-commits

https://github.com/tru closed https://github.com/llvm/llvm-project/pull/102771
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libcxx] release/19.x: [NFC][libc++][test][AIX] UnXFAIL LIT test transform.pass.cpp (#102338) (PR #102466)

2024-08-12 Thread via llvm-branch-commits

github-actions[bot] wrote:

@xingxue-ibm (or anyone else): if you would like to add a note about this fix
to the release notes (completely optional), please reply to this comment with a
one- or two-sentence description of the fix. When you are done, please add the
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/102466
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [llvm-exegesis][unittests] Also disable SubprocessMemoryTest on SPARC (#102755) (PR #102771)

2024-08-12 Thread via llvm-branch-commits

github-actions[bot] wrote:

@rorth (or anyone else): if you would like to add a note about this fix to the
release notes (completely optional), please reply to this comment with a one-
or two-sentence description of the fix. When you are done, please add the
release:note label to this PR.

https://github.com/llvm/llvm-project/pull/102771
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/102865

This will allow them to be shared between the old PM and new PM files.

I don't really like needing to expose these globally like this; maybe
it would be better to just move TargetPassConfig and the CodeGenPassBuilder
into one common file?
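
As a rough sketch of the pattern being described (the header-side declarations
are not shown in the quoted diff below, so their exact form here is an
assumption), the flags move into the `llvm::AMDGPU` namespace and are exposed
through `extern` declarations so that both the legacy `TargetPassConfig` and
the new-PM `AMDGPUCodeGenPassBuilder` read the same option:

```cpp
// AMDGPUTargetMachine.h -- shared declaration (assumed form).
#include "llvm/Support/CommandLine.h"

namespace llvm::AMDGPU {
extern cl::opt<bool> EnableEarlyIfConversion;
} // namespace llvm::AMDGPU

// AMDGPUTargetMachine.cpp -- single definition, matching the patch below.
namespace llvm::AMDGPU {
cl::opt<bool> EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
                                      cl::desc("Run early if-conversion"),
                                      cl::init(false));
} // namespace llvm::AMDGPU

// AMDGPUCodeGenPassBuilder.cpp -- hypothetical consumer in the new PM:
// using namespace llvm::AMDGPU;
// if (EnableEarlyIfConversion) { /* schedule the early if-conversion pass */ }
```
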

>From 414bc713e2177417d654ba57c2965571308a9239 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 12:46:00 +0400
Subject: [PATCH] AMDGPU: Declare pass control flags in header

This will allow them to be shared between the old PM and new PM files.

I don't really like needing to expose these globally like this; maybe
it would be better to just move TargetPassConfig and the CodeGenPassBuilder
into one common file?
---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 203 --
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h  |  41 
 2 files changed, 133 insertions(+), 111 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index cad4585c5b3013..3409a49fe203f9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -74,6 +74,7 @@
 
 using namespace llvm;
 using namespace llvm::PatternMatch;
+using namespace llvm::AMDGPU;
 
 namespace {
 class SGPRRegisterRegAlloc : public RegisterRegAllocBase 
{
@@ -186,109 +187,95 @@ static VGPRRegisterRegAlloc fastRegAllocVGPR(
   "fast", "fast register allocator", createFastVGPRRegisterAllocator);
 } // anonymous namespace
 
-static cl::opt
-EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
-cl::desc("Run early if-conversion"),
-cl::init(false));
+namespace llvm::AMDGPU {
+cl::opt EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
+  cl::desc("Run early if-conversion"),
+  cl::init(false));
 
-static cl::opt
-OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
-cl::desc("Run pre-RA exec mask optimizations"),
-cl::init(true));
+cl::opt OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
+   cl::desc("Run pre-RA exec mask optimizations"),
+   cl::init(true));
 
-static cl::opt
+cl::opt
 LowerCtorDtor("amdgpu-lower-global-ctor-dtor",
   cl::desc("Lower GPU ctor / dtors to globals on the device."),
   cl::init(true), cl::Hidden);
 
 // Option to disable vectorizer for tests.
-static cl::opt EnableLoadStoreVectorizer(
-  "amdgpu-load-store-vectorizer",
-  cl::desc("Enable load store vectorizer"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt
+EnableLoadStoreVectorizer("amdgpu-load-store-vectorizer",
+  cl::desc("Enable load store vectorizer"),
+  cl::init(true), cl::Hidden);
 
 // Option to control global loads scalarization
-static cl::opt ScalarizeGlobal(
-  "amdgpu-scalarize-global-loads",
-  cl::desc("Enable global load scalarization"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt ScalarizeGlobal("amdgpu-scalarize-global-loads",
+  cl::desc("Enable global load scalarization"),
+  cl::init(true), cl::Hidden);
 
 // Option to run internalize pass.
-static cl::opt InternalizeSymbols(
-  "amdgpu-internalize-symbols",
-  cl::desc("Enable elimination of non-kernel functions and unused globals"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt InternalizeSymbols(
+"amdgpu-internalize-symbols",
+cl::desc("Enable elimination of non-kernel functions and unused globals"),
+cl::init(false), cl::Hidden);
 
 // Option to inline all early.
-static cl::opt EarlyInlineAll(
-  "amdgpu-early-inline-all",
-  cl::desc("Inline all functions early"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt EarlyInlineAll("amdgpu-early-inline-all",
+ cl::desc("Inline all functions early"),
+ cl::init(false), cl::Hidden);
 
-static cl::opt RemoveIncompatibleFunctions(
+cl::opt RemoveIncompatibleFunctions(
 "amdgpu-enable-remove-incompatible-functions", cl::Hidden,
 cl::desc("Enable removal of functions when they"
  "use features not supported by the target GPU"),
 cl::init(true));
 
-static cl::opt EnableSDWAPeephole(
-  "amdgpu-sdwa-peephole",
-  cl::desc("Enable SDWA peepholer"),
-  cl::init(true));
+cl::opt EnableSDWAPeephole("amdgpu-sdwa-peephole",
+ cl::desc("Enable SDWA peepholer"),
+ cl::init(true));
 
-static cl::opt EnableDPPCombine(
-  "amdgpu-dpp-combine",
-  cl::desc("Enable DPP combiner"),
-  cl::init(true));
+cl::opt EnableDPPCombine("amdgpu-dpp-combine",
+   cl::desc("Enable DPP combiner"), 
cl::init(true));
 
 // Enable address space based alias analysis
-static

[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/102865
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#102865** 👈
* **#102816**
* **#102815**
* **#102814**
* **#102812**
* **#102806**
* **#102805**
* **#102645**
* **#102644**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/102865
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

This will allow them to be shared between the old PM and new PM files.

I don't really like needing to expose these globally like this; maybe
it would be better to just move TargetPassConfig and the CodeGenPassBuilder
into one common file?

---
Full diff: https://github.com/llvm/llvm-project/pull/102865.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+92-111) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h (+41) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index cad4585c5b3013..3409a49fe203f9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -74,6 +74,7 @@
 
 using namespace llvm;
 using namespace llvm::PatternMatch;
+using namespace llvm::AMDGPU;
 
 namespace {
 class SGPRRegisterRegAlloc : public RegisterRegAllocBase 
{
@@ -186,109 +187,95 @@ static VGPRRegisterRegAlloc fastRegAllocVGPR(
   "fast", "fast register allocator", createFastVGPRRegisterAllocator);
 } // anonymous namespace
 
-static cl::opt
-EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
-cl::desc("Run early if-conversion"),
-cl::init(false));
+namespace llvm::AMDGPU {
+cl::opt EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
+  cl::desc("Run early if-conversion"),
+  cl::init(false));
 
-static cl::opt
-OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
-cl::desc("Run pre-RA exec mask optimizations"),
-cl::init(true));
+cl::opt OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
+   cl::desc("Run pre-RA exec mask optimizations"),
+   cl::init(true));
 
-static cl::opt
+cl::opt
 LowerCtorDtor("amdgpu-lower-global-ctor-dtor",
   cl::desc("Lower GPU ctor / dtors to globals on the device."),
   cl::init(true), cl::Hidden);
 
 // Option to disable vectorizer for tests.
-static cl::opt EnableLoadStoreVectorizer(
-  "amdgpu-load-store-vectorizer",
-  cl::desc("Enable load store vectorizer"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt
+EnableLoadStoreVectorizer("amdgpu-load-store-vectorizer",
+  cl::desc("Enable load store vectorizer"),
+  cl::init(true), cl::Hidden);
 
 // Option to control global loads scalarization
-static cl::opt ScalarizeGlobal(
-  "amdgpu-scalarize-global-loads",
-  cl::desc("Enable global load scalarization"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt ScalarizeGlobal("amdgpu-scalarize-global-loads",
+  cl::desc("Enable global load scalarization"),
+  cl::init(true), cl::Hidden);
 
 // Option to run internalize pass.
-static cl::opt InternalizeSymbols(
-  "amdgpu-internalize-symbols",
-  cl::desc("Enable elimination of non-kernel functions and unused globals"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt InternalizeSymbols(
+"amdgpu-internalize-symbols",
+cl::desc("Enable elimination of non-kernel functions and unused globals"),
+cl::init(false), cl::Hidden);
 
 // Option to inline all early.
-static cl::opt EarlyInlineAll(
-  "amdgpu-early-inline-all",
-  cl::desc("Inline all functions early"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt EarlyInlineAll("amdgpu-early-inline-all",
+ cl::desc("Inline all functions early"),
+ cl::init(false), cl::Hidden);
 
-static cl::opt RemoveIncompatibleFunctions(
+cl::opt RemoveIncompatibleFunctions(
 "amdgpu-enable-remove-incompatible-functions", cl::Hidden,
 cl::desc("Enable removal of functions when they"
  "use features not supported by the target GPU"),
 cl::init(true));
 
-static cl::opt EnableSDWAPeephole(
-  "amdgpu-sdwa-peephole",
-  cl::desc("Enable SDWA peepholer"),
-  cl::init(true));
+cl::opt EnableSDWAPeephole("amdgpu-sdwa-peephole",
+ cl::desc("Enable SDWA peepholer"),
+ cl::init(true));
 
-static cl::opt EnableDPPCombine(
-  "amdgpu-dpp-combine",
-  cl::desc("Enable DPP combiner"),
-  cl::init(true));
+cl::opt EnableDPPCombine("amdgpu-dpp-combine",
+   cl::desc("Enable DPP combiner"), 
cl::init(true));
 
 // Enable address space based alias analysis
-static cl::opt EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
-  cl::desc("Enable AMDGPU Alias Analysis"),
-  cl::init(true));
+cl::opt
+EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
+  cl::desc("Enable AMDGPU Alias Analysis"),
+  cl::init(true));
 
 //

[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/102865
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] IR/AMDGPU: Autoupgrade amdgpu-unsafe-fp-atomics attribute (PR #101698)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/101698

error: too big or took too long to generate
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics attribute (PR #101699)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/101699

>From 66a2fe24e7ee4754d720cd7b12060bf44f981e38 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 26 Jun 2024 14:13:31 +0200
Subject: [PATCH] AMDGPU: Stop handling legacy amdgpu-unsafe-fp-atomics
 attribute

This is now autoupgraded to annotate atomicrmw instructions in
old bitcode.
---
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |   9 +-
 .../AtomicExpand/AMDGPU/expand-atomic-mmra.ll |   7 +-
 .../AMDGPU/expand-atomic-rmw-fadd.ll  | 169 +-
 .../AMDGPU/inline-amdgpu-unsafe-fp-atomics.ll |  99 --
 4 files changed, 88 insertions(+), 196 deletions(-)
 delete mode 100644 
llvm/test/Transforms/Inline/AMDGPU/inline-amdgpu-unsafe-fp-atomics.ll

diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 1cf9fb7a3724b7..5bcf158f4b62a4 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16165,14 +16165,7 @@ static bool globalMemoryFPAtomicIsLegal(const 
GCNSubtarget &Subtarget,
   } else if (Subtarget.supportsAgentScopeFineGrainedRemoteMemoryAtomics())
 return true;
 
-  if (RMW->hasMetadata("amdgpu.no.fine.grained.memory"))
-return true;
-
-  // TODO: Auto-upgrade this attribute to the metadata in function body and 
stop
-  // checking it.
-  return RMW->getFunction()
-  ->getFnAttribute("amdgpu-unsafe-fp-atomics")
-  .getValueAsBool();
+  return RMW->hasMetadata("amdgpu.no.fine.grained.memory");
 }
 
 /// \return Action to perform on AtomicRMWInsts for integer operations.
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll 
b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
index 78969839efcb8a..f6a07d9d923c35 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-mmra.ll
@@ -124,7 +124,7 @@ define i16 @test_cmpxchg_i16_global_agent_align4(ptr 
addrspace(1) %out, i16 %in,
   ret i16 %extract
 }
 
-define void @syncscope_workgroup_nortn(ptr %addr, float %val) #0 {
+define void @syncscope_workgroup_nortn(ptr %addr, float %val) {
 ; GFX90A-LABEL: define void @syncscope_workgroup_nortn(
 ; GFX90A-SAME: ptr [[ADDR:%.*]], float [[VAL:%.*]]) #[[ATTR1:[0-9]+]] {
 ; GFX90A-NEXT:[[IS_SHARED:%.*]] = call i1 @llvm.amdgcn.is.shared(ptr 
[[ADDR]])
@@ -157,7 +157,7 @@ define void @syncscope_workgroup_nortn(ptr %addr, float 
%val) #0 {
 ; GFX1100-NEXT:[[RES:%.*]] = atomicrmw fadd ptr [[ADDR]], float [[VAL]] 
syncscope("workgroup") seq_cst, align 4, !mmra [[META0]]
 ; GFX1100-NEXT:ret void
 ;
-  %res = atomicrmw fadd ptr %addr, float %val syncscope("workgroup") seq_cst, 
!mmra !2
+  %res = atomicrmw fadd ptr %addr, float %val syncscope("workgroup") seq_cst, 
!mmra !2, !amdgpu.no.fine.grained.memory !3, !amdgpu.ignore.denormal.mode !3
   ret void
 }
 
@@ -186,11 +186,10 @@ define i32 @atomic_load_global_align1(ptr addrspace(1) 
%ptr) {
   ret i32 %val
 }
 
-attributes #0 = { "amdgpu-unsafe-fp-atomics"="true" }
-
 !0 = !{!"foo", !"bar"}
 !1 = !{!"bux", !"baz"}
 !2 = !{!0, !1}
+!3 = !{}
 ;.
 ; GFX90A: [[META0]] = !{[[META1:![0-9]+]], [[META2:![0-9]+]]}
 ; GFX90A: [[META1]] = !{!"foo", !"bar"}
diff --git a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll 
b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
index 0a091bd0fc9ada..888e3cd29b9a32 100644
--- a/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
+++ b/llvm/test/Transforms/AtomicExpand/AMDGPU/expand-atomic-rmw-fadd.ll
@@ -273,7 +273,7 @@ define void 
@test_atomicrmw_fadd_f32_as999_no_use_unsafe(ptr addrspace(999) %ptr
   ret void
 }
 
-define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, 
float %value) #0 {
+define float @test_atomicrmw_fadd_f32_global_unsafe(ptr addrspace(1) %ptr, 
float %value) #3 {
 ; CI-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
 ; CI-NEXT:[[TMP1:%.*]] = load float, ptr addrspace(1) [[PTR:%.*]], align 4
 ; CI-NEXT:br label [[ATOMICRMW_START:%.*]]
@@ -323,22 +323,22 @@ define float @test_atomicrmw_fadd_f32_global_unsafe(ptr 
addrspace(1) %ptr, float
 ; GFX908-NEXT:ret float [[TMP5]]
 ;
 ; GFX90A-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
-; GFX90A-NEXT:[[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], 
float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4
+; GFX90A-NEXT:[[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], 
float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4, 
!amdgpu.no.fine.grained.memory [[META0:![0-9]+]]
 ; GFX90A-NEXT:ret float [[RES]]
 ;
 ; GFX940-LABEL: @test_atomicrmw_fadd_f32_global_unsafe(
-; GFX940-NEXT:[[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], 
float [[VALUE:%.*]] syncscope("wavefront") monotonic, align 4
+; GFX940-NEXT:[[RES:%.*]] = atomicrmw fadd ptr addrspace(1) [[PTR:%.*]], 
float [[VALUE:%.*]]

[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/102867

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should soon be removable.

>From 4dc191a92ac747288627bc86f0a36bea22430e07 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 13:09:55 +0400
Subject: [PATCH] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should soon be removable.
---
 .../AMDGPU/AMDGPUCodeGenPassBuilder.cpp   | 38 +++
 .../Target/AMDGPU/AMDGPUCodeGenPassBuilder.h  |  7 
 2 files changed, 45 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 252a70d44736dc..9fd7e24b114ddd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -21,8 +21,10 @@
 #include "llvm/Transforms/Utils/LCSSA.h"
 #include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 
 using namespace llvm;
+using namespace llvm::AMDGPU;
 
 AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 GCNTargetMachine &TM, const CGPassBuilderOption &Opts,
@@ -37,8 +39,35 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
+  // deleted soon.
+
+  if (EnableLowerKernelArguments)
+addPass(AMDGPULowerKernelArgumentsPass(TM));
+
+  // This lowering has been placed after codegenprepare to take advantage of
+  // address mode matching (which is why it isn't put with the LDS lowerings).
+  // It could be placed anywhere before uniformity annotations (an analysis
+  // that it changes by splitting up fat pointers into their components)
+  // but has been put before switch lowering and CFG flattening so that those
+  // passes can run on the more optimized control flow this pass creates in
+  // many cases.
+  //
+  // FIXME: This should ideally be put after the LoadStoreVectorizer.
+  // However, due to some annoying facts about ResourceUsageAnalysis,
+  // (especially as exercised in the resource-usage-dead-function test),
+  // we need all the function passes codegenprepare all the way through
+  // said resource usage analysis to run on the call graph produced
+  // before codegenprepare runs (because codegenprepare will knock some
+  // nodes out of the graph, which leads to function-level passes not
+  // being run on them, which causes crashes in the resource usage analysis).
+  addPass(AMDGPULowerBufferFatPointersPass(TM));
+
   Base::addCodeGenPrepare(addPass);
 
+  if (isPassEnabled(EnableLoadStoreVectorizer))
+addPass(LoadStoreVectorizerPass());
+
   // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
   // behavior for subsequent passes. Placing it here seems better that these
   // blocks would get cleaned up by UnreachableBlockElim inserted next in the
@@ -106,3 +135,12 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(SILowerI1CopiesPass());
   return Error::success();
 }
+
+bool AMDGPUCodeGenPassBuilder::isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level) const {
+  if (Opt.getNumOccurrences())
+return Opt;
+  if (TM.getOptLevel() < Level)
+return false;
+  return Opt;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index efb296689bd647..1ff7744c84a436 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -28,6 +28,13 @@ class AMDGPUCodeGenPassBuilder
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
+
+  /// Check if a pass is enabled given \p Opt option. The option always
+  /// overrides defaults if explicitly used. Otherwise its default will
+  /// be used given that a pass shall work at an optimization \p Level
+  /// minimum.
+  bool isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level = CodeGenOptLevel::Default) const;
 };
 
 } // namespace llvm

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/102867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/102867
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#102867** 👈
* **#102865**
* **#102816**
* **#102815**
* **#102814**
* **#102812**
* **#102806**
* **#102805**
* **#102645**
* **#102644**
* `main`

This stack of pull requests is managed by Graphite. Learn more about
stacking: https://stacking.dev/



https://github.com/llvm/llvm-project/pull/102867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Matt Arsenault (arsenm)


Changes

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should soon be removable.

---
Full diff: https://github.com/llvm/llvm-project/pull/102867.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp (+38) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h (+7) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 252a70d44736d..9fd7e24b114dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -21,8 +21,10 @@
 #include "llvm/Transforms/Utils/LCSSA.h"
 #include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 
 using namespace llvm;
+using namespace llvm::AMDGPU;
 
 AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 GCNTargetMachine &TM, const CGPassBuilderOption &Opts,
@@ -37,8 +39,35 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
+  // deleted soon.
+
+  if (EnableLowerKernelArguments)
+addPass(AMDGPULowerKernelArgumentsPass(TM));
+
+  // This lowering has been placed after codegenprepare to take advantage of
+  // address mode matching (which is why it isn't put with the LDS lowerings).
+  // It could be placed anywhere before uniformity annotations (an analysis
+  // that it changes by splitting up fat pointers into their components)
+  // but has been put before switch lowering and CFG flattening so that those
+  // passes can run on the more optimized control flow this pass creates in
+  // many cases.
+  //
+  // FIXME: This should ideally be put after the LoadStoreVectorizer.
+  // However, due to some annoying facts about ResourceUsageAnalysis,
+  // (especially as exercised in the resource-usage-dead-function test),
+  // we need all the function passes codegenprepare all the way through
+  // said resource usage analysis to run on the call graph produced
+  // before codegenprepare runs (because codegenprepare will knock some
+  // nodes out of the graph, which leads to function-level passes not
+  // being run on them, which causes crashes in the resource usage analysis).
+  addPass(AMDGPULowerBufferFatPointersPass(TM));
+
   Base::addCodeGenPrepare(addPass);
 
+  if (isPassEnabled(EnableLoadStoreVectorizer))
+addPass(LoadStoreVectorizerPass());
+
   // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
   // behavior for subsequent passes. Placing it here seems better that these
   // blocks would get cleaned up by UnreachableBlockElim inserted next in the
@@ -106,3 +135,12 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(SILowerI1CopiesPass());
   return Error::success();
 }
+
+bool AMDGPUCodeGenPassBuilder::isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level) const {
+  if (Opt.getNumOccurrences())
+return Opt;
+  if (TM.getOptLevel() < Level)
+return false;
+  return Opt;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index efb296689bd64..1ff7744c84a43 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -28,6 +28,13 @@ class AMDGPUCodeGenPassBuilder
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
+
+  /// Check if a pass is enabled given \p Opt option. The option always
+  /// overrides defaults if explicitly used. Otherwise its default will
+  /// be used given that a pass shall work at an optimization \p Level
+  /// minimum.
+  bool isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level = CodeGenOptLevel::Default) const;
 };
 
 } // namespace llvm

``




https://github.com/llvm/llvm-project/pull/102867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-amdgpu

Author: Matt Arsenault (arsenm)


Changes

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should soon be removable.

---
Full diff: https://github.com/llvm/llvm-project/pull/102867.diff


2 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp (+38) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h (+7) 


``diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 252a70d44736d..9fd7e24b114dd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -21,8 +21,10 @@
 #include "llvm/Transforms/Utils/LCSSA.h"
 #include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 
 using namespace llvm;
+using namespace llvm::AMDGPU;
 
 AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 GCNTargetMachine &TM, const CGPassBuilderOption &Opts,
@@ -37,8 +39,35 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
+  // deleted soon.
+
+  if (EnableLowerKernelArguments)
+addPass(AMDGPULowerKernelArgumentsPass(TM));
+
+  // This lowering has been placed after codegenprepare to take advantage of
+  // address mode matching (which is why it isn't put with the LDS lowerings).
+  // It could be placed anywhere before uniformity annotations (an analysis
+  // that it changes by splitting up fat pointers into their components)
+  // but has been put before switch lowering and CFG flattening so that those
+  // passes can run on the more optimized control flow this pass creates in
+  // many cases.
+  //
+  // FIXME: This should ideally be put after the LoadStoreVectorizer.
+  // However, due to some annoying facts about ResourceUsageAnalysis,
+  // (especially as exercised in the resource-usage-dead-function test),
+  // we need all the function passes codegenprepare all the way through
+  // said resource usage analysis to run on the call graph produced
+  // before codegenprepare runs (because codegenprepare will knock some
+  // nodes out of the graph, which leads to function-level passes not
+  // being run on them, which causes crashes in the resource usage analysis).
+  addPass(AMDGPULowerBufferFatPointersPass(TM));
+
   Base::addCodeGenPrepare(addPass);
 
+  if (isPassEnabled(EnableLoadStoreVectorizer))
+addPass(LoadStoreVectorizerPass());
+
   // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
   // behavior for subsequent passes. Placing it here seems better that these
   // blocks would get cleaned up by UnreachableBlockElim inserted next in the
@@ -106,3 +135,12 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(SILowerI1CopiesPass());
   return Error::success();
 }
+
+bool AMDGPUCodeGenPassBuilder::isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level) const {
+  if (Opt.getNumOccurrences())
+return Opt;
+  if (TM.getOptLevel() < Level)
+return false;
+  return Opt;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index efb296689bd64..1ff7744c84a43 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -28,6 +28,13 @@ class AMDGPUCodeGenPassBuilder
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
+
+  /// Check if a pass is enabled given \p Opt option. The option always
+  /// overrides defaults if explicitly used. Otherwise its default will
+  /// be used given that a pass shall work at an optimization \p Level
+  /// minimum.
+  bool isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level = CodeGenOptLevel::Default) const;
 };
 
 } // namespace llvm

``




https://github.com/llvm/llvm-project/pull/102867
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (PR #102806)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits

https://github.com/Pierre-vh approved this pull request.

Add [NFC] tag?

https://github.com/llvm/llvm-project/pull/102806
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits


@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = 
AMDGPUTargetMachine::EnableLateStructurizeCFG;

Pierre-vh wrote:

Does this function run yet, or is this just preparatory work/NFC?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)

2024-08-12 Thread Pierre van Houtryve via llvm-branch-commits


@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = 
AMDGPUTargetMachine::EnableLateStructurizeCFG;
+  const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
+  const bool EnableStructurizerWorkarounds =
+  AMDGPUTargetMachine::EnableStructurizerWorkarounds;
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None)

Pierre-vh wrote:

tiny nit: put the opt level in a variable to avoid repeating the call?

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out addPreISelPasses (PR #102814)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits


@@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
-  // TODO: Add passes pre instruction selection.
-  // Test only, convert to real IR passes in future.
+  const bool LateCFGStructurize = 
AMDGPUTargetMachine::EnableLateStructurizeCFG;

arsenm wrote:

It does run. We have a handful of tests running the new PM already. I'm trying to
avoid test failures caused by not running the same set of passes in the future.

https://github.com/llvm/llvm-project/pull/102814
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [libunwind] release/19.x: [libunwind] Add GCS support for AArch64 (#99335) (PR #101888)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/101888

>From d757d1e9485b19ac756863ad2a1f58a25720967c Mon Sep 17 00:00:00 2001
From: John Brawn 
Date: Sun, 4 Aug 2024 13:27:12 +0100
Subject: [PATCH 1/3] [libunwind] Add GCS support for AArch64 (#99335)

AArch64 GCS (Guarded Control Stack) is similar enough to CET that we can
re-use the existing code that is guarded by _LIBUNWIND_USE_CET, so long
as we also add defines to locate the GCS stack and pop the entries from
it. We also need the jumpto function to exit using br instead of ret, to
prevent it from popping the GCS stack.

GCS support is enabled using the LIBUNWIND_ENABLE_GCS cmake option. This
enables -mbranch-protection=standard, which enables GCS. For the places
we need to use GCS instructions we use the target attribute, as there's
not a command-line option to enable a specific architecture extension.

(cherry picked from commit b32aac4358c1f6639de7c453656cd74fbab75d71)
---
 libunwind/CMakeLists.txt  |  8 +
 libunwind/src/Registers.hpp   |  7 +
 libunwind/src/UnwindCursor.hpp|  6 ++--
 libunwind/src/UnwindLevel1.c  | 31 +--
 libunwind/src/UnwindRegistersRestore.S|  2 +-
 libunwind/src/cet_unwind.h| 18 +++
 libunwind/test/CMakeLists.txt |  1 +
 .../test/configs/llvm-libunwind-merged.cfg.in |  3 ++
 .../test/configs/llvm-libunwind-shared.cfg.in |  3 ++
 .../test/configs/llvm-libunwind-static.cfg.in |  3 ++
 10 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/libunwind/CMakeLists.txt b/libunwind/CMakeLists.txt
index b22ade0a7d71eb..28d67b0fef92cc 100644
--- a/libunwind/CMakeLists.txt
+++ b/libunwind/CMakeLists.txt
@@ -37,6 +37,7 @@ if (LIBUNWIND_BUILD_32_BITS)
 endif()
 
 option(LIBUNWIND_ENABLE_CET "Build libunwind with CET enabled." OFF)
+option(LIBUNWIND_ENABLE_GCS "Build libunwind with GCS enabled." OFF)
 option(LIBUNWIND_ENABLE_ASSERTIONS "Enable assertions independent of build 
mode." ON)
 option(LIBUNWIND_ENABLE_PEDANTIC "Compile with pedantic enabled." ON)
 option(LIBUNWIND_ENABLE_WERROR "Fail and stop if a warning is triggered." OFF)
@@ -188,6 +189,13 @@ if (LIBUNWIND_ENABLE_CET)
   endif()
 endif()
 
+if (LIBUNWIND_ENABLE_GCS)
+  add_compile_flags_if_supported(-mbranch-protection=standard)
+  if (NOT CXX_SUPPORTS_MBRANCH_PROTECTION_EQ_STANDARD_FLAG)
+message(SEND_ERROR "Compiler doesn't support GCS -mbranch-protection 
option!")
+  endif()
+endif()
+
 if (WIN32)
   # The headers lack matching dllexport attributes (_LIBUNWIND_EXPORT);
   # silence the warning instead of cluttering the headers (which aren't
diff --git a/libunwind/src/Registers.hpp b/libunwind/src/Registers.hpp
index d11ddb3426d522..861e6b5f6f2c58 100644
--- a/libunwind/src/Registers.hpp
+++ b/libunwind/src/Registers.hpp
@@ -1815,6 +1815,13 @@ inline const char *Registers_ppc64::getRegisterName(int 
regNum) {
 /// process.
 class _LIBUNWIND_HIDDEN Registers_arm64;
 extern "C" void __libunwind_Registers_arm64_jumpto(Registers_arm64 *);
+
+#if defined(_LIBUNWIND_USE_GCS)
+extern "C" void *__libunwind_cet_get_jump_target() {
+  return reinterpret_cast(&__libunwind_Registers_arm64_jumpto);
+}
+#endif
+
 class _LIBUNWIND_HIDDEN Registers_arm64 {
 public:
   Registers_arm64();
diff --git a/libunwind/src/UnwindCursor.hpp b/libunwind/src/UnwindCursor.hpp
index 758557337899ed..06e654197351df 100644
--- a/libunwind/src/UnwindCursor.hpp
+++ b/libunwind/src/UnwindCursor.hpp
@@ -471,7 +471,7 @@ class _LIBUNWIND_HIDDEN AbstractUnwindCursor {
   }
 #endif
 
-#if defined(_LIBUNWIND_USE_CET)
+#if defined(_LIBUNWIND_USE_CET) || defined(_LIBUNWIND_USE_GCS)
   virtual void *get_registers() {
 _LIBUNWIND_ABORT("get_registers not implemented");
   }
@@ -954,7 +954,7 @@ class UnwindCursor : public AbstractUnwindCursor{
   virtual uintptr_t getDataRelBase();
 #endif
 
-#if defined(_LIBUNWIND_USE_CET)
+#if defined(_LIBUNWIND_USE_CET) || defined(_LIBUNWIND_USE_GCS)
   virtual void *get_registers() { return &_registers; }
 #endif
 
@@ -3005,7 +3005,7 @@ bool UnwindCursor::isReadableAddr(const pint_t 
addr) const {
 }
 #endif
 
-#if defined(_LIBUNWIND_USE_CET)
+#if defined(_LIBUNWIND_USE_CET) || defined(_LIBUNWIND_USE_GCS)
 extern "C" void *__libunwind_cet_get_registers(unw_cursor_t *cursor) {
   AbstractUnwindCursor *co = (AbstractUnwindCursor *)cursor;
   return co->get_registers();
diff --git a/libunwind/src/UnwindLevel1.c b/libunwind/src/UnwindLevel1.c
index 48e7bc3b9e00ec..7e785f4d31e716 100644
--- a/libunwind/src/UnwindLevel1.c
+++ b/libunwind/src/UnwindLevel1.c
@@ -44,7 +44,7 @@
 // _LIBUNWIND_POP_CET_SSP is used to adjust CET shadow stack pointer and we
 // directly jump to __libunwind_Registers_x86/x86_64_jumpto instead of using
 // a regular function call to avoid pushing to CET shadow stack again.
-#if !defined(_LIBUNWIND_USE_CET)
+#if !defined(_LIBUNWIND_USE_CET) && !defined(_LIBUNWIND_USE_GCS)
 #defi

[llvm-branch-commits] [libunwind] release/19.x: [libunwind] Add GCS support for AArch64 (#99335) (PR #101888)

2024-08-12 Thread John Brawn via llvm-branch-commits

john-brawn-arm wrote:

I've added the commits that fix this on Android, and fix a problem with 
combining GCS and BTI.

https://github.com/llvm/llvm-project/pull/101888
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Port AMDGPULateCodeGenPrepare to new pass manager (PR #102806)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Aug 12, 6:58 AM EDT**: @arsenm started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/102806).


https://github.com/llvm/llvm-project/pull/102806
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] StructurizeCFG: Add SkipUniformRegions pass parameter to new PM version (PR #102812)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

### Merge activity

* **Aug 12, 6:58 AM EDT**: @arsenm started a stack merge that includes this 
pull request via 
[Graphite](https://app.graphite.dev/github/pr/llvm/llvm-project/102812).


https://github.com/llvm/llvm-project/pull/102812
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [llvm][CodeGen] Address the issue discovered In window scheduling (#101665) (PR #102881)

2024-08-12 Thread Kai Yan via llvm-branch-commits

https://github.com/kaiyan96 created 
https://github.com/llvm/llvm-project/pull/102881

We have the following bugfixes for the window scheduler; do we need to submit 
them ourselves?
* [Added a new restriction for II by pragma in window 
scheduler](https://github.com/llvm/llvm-project/pull/99448)
* [Fixed a bug in stall cycle calculation for window 
scheduler](https://github.com/llvm/llvm-project/pull/99451)
* [Added missing initialization failure information for window 
scheduler](https://github.com/llvm/llvm-project/pull/99449)
* [Fixed max cycle calculation with zero-cost instructions for window scheduler 
](https://github.com/llvm/llvm-project/pull/99454)
* [Address the issue of multiple resource reservations In window 
scheduling](https://github.com/llvm/llvm-project/pull/101665)


>From 9f1b8a25b51bdb6842f8bf27813569ca1c341d5f Mon Sep 17 00:00:00 2001
From: Kai Yan 
Date: Wed, 24 Jul 2024 12:06:35 +0800
Subject: [PATCH 1/5] [llvm][CodeGen] Added missing initialization failure
 information for window scheduler (#99449)

Added missing initialization failure information for window scheduler.
---
 llvm/lib/CodeGen/WindowScheduler.cpp| 5 -
 llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/CodeGen/WindowScheduler.cpp 
b/llvm/lib/CodeGen/WindowScheduler.cpp
index 0777480499e55b..3fe8a1aaafd128 100644
--- a/llvm/lib/CodeGen/WindowScheduler.cpp
+++ b/llvm/lib/CodeGen/WindowScheduler.cpp
@@ -232,8 +232,11 @@ bool WindowScheduler::initialize() {
   return false;
 }
 for (auto &Def : MI.all_defs())
-  if (Def.isReg() && Def.getReg().isPhysical())
+  if (Def.isReg() && Def.getReg().isPhysical()) {
+LLVM_DEBUG(dbgs() << "Physical registers are not supported in "
+ "window scheduling!\n");
 return false;
+  }
   }
   if (SchedInstrNum <= WindowRegionLimit) {
 LLVM_DEBUG(dbgs() << "There are too few MIs in the window region!\n");
diff --git a/llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir 
b/llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir
index 601b98dca8e20b..be75301b016ed9 100644
--- a/llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir
+++ b/llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir
@@ -3,6 +3,7 @@
 # RUN: -window-sched=force -filetype=null -verify-machineinstrs 2>&1 \
 # RUN: | FileCheck %s
 
+# CHECK: Physical registers are not supported in window scheduling!
 # CHECK: The WindowScheduler failed to initialize!
 
 ---

>From 0c7e10d69547adeb460feeff88d10c774461cd94 Mon Sep 17 00:00:00 2001
From: Kai Yan 
Date: Wed, 24 Jul 2024 12:11:58 +0800
Subject: [PATCH 2/5] [llvm][CodeGen] Added a new restriction for II by pragma
 in window scheduler (#99448)

Added a new restriction for window scheduling.
Window scheduling is disabled when llvm.loop.pipeline.initiationinterval
is set.
---
 llvm/lib/CodeGen/MachinePipeliner.cpp | 12 ++-
 ...swp-ws-pragma-initiation-interval-fail.mir | 83 +++
 2 files changed, 93 insertions(+), 2 deletions(-)
 create mode 100644 
llvm/test/CodeGen/Hexagon/swp-ws-pragma-initiation-interval-fail.mir

diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp 
b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 497e282bb97682..5c68711ff61938 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -528,8 +528,16 @@ bool MachinePipeliner::useSwingModuloScheduler() {
 }
 
 bool MachinePipeliner::useWindowScheduler(bool Changed) {
-  // WindowScheduler does not work when it is off or when SwingModuloScheduler
-  // is successfully scheduled.
+  // WindowScheduler does not work for following cases:
+  // 1. when it is off.
+  // 2. when SwingModuloScheduler is successfully scheduled.
+  // 3. when pragma II is enabled.
+  if (II_setByPragma) {
+LLVM_DEBUG(dbgs() << "Window scheduling is disabled when "
+ "llvm.loop.pipeline.initiationinterval is set.\n");
+return false;
+  }
+
   return WindowSchedulingOption == WindowSchedulingFlag::WS_Force ||
  (WindowSchedulingOption == WindowSchedulingFlag::WS_On && !Changed);
 }
diff --git 
a/llvm/test/CodeGen/Hexagon/swp-ws-pragma-initiation-interval-fail.mir 
b/llvm/test/CodeGen/Hexagon/swp-ws-pragma-initiation-interval-fail.mir
new file mode 100644
index 00..6e69a76290fb1d
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/swp-ws-pragma-initiation-interval-fail.mir
@@ -0,0 +1,83 @@
+# RUN: llc  --march=hexagon %s -run-pass=pipeliner -debug-only=pipeliner \
+# RUN: -window-sched=force -filetype=null -verify-machineinstrs 2>&1 \
+# RUN: | FileCheck %s
+# REQUIRES: asserts
+
+# Test that checks no window scheduler is performed if the II set by pragma was
+# enabled
+
+# CHECK: Window scheduling is disabled when 
llvm.loop.pipeline.initiationinterval is set.
+
+--- |
+  define void @test_pragma_ii_fail(ptr %a0, i32 %a1) {
+  b0:
+%v0 = icmp sgt i32 %a1, 1
+br i1 %v0, label %b1, label %b4
+
+  b1:   

[llvm-branch-commits] [llvm] [llvm][CodeGen] Address the issue discovered In window scheduling (#101665) (PR #102881)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-hexagon

Author: Kai Yan (kaiyan96)


Changes

We have the following bugfixes for the window scheduler; do we need to submit 
them ourselves?
* [Added a new restriction for II by pragma in window 
scheduler](https://github.com/llvm/llvm-project/pull/99448)
* [Fixed a bug in stall cycle calculation for window 
scheduler](https://github.com/llvm/llvm-project/pull/99451)
* [Added missing initialization failure information for window 
scheduler](https://github.com/llvm/llvm-project/pull/99449)
* [Fixed max cycle calculation with zero-cost instructions for window scheduler 
](https://github.com/llvm/llvm-project/pull/99454)
* [Address the issue of multiple resource reservations In window 
scheduling](https://github.com/llvm/llvm-project/pull/101665)


---
Full diff: https://github.com/llvm/llvm-project/pull/102881.diff


7 Files Affected:

- (modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+10-2) 
- (modified) llvm/lib/CodeGen/WindowScheduler.cpp (+18-11) 
- (modified) llvm/test/CodeGen/Hexagon/swp-ws-fail-2.mir (+1) 
- (added) llvm/test/CodeGen/Hexagon/swp-ws-pragma-initiation-interval-fail.mir 
(+83) 
- (added) llvm/test/CodeGen/Hexagon/swp-ws-resource-reserve.mir (+100) 
- (added) llvm/test/CodeGen/Hexagon/swp-ws-stall-cycle.mir (+59) 
- (added) llvm/test/CodeGen/Hexagon/swp-ws-zero-cost.mir (+45) 


```diff
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp 
b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 497e282bb97682..5c68711ff61938 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -528,8 +528,16 @@ bool MachinePipeliner::useSwingModuloScheduler() {
 }
 
 bool MachinePipeliner::useWindowScheduler(bool Changed) {
-  // WindowScheduler does not work when it is off or when SwingModuloScheduler
-  // is successfully scheduled.
+  // WindowScheduler does not work for following cases:
+  // 1. when it is off.
+  // 2. when SwingModuloScheduler is successfully scheduled.
+  // 3. when pragma II is enabled.
+  if (II_setByPragma) {
+LLVM_DEBUG(dbgs() << "Window scheduling is disabled when "
+ "llvm.loop.pipeline.initiationinterval is set.\n");
+return false;
+  }
+
   return WindowSchedulingOption == WindowSchedulingFlag::WS_Force ||
  (WindowSchedulingOption == WindowSchedulingFlag::WS_On && !Changed);
 }
diff --git a/llvm/lib/CodeGen/WindowScheduler.cpp 
b/llvm/lib/CodeGen/WindowScheduler.cpp
index 0777480499e55b..f1658e36ae1e92 100644
--- a/llvm/lib/CodeGen/WindowScheduler.cpp
+++ b/llvm/lib/CodeGen/WindowScheduler.cpp
@@ -232,8 +232,11 @@ bool WindowScheduler::initialize() {
   return false;
 }
 for (auto &Def : MI.all_defs())
-  if (Def.isReg() && Def.getReg().isPhysical())
+  if (Def.isReg() && Def.getReg().isPhysical()) {
+LLVM_DEBUG(dbgs() << "Physical registers are not supported in "
+ "window scheduling!\n");
 return false;
+  }
   }
   if (SchedInstrNum <= WindowRegionLimit) {
 LLVM_DEBUG(dbgs() << "There are too few MIs in the window region!\n");
@@ -437,14 +440,17 @@ int WindowScheduler::calculateMaxCycle(ScheduleDAGInstrs 
&DAG,
   int PredCycle = getOriCycle(PredMI);
   ExpectCycle = std::max(ExpectCycle, PredCycle + (int)Pred.getLatency());
 }
-// ResourceManager can be used to detect resource conflicts between the
-// current MI and the previously inserted MIs.
-while (!RM.canReserveResources(*SU, CurCycle) || CurCycle < ExpectCycle) {
-  ++CurCycle;
-  if (CurCycle == (int)WindowIILimit)
-return CurCycle;
+// Zero cost instructions do not need to check resource.
+if (!TII->isZeroCost(MI.getOpcode())) {
+  // ResourceManager can be used to detect resource conflicts between the
+  // current MI and the previously inserted MIs.
+  while (!RM.canReserveResources(*SU, CurCycle) || CurCycle < ExpectCycle) 
{
+++CurCycle;
+if (CurCycle == (int)WindowIILimit)
+  return CurCycle;
+  }
+  RM.reserveResources(*SU, CurCycle);
 }
-RM.reserveResources(*SU, CurCycle);
 OriToCycle[getOriMI(&MI)] = CurCycle;
 LLVM_DEBUG(dbgs() << "\tCycle " << CurCycle << " [S."
   << getOriStage(getOriMI(&MI), Offset) << "]: " << MI);
@@ -485,6 +491,7 @@ int WindowScheduler::calculateMaxCycle(ScheduleDAGInstrs 
&DAG,
 // 
 int WindowScheduler::calculateStallCycle(unsigned Offset, int MaxCycle) {
   int MaxStallCycle = 0;
+  int CurrentII = MaxCycle + 1;
   auto Range = getScheduleRange(Offset, SchedInstrNum);
   for (auto &MI : Range) {
 auto *SU = TripleDAG->getSUnit(&MI);
@@ -492,8 +499,8 @@ int WindowScheduler::calculateStallCycle(unsigned Offset, 
int MaxCycle) {
 for (auto &Succ : SU->Succs) {
   if (Succ.isWeak() || Succ.getSUnit() == &TripleDAG->ExitSU)
 continue;
-  // If the expected cycle does not exceed MaxCycle, no check is

[llvm-branch-commits] [llvm] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes (PR #102815)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102815

>From 355428bc4060b424e0645c2747c7a19513a4edc7 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 11 Aug 2024 18:11:04 +0400
Subject: [PATCH] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes

---
 llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 3cc39b54ba758d..eb15beb835b535 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -27,6 +27,8 @@
 #include "llvm/CodeGen/CodeGenPrepare.h"
 #include "llvm/CodeGen/DeadMachineInstructionElim.h"
 #include "llvm/CodeGen/DwarfEHPrepare.h"
+#include "llvm/CodeGen/ExpandLargeDivRem.h"
+#include "llvm/CodeGen/ExpandLargeFpConvert.h"
 #include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/ExpandReductions.h"
 #include "llvm/CodeGen/FinalizeISel.h"
@@ -627,6 +629,8 @@ void CodeGenPassBuilder::addISelPasses(
 addPass(LowerEmuTLSPass());
 
   addPass(PreISelIntrinsicLoweringPass(&TM));
+  addPass(ExpandLargeDivRemPass(&TM));
+  addPass(ExpandLargeFpConvertPass(&TM));
 
   derived().addIRPasses(addPass);
   derived().addCodeGenPrepare(addPass);

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start implementing addCodeGenPrepare (PR #102816)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102816

>From c5f7f3c67470c604f3d474d4dfa932a3c5efb4f5 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 11 Aug 2024 18:20:23 +0400
Subject: [PATCH] AMDGPU/NewPM: Start implementing addCodeGenPrepare

---
 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp | 11 +++
 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h   |  4 +++-
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp  |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 36f44a20d95532..252a70d44736dc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Transforms/Scalar/StructurizeCFG.h"
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/Transforms/Utils/LCSSA.h"
+#include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
 
 using namespace llvm;
@@ -35,6 +36,16 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
   ShadowStackGCLoweringPass>();
 }
 
+void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  Base::addCodeGenPrepare(addPass);
+
+  // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
+  // behavior for subsequent passes. Placing it here seems better that these
+  // blocks would get cleaned up by UnreachableBlockElim inserted next in the
+  // pass flow.
+  addPass(LowerSwitchPass());
+}
+
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
   const bool LateCFGStructurize = 
AMDGPUTargetMachine::EnableLateStructurizeCFG;
   const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index e656e166b3eb2e..efb296689bd647 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -19,10 +19,12 @@ class GCNTargetMachine;
 class AMDGPUCodeGenPassBuilder
 : public CodeGenPassBuilder {
 public:
+  using Base = CodeGenPassBuilder;
+
   AMDGPUCodeGenPassBuilder(GCNTargetMachine &TM,
const CGPassBuilderOption &Opts,
PassInstrumentationCallbacks *PIC);
-
+  void addCodeGenPrepare(AddIRPass &) const;
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 5929dadf93bcbe..cad4585c5b3013 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/InferAddressSpaces.h"
 #include "llvm/Transforms/Utils.h"
+#include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/SimplifyLibCalls.h"
 #include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 #include 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102865

>From 64679cbc78a5bba63bf0b5eb5427ffae7aae6b22 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 12:46:00 +0400
Subject: [PATCH] AMDGPU: Declare pass control flags in header

This will allow them to be shared between the old PM and new PM files.

I don't really like needing to expose these globally like this; maybe
it would be better to just move TargetPassConfig and the CodeGenPassBuilder
into one common file?
---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 203 --
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h  |  41 
 2 files changed, 133 insertions(+), 111 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index cad4585c5b3013..3409a49fe203f9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -74,6 +74,7 @@
 
 using namespace llvm;
 using namespace llvm::PatternMatch;
+using namespace llvm::AMDGPU;
 
 namespace {
 class SGPRRegisterRegAlloc : public RegisterRegAllocBase 
{
@@ -186,109 +187,95 @@ static VGPRRegisterRegAlloc fastRegAllocVGPR(
   "fast", "fast register allocator", createFastVGPRRegisterAllocator);
 } // anonymous namespace
 
-static cl::opt
-EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
-cl::desc("Run early if-conversion"),
-cl::init(false));
+namespace llvm::AMDGPU {
+cl::opt EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
+  cl::desc("Run early if-conversion"),
+  cl::init(false));
 
-static cl::opt
-OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
-cl::desc("Run pre-RA exec mask optimizations"),
-cl::init(true));
+cl::opt OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
+   cl::desc("Run pre-RA exec mask optimizations"),
+   cl::init(true));
 
-static cl::opt
+cl::opt
 LowerCtorDtor("amdgpu-lower-global-ctor-dtor",
   cl::desc("Lower GPU ctor / dtors to globals on the device."),
   cl::init(true), cl::Hidden);
 
 // Option to disable vectorizer for tests.
-static cl::opt EnableLoadStoreVectorizer(
-  "amdgpu-load-store-vectorizer",
-  cl::desc("Enable load store vectorizer"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt
+EnableLoadStoreVectorizer("amdgpu-load-store-vectorizer",
+  cl::desc("Enable load store vectorizer"),
+  cl::init(true), cl::Hidden);
 
 // Option to control global loads scalarization
-static cl::opt ScalarizeGlobal(
-  "amdgpu-scalarize-global-loads",
-  cl::desc("Enable global load scalarization"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt ScalarizeGlobal("amdgpu-scalarize-global-loads",
+  cl::desc("Enable global load scalarization"),
+  cl::init(true), cl::Hidden);
 
 // Option to run internalize pass.
-static cl::opt InternalizeSymbols(
-  "amdgpu-internalize-symbols",
-  cl::desc("Enable elimination of non-kernel functions and unused globals"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt InternalizeSymbols(
+"amdgpu-internalize-symbols",
+cl::desc("Enable elimination of non-kernel functions and unused globals"),
+cl::init(false), cl::Hidden);
 
 // Option to inline all early.
-static cl::opt EarlyInlineAll(
-  "amdgpu-early-inline-all",
-  cl::desc("Inline all functions early"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt EarlyInlineAll("amdgpu-early-inline-all",
+ cl::desc("Inline all functions early"),
+ cl::init(false), cl::Hidden);
 
-static cl::opt RemoveIncompatibleFunctions(
+cl::opt RemoveIncompatibleFunctions(
 "amdgpu-enable-remove-incompatible-functions", cl::Hidden,
 cl::desc("Enable removal of functions when they"
  "use features not supported by the target GPU"),
 cl::init(true));
 
-static cl::opt EnableSDWAPeephole(
-  "amdgpu-sdwa-peephole",
-  cl::desc("Enable SDWA peepholer"),
-  cl::init(true));
+cl::opt EnableSDWAPeephole("amdgpu-sdwa-peephole",
+ cl::desc("Enable SDWA peepholer"),
+ cl::init(true));
 
-static cl::opt EnableDPPCombine(
-  "amdgpu-dpp-combine",
-  cl::desc("Enable DPP combiner"),
-  cl::init(true));
+cl::opt EnableDPPCombine("amdgpu-dpp-combine",
+   cl::desc("Enable DPP combiner"), 
cl::init(true));
 
 // Enable address space based alias analysis
-static cl::opt EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
-  cl::desc("Enable AMDGPU Alias Analysis"),
-  cl::init(true));
+cl::opt
+EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
+  cl::de

[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102867

>From a0b9380496e1c5e42c4c8601a663faee7d3dd365 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 13:09:55 +0400
Subject: [PATCH] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should be soon removable.
---
 .../AMDGPU/AMDGPUCodeGenPassBuilder.cpp   | 38 +++
 .../Target/AMDGPU/AMDGPUCodeGenPassBuilder.h  |  7 
 2 files changed, 45 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 252a70d44736dc..9fd7e24b114ddd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -21,8 +21,10 @@
 #include "llvm/Transforms/Utils/LCSSA.h"
 #include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 
 using namespace llvm;
+using namespace llvm::AMDGPU;
 
 AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 GCNTargetMachine &TM, const CGPassBuilderOption &Opts,
@@ -37,8 +39,35 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
+  // deleted soon.
+
+  if (EnableLowerKernelArguments)
+addPass(AMDGPULowerKernelArgumentsPass(TM));
+
+  // This lowering has been placed after codegenprepare to take advantage of
+  // address mode matching (which is why it isn't put with the LDS lowerings).
+  // It could be placed anywhere before uniformity annotations (an analysis
+  // that it changes by splitting up fat pointers into their components)
+  // but has been put before switch lowering and CFG flattening so that those
+  // passes can run on the more optimized control flow this pass creates in
+  // many cases.
+  //
+  // FIXME: This should ideally be put after the LoadStoreVectorizer.
+  // However, due to some annoying facts about ResourceUsageAnalysis,
+  // (especially as exercised in the resource-usage-dead-function test),
+  // we need all the function passes codegenprepare all the way through
+  // said resource usage analysis to run on the call graph produced
+  // before codegenprepare runs (because codegenprepare will knock some
+  // nodes out of the graph, which leads to function-level passes not
+  // being run on them, which causes crashes in the resource usage analysis).
+  addPass(AMDGPULowerBufferFatPointersPass(TM));
+
   Base::addCodeGenPrepare(addPass);
 
+  if (isPassEnabled(EnableLoadStoreVectorizer))
+addPass(LoadStoreVectorizerPass());
+
   // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
   // behavior for subsequent passes. Placing it here seems better that these
   // blocks would get cleaned up by UnreachableBlockElim inserted next in the
@@ -106,3 +135,12 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(SILowerI1CopiesPass());
   return Error::success();
 }
+
+bool AMDGPUCodeGenPassBuilder::isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level) const {
+  if (Opt.getNumOccurrences())
+return Opt;
+  if (TM.getOptLevel() < Level)
+return false;
+  return Opt;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index efb296689bd647..1ff7744c84a436 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -28,6 +28,13 @@ class AMDGPUCodeGenPassBuilder
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
+
+  /// Check if a pass is enabled given \p Opt option. The option always
+  /// overrides defaults if explicitly used. Otherwise its default will
+  /// be used given that a pass shall work at an optimization \p Level
+  /// minimum.
+  bool isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level = CodeGenOptLevel::Default) const;
 };
 
 } // namespace llvm

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] release/19.x: [AArch64] Add streaming-mode stack hazard optimization remarks (#101695) (PR #102168)

2024-08-12 Thread Aaron Ballman via llvm-branch-commits

AaronBallman wrote:

> The patch here is pretty big in size, but it seems to only affect the 
> remarks; on the other hand, it doesn't seem to really fix anything, and in 
> that case I feel like RC3 might be the wrong time to merge this. Is there a 
> huge upside to taking this this late in the process?
> 
> Also ping @jroelofs as aarch64 domain expert and @AaronBallman as clang 
> maintainer.

We had 8 release candidates for 18.x and I would *very much* like to avoid that 
happening again, so I think that because we're about to hit rc3 (the last 
scheduled rc before we release according to the release schedule posted at 
https://llvm.org/) we should only be taking low-risk, high-impact changes such 
as fixes to regressions or obviously correct changes. I don't think this patch 
qualifies; is there significant risk to not putting this in? (e.g., does this 
fix what you would consider to be a stop-ship level issue of some kind?)

https://github.com/llvm/llvm-project/pull/102168
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes (PR #102815)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102815

>From 97c4b4cd982579447f51ce8a47cce6b690870f82 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 11 Aug 2024 18:11:04 +0400
Subject: [PATCH] CodeGen/NewPM: Add ExpandLarge* passes to isel IR passes

---
 llvm/include/llvm/Passes/CodeGenPassBuilder.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/llvm/include/llvm/Passes/CodeGenPassBuilder.h 
b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
index 3cc39b54ba758d..eb15beb835b535 100644
--- a/llvm/include/llvm/Passes/CodeGenPassBuilder.h
+++ b/llvm/include/llvm/Passes/CodeGenPassBuilder.h
@@ -27,6 +27,8 @@
 #include "llvm/CodeGen/CodeGenPrepare.h"
 #include "llvm/CodeGen/DeadMachineInstructionElim.h"
 #include "llvm/CodeGen/DwarfEHPrepare.h"
+#include "llvm/CodeGen/ExpandLargeDivRem.h"
+#include "llvm/CodeGen/ExpandLargeFpConvert.h"
 #include "llvm/CodeGen/ExpandMemCmp.h"
 #include "llvm/CodeGen/ExpandReductions.h"
 #include "llvm/CodeGen/FinalizeISel.h"
@@ -627,6 +629,8 @@ void CodeGenPassBuilder::addISelPasses(
 addPass(LowerEmuTLSPass());
 
   addPass(PreISelIntrinsicLoweringPass(&TM));
+  addPass(ExpandLargeDivRemPass(&TM));
+  addPass(ExpandLargeFpConvertPass(&TM));
 
   derived().addIRPasses(addPass);
   derived().addCodeGenPrepare(addPass);

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start implementing addCodeGenPrepare (PR #102816)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102816

>From 673d0b7253b82d3a38b66e87831c14885110cc5c Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Sun, 11 Aug 2024 18:20:23 +0400
Subject: [PATCH] AMDGPU/NewPM: Start implementing addCodeGenPrepare

---
 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp | 11 +++
 llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h   |  4 +++-
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp  |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 36f44a20d95532..252a70d44736dc 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -19,6 +19,7 @@
 #include "llvm/Transforms/Scalar/StructurizeCFG.h"
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/Transforms/Utils/LCSSA.h"
+#include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
 
 using namespace llvm;
@@ -35,6 +36,16 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
   ShadowStackGCLoweringPass>();
 }
 
+void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  Base::addCodeGenPrepare(addPass);
+
+  // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
+  // behavior for subsequent passes. Placing it here seems better that these
+  // blocks would get cleaned up by UnreachableBlockElim inserted next in the
+  // pass flow.
+  addPass(LowerSwitchPass());
+}
+
 void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
   const bool LateCFGStructurize = 
AMDGPUTargetMachine::EnableLateStructurizeCFG;
   const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index e656e166b3eb2e..efb296689bd647 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -19,10 +19,12 @@ class GCNTargetMachine;
 class AMDGPUCodeGenPassBuilder
 : public CodeGenPassBuilder {
 public:
+  using Base = CodeGenPassBuilder;
+
   AMDGPUCodeGenPassBuilder(GCNTargetMachine &TM,
const CGPassBuilderOption &Opts,
PassInstrumentationCallbacks *PIC);
-
+  void addCodeGenPrepare(AddIRPass &) const;
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 5929dadf93bcbe..cad4585c5b3013 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -67,6 +67,7 @@
 #include "llvm/Transforms/Scalar/GVN.h"
 #include "llvm/Transforms/Scalar/InferAddressSpaces.h"
 #include "llvm/Transforms/Utils.h"
+#include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/SimplifyLibCalls.h"
 #include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 #include 

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU: Declare pass control flags in header (PR #102865)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102865

>From c5882b7e0b8bc85390cb82495821965013494d12 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 12:46:00 +0400
Subject: [PATCH] AMDGPU: Declare pass control flags in header

This will allow them to be shared between the old PM and new PM files.

I don't really like needing to expose these globally like this; maybe
it would be better to just move TargetPassConfig and the CodeGenPassBuilder
into one common file?
---
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 203 --
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h  |  41 
 2 files changed, 133 insertions(+), 111 deletions(-)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index cad4585c5b3013..3409a49fe203f9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -74,6 +74,7 @@
 
 using namespace llvm;
 using namespace llvm::PatternMatch;
+using namespace llvm::AMDGPU;
 
 namespace {
 class SGPRRegisterRegAlloc : public RegisterRegAllocBase 
{
@@ -186,109 +187,95 @@ static VGPRRegisterRegAlloc fastRegAllocVGPR(
   "fast", "fast register allocator", createFastVGPRRegisterAllocator);
 } // anonymous namespace
 
-static cl::opt
-EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
-cl::desc("Run early if-conversion"),
-cl::init(false));
+namespace llvm::AMDGPU {
+cl::opt EnableEarlyIfConversion("amdgpu-early-ifcvt", cl::Hidden,
+  cl::desc("Run early if-conversion"),
+  cl::init(false));
 
-static cl::opt
-OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
-cl::desc("Run pre-RA exec mask optimizations"),
-cl::init(true));
+cl::opt OptExecMaskPreRA("amdgpu-opt-exec-mask-pre-ra", cl::Hidden,
+   cl::desc("Run pre-RA exec mask optimizations"),
+   cl::init(true));
 
-static cl::opt
+cl::opt
 LowerCtorDtor("amdgpu-lower-global-ctor-dtor",
   cl::desc("Lower GPU ctor / dtors to globals on the device."),
   cl::init(true), cl::Hidden);
 
 // Option to disable vectorizer for tests.
-static cl::opt EnableLoadStoreVectorizer(
-  "amdgpu-load-store-vectorizer",
-  cl::desc("Enable load store vectorizer"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt
+EnableLoadStoreVectorizer("amdgpu-load-store-vectorizer",
+  cl::desc("Enable load store vectorizer"),
+  cl::init(true), cl::Hidden);
 
 // Option to control global loads scalarization
-static cl::opt ScalarizeGlobal(
-  "amdgpu-scalarize-global-loads",
-  cl::desc("Enable global load scalarization"),
-  cl::init(true),
-  cl::Hidden);
+cl::opt ScalarizeGlobal("amdgpu-scalarize-global-loads",
+  cl::desc("Enable global load scalarization"),
+  cl::init(true), cl::Hidden);
 
 // Option to run internalize pass.
-static cl::opt InternalizeSymbols(
-  "amdgpu-internalize-symbols",
-  cl::desc("Enable elimination of non-kernel functions and unused globals"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt InternalizeSymbols(
+"amdgpu-internalize-symbols",
+cl::desc("Enable elimination of non-kernel functions and unused globals"),
+cl::init(false), cl::Hidden);
 
 // Option to inline all early.
-static cl::opt EarlyInlineAll(
-  "amdgpu-early-inline-all",
-  cl::desc("Inline all functions early"),
-  cl::init(false),
-  cl::Hidden);
+cl::opt EarlyInlineAll("amdgpu-early-inline-all",
+ cl::desc("Inline all functions early"),
+ cl::init(false), cl::Hidden);
 
-static cl::opt RemoveIncompatibleFunctions(
+cl::opt RemoveIncompatibleFunctions(
 "amdgpu-enable-remove-incompatible-functions", cl::Hidden,
 cl::desc("Enable removal of functions when they"
  "use features not supported by the target GPU"),
 cl::init(true));
 
-static cl::opt EnableSDWAPeephole(
-  "amdgpu-sdwa-peephole",
-  cl::desc("Enable SDWA peepholer"),
-  cl::init(true));
+cl::opt EnableSDWAPeephole("amdgpu-sdwa-peephole",
+ cl::desc("Enable SDWA peepholer"),
+ cl::init(true));
 
-static cl::opt EnableDPPCombine(
-  "amdgpu-dpp-combine",
-  cl::desc("Enable DPP combiner"),
-  cl::init(true));
+cl::opt EnableDPPCombine("amdgpu-dpp-combine",
+   cl::desc("Enable DPP combiner"), 
cl::init(true));
 
 // Enable address space based alias analysis
-static cl::opt EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
-  cl::desc("Enable AMDGPU Alias Analysis"),
-  cl::init(true));
+cl::opt
+EnableAMDGPUAliasAnalysis("enable-amdgpu-aa", cl::Hidden,
+  cl::de

[llvm-branch-commits] [llvm] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare (PR #102867)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102867

>From 9aff06fda45db9ddcdb8879a6886552c50613930 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 13:09:55 +0400
Subject: [PATCH] AMDGPU/NewPM: Fill out passes in addCodeGenPrepare

AMDGPUAnnotateKernelFeatures hasn't been ported yet, but it
should be soon removable.
---
 .../AMDGPU/AMDGPUCodeGenPassBuilder.cpp   | 38 +++
 .../Target/AMDGPU/AMDGPUCodeGenPassBuilder.h  |  7 
 2 files changed, 45 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 252a70d44736dc..9fd7e24b114ddd 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -21,8 +21,10 @@
 #include "llvm/Transforms/Utils/LCSSA.h"
 #include "llvm/Transforms/Utils/LowerSwitch.h"
 #include "llvm/Transforms/Utils/UnifyLoopExits.h"
+#include "llvm/Transforms/Vectorize/LoadStoreVectorizer.h"
 
 using namespace llvm;
+using namespace llvm::AMDGPU;
 
 AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 GCNTargetMachine &TM, const CGPassBuilderOption &Opts,
@@ -37,8 +39,35 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
 }
 
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
+  // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
+  // deleted soon.
+
+  if (EnableLowerKernelArguments)
+addPass(AMDGPULowerKernelArgumentsPass(TM));
+
+  // This lowering has been placed after codegenprepare to take advantage of
+  // address mode matching (which is why it isn't put with the LDS lowerings).
+  // It could be placed anywhere before uniformity annotations (an analysis
+  // that it changes by splitting up fat pointers into their components)
+  // but has been put before switch lowering and CFG flattening so that those
+  // passes can run on the more optimized control flow this pass creates in
+  // many cases.
+  //
+  // FIXME: This should ideally be put after the LoadStoreVectorizer.
+  // However, due to some annoying facts about ResourceUsageAnalysis,
+  // (especially as exercised in the resource-usage-dead-function test),
+  // we need all the function passes codegenprepare all the way through
+  // said resource usage analysis to run on the call graph produced
+  // before codegenprepare runs (because codegenprepare will knock some
+  // nodes out of the graph, which leads to function-level passes not
+  // being run on them, which causes crashes in the resource usage analysis).
+  addPass(AMDGPULowerBufferFatPointersPass(TM));
+
   Base::addCodeGenPrepare(addPass);
 
+  if (isPassEnabled(EnableLoadStoreVectorizer))
+addPass(LoadStoreVectorizerPass());
+
   // LowerSwitch pass may introduce unreachable blocks that can cause 
unexpected
   // behavior for subsequent passes. Placing it here seems better that these
   // blocks would get cleaned up by UnreachableBlockElim inserted next in the
@@ -106,3 +135,12 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   addPass(SILowerI1CopiesPass());
   return Error::success();
 }
+
+bool AMDGPUCodeGenPassBuilder::isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level) const {
+  if (Opt.getNumOccurrences())
+return Opt;
+  if (TM.getOptLevel() < Level)
+return false;
+  return Opt;
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
index efb296689bd647..1ff7744c84a436 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h
@@ -28,6 +28,13 @@ class AMDGPUCodeGenPassBuilder
   void addPreISel(AddIRPass &addPass) const;
   void addAsmPrinter(AddMachinePass &, CreateMCStreamer) const;
   Error addInstSelector(AddMachinePass &) const;
+
+  /// Check if a pass is enabled given \p Opt option. The option always
+  /// overrides defaults if explicitly used. Otherwise its default will
+  /// be used given that a pass shall work at an optimization \p Level
+  /// minimum.
+  bool isPassEnabled(const cl::opt &Opt,
+ CodeGenOptLevel Level = CodeGenOptLevel::Default) const;
 };
 
 } // namespace llvm

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm created 
https://github.com/llvm/llvm-project/pull/102884

This is not complete, but gets AtomicExpand running. I was able
to get further than I expected; we're quite close to having all
the IR codegen passes ported.

>From 185d4210d77de0c1db775b4914d3b2e1077dea68 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Mon, 12 Aug 2024 15:26:25 +0400
Subject: [PATCH] AMDGPU/NewPM: Start filling out addIRPasses

This is not complete, but gets AtomicExpand running. I was able
to get further than I expected; we're quite close to having all
the IR codegen passes ported.
---
 .../AMDGPU/AMDGPUCodeGenPassBuilder.cpp   | 104 ++
 .../Target/AMDGPU/AMDGPUCodeGenPassBuilder.h  |   5 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |   1 +
 3 files changed, 110 insertions(+)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 9fd7e24b114dd..854e1644a71e9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -8,14 +8,24 @@
 
 #include "AMDGPUCodeGenPassBuilder.h"
 #include "AMDGPU.h"
+#include "AMDGPUCtorDtorLowering.h"
 #include "AMDGPUISelDAGToDAG.h"
 #include "AMDGPUPerfHintAnalysis.h"
 #include "AMDGPUTargetMachine.h"
 #include "AMDGPUUnifyDivergentExitNodes.h"
 #include "SIFixSGPRCopies.h"
 #include "llvm/Analysis/UniformityAnalysis.h"
+#include "llvm/Transforms/IPO/AlwaysInliner.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/Transforms/Scalar/EarlyCSE.h"
 #include "llvm/Transforms/Scalar/FlattenCFG.h"
+#include "llvm/Transforms/Scalar/GVN.h"
+#include "llvm/Transforms/Scalar/InferAddressSpaces.h"
+#include "llvm/Transforms/Scalar/LoopDataPrefetch.h"
+#include "llvm/Transforms/Scalar/NaryReassociate.h"
+#include "llvm/Transforms/Scalar/SeparateConstOffsetFromGEP.h"
 #include "llvm/Transforms/Scalar/Sink.h"
+#include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/Transforms/Scalar/StructurizeCFG.h"
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/Transforms/Utils/LCSSA.h"
@@ -38,6 +48,70 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
   ShadowStackGCLoweringPass>();
 }
 
+void AMDGPUCodeGenPassBuilder::addIRPasses(AddIRPass &addPass) const {
+  // TODO: Missing AMDGPURemoveIncompatibleFunctions
+
+  addPass(AMDGPUPrintfRuntimeBindingPass());
+  if (LowerCtorDtor)
+addPass(AMDGPUCtorDtorLoweringPass());
+
+  if (isPassEnabled(EnableImageIntrinsicOptimizer))
+addPass(AMDGPUImageIntrinsicOptimizerPass(TM));
+
+  // This can be disabled by passing ::Disable here or on the command line
+  // with --expand-variadics-override=disable.
+  addPass(ExpandVariadicsPass(ExpandVariadicsMode::Lowering));
+
+  addPass(AMDGPUAlwaysInlinePass());
+  addPass(AlwaysInlinerPass());
+
+  // TODO: Missing OpenCLEnqueuedBlockLowering
+
+  // Runs before PromoteAlloca so the latter can account for function uses
+  if (EnableLowerModuleLDS)
+addPass(AMDGPULowerModuleLDSPass(TM));
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+addPass(InferAddressSpacesPass());
+
+  // Run atomic optimizer before Atomic Expand
+  if (TM.getOptLevel() >= CodeGenOptLevel::Less &&
+  (AMDGPUAtomicOptimizerStrategy != ScanOptions::None))
+addPass(AMDGPUAtomicOptimizerPass(TM, AMDGPUAtomicOptimizerStrategy));
+
+  addPass(AtomicExpandPass());
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None) {
+addPass(AMDGPUPromoteAllocaPass(TM));
+if (isPassEnabled(EnableScalarIRPasses))
+  addStraightLineScalarOptimizationPasses(addPass);
+
+// TODO: Handle EnableAMDGPUAliasAnalysis
+
+// TODO: May want to move later or split into an early and late one.
+addPass(AMDGPUCodeGenPreparePass(TM));
+
+// TODO: LICM
+  }
+
+  Base::addIRPasses(addPass);
+
+  // EarlyCSE is not always strong enough to clean up what LSR produces. For
+  // example, GVN can combine
+  //
+  //   %0 = add %a, %b
+  //   %1 = add %b, %a
+  //
+  // and
+  //
+  //   %0 = shl nsw %a, 2
+  //   %1 = shl %a, 2
+  //
+  // but EarlyCSE can do neither of them.
+  if (isPassEnabled(EnableScalarIRPasses))
+addEarlyCSEOrGVNPass(addPass);
+}
+
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
   // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
   // deleted soon.
@@ -136,6 +210,36 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   return Error::success();
 }
 
+void AMDGPUCodeGenPassBuilder::addEarlyCSEOrGVNPass(AddIRPass &addPass) const {
+  if (TM.getOptLevel() == CodeGenOptLevel::Aggressive)
+addPass(GVNPass());
+  else
+addPass(EarlyCSEPass());
+}
+
+void AMDGPUCodeGenPassBuilder::addStraightLineScalarOptimizationPasses(
+AddIRPass &addPass) const {
+  if (isPassEnabled(EnableLoopPrefetch, CodeGenOptLevel::Aggressive))
+addPass(LoopDataPrefetchPass());
+
+  addPass(Separ

[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm ready_for_review 
https://github.com/llvm/llvm-project/pull/102884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is
> open. Once all requirements are satisfied, merge this PR as a stack on
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/102884
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#102884** https://app.graphite.dev/github/pr/llvm/llvm-project/102884 👈
* **#102867** https://app.graphite.dev/github/pr/llvm/llvm-project/102867
* **#102865** https://app.graphite.dev/github/pr/llvm/llvm-project/102865
* **#102816** https://app.graphite.dev/github/pr/llvm/llvm-project/102816
* **#102815** https://app.graphite.dev/github/pr/llvm/llvm-project/102815
* **#102814** https://app.graphite.dev/github/pr/llvm/llvm-project/102814
* **#102812** https://app.graphite.dev/github/pr/llvm/llvm-project/102812
* **#102806** https://app.graphite.dev/github/pr/llvm/llvm-project/102806
* **#102805** https://app.graphite.dev/github/pr/llvm/llvm-project/102805
* **#102645** https://app.graphite.dev/github/pr/llvm/llvm-project/102645
* **#102644** https://app.graphite.dev/github/pr/llvm/llvm-project/102644
* `main`

This stack of pull requests is managed by Graphite. Learn more about stacking: 
https://stacking.dev/



https://github.com/llvm/llvm-project/pull/102884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-transforms

Author: Matt Arsenault (arsenm)


Changes

This is not complete, but gets AtomicExpand running. I was able
to get further than I expected; we're quite close to having all
the IR codegen passes ported.

---
Full diff: https://github.com/llvm/llvm-project/pull/102884.diff


3 Files Affected:

- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp (+104) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.h (+5) 
- (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+1) 


```diff
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
index 9fd7e24b114dd..854e1644a71e9 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
@@ -8,14 +8,24 @@
 
 #include "AMDGPUCodeGenPassBuilder.h"
 #include "AMDGPU.h"
+#include "AMDGPUCtorDtorLowering.h"
 #include "AMDGPUISelDAGToDAG.h"
 #include "AMDGPUPerfHintAnalysis.h"
 #include "AMDGPUTargetMachine.h"
 #include "AMDGPUUnifyDivergentExitNodes.h"
 #include "SIFixSGPRCopies.h"
 #include "llvm/Analysis/UniformityAnalysis.h"
+#include "llvm/Transforms/IPO/AlwaysInliner.h"
+#include "llvm/Transforms/IPO/ExpandVariadics.h"
+#include "llvm/Transforms/Scalar/EarlyCSE.h"
 #include "llvm/Transforms/Scalar/FlattenCFG.h"
+#include "llvm/Transforms/Scalar/GVN.h"
+#include "llvm/Transforms/Scalar/InferAddressSpaces.h"
+#include "llvm/Transforms/Scalar/LoopDataPrefetch.h"
+#include "llvm/Transforms/Scalar/NaryReassociate.h"
+#include "llvm/Transforms/Scalar/SeparateConstOffsetFromGEP.h"
 #include "llvm/Transforms/Scalar/Sink.h"
+#include "llvm/Transforms/Scalar/StraightLineStrengthReduce.h"
 #include "llvm/Transforms/Scalar/StructurizeCFG.h"
 #include "llvm/Transforms/Utils/FixIrreducible.h"
 #include "llvm/Transforms/Utils/LCSSA.h"
@@ -38,6 +48,70 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
   ShadowStackGCLoweringPass>();
 }
 
+void AMDGPUCodeGenPassBuilder::addIRPasses(AddIRPass &addPass) const {
+  // TODO: Missing AMDGPURemoveIncompatibleFunctions
+
+  addPass(AMDGPUPrintfRuntimeBindingPass());
+  if (LowerCtorDtor)
+addPass(AMDGPUCtorDtorLoweringPass());
+
+  if (isPassEnabled(EnableImageIntrinsicOptimizer))
+addPass(AMDGPUImageIntrinsicOptimizerPass(TM));
+
+  // This can be disabled by passing ::Disable here or on the command line
+  // with --expand-variadics-override=disable.
+  addPass(ExpandVariadicsPass(ExpandVariadicsMode::Lowering));
+
+  addPass(AMDGPUAlwaysInlinePass());
+  addPass(AlwaysInlinerPass());
+
+  // TODO: Missing OpenCLEnqueuedBlockLowering
+
+  // Runs before PromoteAlloca so the latter can account for function uses
+  if (EnableLowerModuleLDS)
+addPass(AMDGPULowerModuleLDSPass(TM));
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None)
+addPass(InferAddressSpacesPass());
+
+  // Run atomic optimizer before Atomic Expand
+  if (TM.getOptLevel() >= CodeGenOptLevel::Less &&
+  (AMDGPUAtomicOptimizerStrategy != ScanOptions::None))
+addPass(AMDGPUAtomicOptimizerPass(TM, AMDGPUAtomicOptimizerStrategy));
+
+  addPass(AtomicExpandPass());
+
+  if (TM.getOptLevel() > CodeGenOptLevel::None) {
+addPass(AMDGPUPromoteAllocaPass(TM));
+if (isPassEnabled(EnableScalarIRPasses))
+  addStraightLineScalarOptimizationPasses(addPass);
+
+// TODO: Handle EnableAMDGPUAliasAnalysis
+
+// TODO: May want to move later or split into an early and late one.
+addPass(AMDGPUCodeGenPreparePass(TM));
+
+// TODO: LICM
+  }
+
+  Base::addIRPasses(addPass);
+
+  // EarlyCSE is not always strong enough to clean up what LSR produces. For
+  // example, GVN can combine
+  //
+  //   %0 = add %a, %b
+  //   %1 = add %b, %a
+  //
+  // and
+  //
+  //   %0 = shl nsw %a, 2
+  //   %1 = shl %a, 2
+  //
+  // but EarlyCSE can do neither of them.
+  if (isPassEnabled(EnableScalarIRPasses))
+addEarlyCSEOrGVNPass(addPass);
+}
+
 void AMDGPUCodeGenPassBuilder::addCodeGenPrepare(AddIRPass &addPass) const {
   // AMDGPUAnnotateKernelFeaturesPass is missing here, but it will hopefully be
   // deleted soon.
@@ -136,6 +210,36 @@ Error 
AMDGPUCodeGenPassBuilder::addInstSelector(AddMachinePass &addPass) const {
   return Error::success();
 }
 
+void AMDGPUCodeGenPassBuilder::addEarlyCSEOrGVNPass(AddIRPass &addPass) const {
+  if (TM.getOptLevel() == CodeGenOptLevel::Aggressive)
+addPass(GVNPass());
+  else
+addPass(EarlyCSEPass());
+}
+
+void AMDGPUCodeGenPassBuilder::addStraightLineScalarOptimizationPasses(
+AddIRPass &addPass) const {
+  if (isPassEnabled(EnableLoopPrefetch, CodeGenOptLevel::Aggressive))
+addPass(LoopDataPrefetchPass());
+
+  addPass(SeparateConstOffsetFromGEPPass());
+
+  // ReassociateGEPs exposes more opportunities for SLSR. See
+  // the example in reassociate-geps-and-slsr.ll.
+  addPass(StraightLineStrengthReducePass());
+
+  // SeparateConstOffsetFromGEP and SLSR creates c

[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak edited 
https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak approved this pull request.

Thank you again Akash, this LGTM. I have a couple of minor nits left, but 
there's no need for another review by me after addressing them. Can you add to 
ops.mlir an instance of `distribute parallel do/for simd` after line 117?
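
For illustration only, a rough sketch of what such an instance might look like 
(the nesting order, terminator placement, and attribute spelling are 
assumptions on my part — please mirror the existing composite entries in 
ops.mlir rather than this sketch):

```mlir
omp.distribute {
  omp.parallel {
    omp.wsloop {
      omp.simd {
        omp.loop_nest (%iv) : index = (%lb) to (%ub) step (%step) {
          omp.yield
        }
        omp.terminator
      } {omp.composite}
      omp.terminator
    } {omp.composite}
    omp.terminator
  } {omp.composite}
  omp.terminator
} {omp.composite}
```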

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -2383,3 +2383,165 @@ func.func @masked_arg_count_mismatch(%arg0: i32, %arg1: 
i32) {
 }) : (i32, i32) -> ()
   return
 }
+
+// -
+func.func @omp_parallel_missing_composite(%lb: index, %ub: index, %step: 
index) -> () {
+  omp.distribute {
+  // expected-error@+1 {{'omp.composite' attribute missing from composite 
wrapper}}

skatrak wrote:

```suggestion
// expected-error@+1 {{'omp.composite' attribute missing from composite 
wrapper}}
```

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -1748,11 +1754,27 @@ LogicalResult WsloopOp::verify() {
   if (!isWrapper())
 return emitOpError() << "must be a loop wrapper";
 
+  auto wrapper =
+  llvm::dyn_cast_if_present((*this)->getParentOp());
+  bool isCompositeWrapper = wrapper && wrapper.isWrapper() &&
+(!llvm::isa(wrapper) ||
+ llvm::isa(wrapper->getParentOp()));

skatrak wrote:

Same comment for `SimdOp::verify()`.
```suggestion
 
llvm::isa_and_present(wrapper->getParentOp()));
```

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -1748,11 +1754,27 @@ LogicalResult WsloopOp::verify() {
   if (!isWrapper())
 return emitOpError() << "must be a loop wrapper";
 
+  auto wrapper =
+  llvm::dyn_cast_if_present((*this)->getParentOp());
+  bool isCompositeWrapper = wrapper && wrapper.isWrapper() &&

skatrak wrote:

Nit: I think the name could be improved for clarity. Same comment for 
`SimdOp::verify()`.
```suggestion
  bool isCompositeChildLeaf = wrapper && wrapper.isWrapper() &&
```

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -210,7 +210,7 @@ func.func @invalid_nested_wrapper(%lb : index, %ub : index, 
%step : index) {
   omp.terminator
 }

skatrak wrote:

Nit: Add attribute to `omp.distribute` too.

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -36,9 +36,9 @@ func.func @invalid_nested_wrapper(%lb : index, %ub : index, 
%step : index) {
 omp.terminator
   }

skatrak wrote:

Nit: I suppose it doesn't matter because the parent op's verifier fails before 
checking this one, but I think it makes sense to add the attribute to 
`omp.simd` as well here to make sure the only verifier error in this function 
is the one we are checking.

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -2173,7 +2173,7 @@ func.func @omp_distribute_nested_wrapper(%lb: index, %ub: 
index, %step: index) -
   "omp.terminator"() : () -> ()
 }) : () -> ()

skatrak wrote:

Nit: Add attribute to `omp.wsloop` too.

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -1965,7 +1965,7 @@ func.func @taskloop(%lb: i32, %ub: i32, %step: i32) {
   omp.terminator
 }

skatrak wrote:

Nit: Add attribute to `omp.distribute` too.

https://github.com/llvm/llvm-project/pull/102341
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -1064,16 +1064,15 @@ static void printMapClause(OpAsmPrinter &p, Operation 
*op,
 }
 
 static ParseResult parseMembersIndex(OpAsmParser &parser,
- DenseIntElementsAttr &membersIdx) {
-  SmallVector values;
+ ArrayAttr &membersIdx) {
+  SmallVector values, memberIdxs;

skatrak wrote:

Nit: Is there a reason for the magic number "4" here? If not, it's generally 
preferred to leave the default (or zero, if it refuses to give you a 
compile-time default).
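
A minimal sketch of the two spellings being compared (illustrative only, not part of the patch):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

void collect(llvm::ArrayRef<int> in) {
  // Explicit inline capacity: only worth it if "4" reflects the common case.
  llvm::SmallVector<int, 4> withMagicNumber(in.begin(), in.end());

  // No number: LLVM derives a sensible default inline capacity from the
  // element size, which is usually the preferred spelling for new code.
  llvm::SmallVector<int> withDefault(in.begin(), in.end());

  (void)withMagicNumber;
  (void)withDefault;
}
```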

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak edited 
https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -2541,6 +2541,31 @@ static void processMapMembersWithParent(
 
 assert(memberDataIdx >= 0 && "could not find mapped member of structure");
 
+// If we're currently mapping a pointer to a block of data, we must
+// initially map the pointer, and then attatch/bind the data with a
+// subsequent map to the pointer, this segment of code generates the
+// pointer mapping. This pointer map can in certain cases be optimised
+// out as Clang currently does in its lowering, however, for the moment
+// we do not do so, in part as we have substantially less information on
+// the data being mapped at this stage; at least for the moment.

skatrak wrote:

```suggestion
// we do not do so, in part as we currently have substantially less 
information
// on the data being mapped at this stage.
```

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak approved this pull request.

Thank you Andrew for working on my previous comments. I have a couple of 
suggestions to hopefully help simplify things a bit further, but it LGTM. No 
need to wait for another look by me before merging.

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -2261,47 +2261,47 @@ static int getMapDataMemberIdx(MapInfoData &mapData,
 
 static mlir::omp::MapInfoOp
 getFirstOrLastMappedMemberPtr(mlir::omp::MapInfoOp mapInfo, bool first) {
-  mlir::DenseIntElementsAttr indexAttr = mapInfo.getMembersIndexAttr();
-
+  mlir::ArrayAttr indexAttr = mapInfo.getMembersIndexAttr();
   // Only 1 member has been mapped, we can return it.
   if (indexAttr.size() == 1)
 if (auto mapOp = mlir::dyn_cast(
 mapInfo.getMembers()[0].getDefiningOp()))
   return mapOp;
 
-  llvm::ArrayRef shape = indexAttr.getShapedType().getShape();
-  llvm::SmallVector indices(shape[0]);
+  llvm::SmallVector indices(indexAttr.size());
   std::iota(indices.begin(), indices.end(), 0);
 
-  llvm::sort(indices.begin(), indices.end(),
- [&](const size_t a, const size_t b) {
-   auto indexValues = indexAttr.getValues();
-   for (int i = 0; i < shape[1]; ++i) {
- int aIndex = indexValues[a * shape[1] + i];
- int bIndex = indexValues[b * shape[1] + i];
-
- if (aIndex == bIndex)
-   continue;
-
- if (aIndex != -1 && bIndex == -1)
-   return false;
-
- if (aIndex == -1 && bIndex != -1)
-   return true;
+  llvm::sort(
+  indices.begin(), indices.end(), [&](const size_t a, const size_t b) {
+auto memberIndicesA = mlir::cast(indexAttr[a]);
+auto memberIndicesB = mlir::cast(indexAttr[b]);
+
+size_t smallestMember = memberIndicesA.size() < memberIndicesB.size()
+? memberIndicesA.size()
+: memberIndicesB.size();
+for (size_t i = 0; i < smallestMember; ++i) {

skatrak wrote:

Nit: I think `llvm::zip` could simplify this a bit, since it already implements 
iterating over two ranges of potentially different sizes until the end of the 
shortest one is reached.
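
A hedged sketch of the zip-based comparison, written over plain integer ranges
rather than the actual `ArrayAttr` elements (the names and the tie-breaking
rule here are illustrative assumptions, not the patch):

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include <cstdint>

// llvm::zip stops at the end of the shorter range, so the manual
// smallestMember bookkeeping is not needed.
static bool memberIndexLess(llvm::ArrayRef<int64_t> a,
                            llvm::ArrayRef<int64_t> b) {
  for (auto [x, y] : llvm::zip(a, b))
    if (x != y)
      return x < y;
  // Equal common prefix: order the shorter index list first.
  return a.size() < b.size();
}
```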

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -2541,6 +2541,31 @@ static void processMapMembersWithParent(
 
 assert(memberDataIdx >= 0 && "could not find mapped member of structure");
 
+// If we're currently mapping a pointer to a block of data, we must
+// initially map the pointer, and then attatch/bind the data with a
+// subsequent map to the pointer, this segment of code generates the
+// pointer mapping. This pointer map can in certain cases be optimised
+// out as Clang currently does in its lowering, however, for the moment

skatrak wrote:

```suggestion
// subsequent map to the pointer. This segment of code generates the
// pointer mapping, which can in certain cases be optimised out as Clang
// currently does in its lowering. However, for the moment
```

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [OpenMP][MLIR] Descriptor explicit member map lowering changes (PR #96265)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -1087,50 +1086,31 @@ static ParseResult parseMembersIndex(OpAsmParser 
&parser,
 if (failed(parser.parseRSquare()))
   return failure();
 
-// Only set once, if any indices are not the same size
-// we error out in the next check as that's unsupported
-if (shape[1] == 0)
-  shape[1] = shapeTmp;
-
-// Verify that the recently parsed list is equal to the
-// first one we parsed, they must be equal lengths to
-// keep the rectangular shape DenseIntElementsAttr
-// requires
-if (shapeTmp != shape[1])
-  return failure();
-
-shapeTmp = 0;
-shape[0]++;
+memberIdxs.push_back(ArrayAttr::get(parser.getContext(), values));
+values.clear();
   } while (succeeded(parser.parseOptionalComma()));
 
-  if (!values.empty()) {
-ShapedType valueType =
-VectorType::get(shape, IntegerType::get(parser.getContext(), 32));
-membersIdx = DenseIntElementsAttr::get(valueType, values);
-  }
+  if (!memberIdxs.empty())
+membersIdx = ArrayAttr::get(parser.getContext(), memberIdxs);
 
   return success();
 }
 
 static void printMembersIndex(OpAsmPrinter &p, MapInfoOp op,
-  DenseIntElementsAttr membersIdx) {
-  llvm::ArrayRef shape = membersIdx.getShapedType().getShape();
-  assert(shape.size() <= 2);
-
+  ArrayAttr membersIdx) {
   if (!membersIdx)
 return;
 
-  for (int i = 0; i < shape[0]; ++i) {
+  for (size_t i = 0; i < membersIdx.getValue().size(); i++) {

skatrak wrote:

Nit: I think it would be useful to use `llvm::join` for the inner loop to 
improve readability. Then, using `llvm::enumerate` for the outer loop would 
probably help as well.
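
Roughly like this, sketched on plain data rather than the MLIR attribute types
involved here (so purely illustrative):

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/raw_ostream.h"
#include <string>
#include <vector>

static void printIndexRows(llvm::raw_ostream &os,
                           const std::vector<std::vector<int>> &rows) {
  for (auto [i, row] : llvm::enumerate(rows)) {
    if (i != 0)
      os << ", ";
    // llvm::join collapses the inner loop; map_range turns ints into strings.
    os << '[' << llvm::join(llvm::map_range(row, [](int v) {
                              return std::to_string(v);
                            }),
                            ", ")
       << ']';
  }
}
```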

https://github.com/llvm/llvm-project/pull/96265
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread via llvm-branch-commits

paperchalice wrote:

+1 for this if backend developers want to test something, although I'm still 
trying to improve the pass builder in #89708 after investigating asm printer 
and register allocator.

https://github.com/llvm/llvm-project/pull/102884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [lldb] Fix crash when adding members to an "incomplete" type (#102116) (PR #102895)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102895

Backport 57cd100

Requested by: @labath

>From 5cb7662e5e808dd733e623294c4376f57611723b Mon Sep 17 00:00:00 2001
From: Pavel Labath 
Date: Thu, 8 Aug 2024 10:53:15 +0200
Subject: [PATCH] [lldb] Fix crash when adding members to an "incomplete" type
 (#102116)

This fixes a regression caused by delayed type definition searching
(#96755 and friends): If we end up adding a member (e.g. a typedef) to a
type that we've already attempted to complete (and failed), the
resulting AST would end up inconsistent (we would start to "forcibly"
complete it, but never finish it), and importing it into an expression
AST would crash.

This patch fixes this by detecting the situation and finishing the
definition as well.

(cherry picked from commit 57cd1000c9c93fd0e64352cfbc9fbbe5b8a8fcef)
---
 .../SymbolFile/DWARF/DWARFASTParserClang.cpp  | 11 +++--
 .../DWARF/x86/typedef-in-incomplete-type.cpp  | 23 +++
 2 files changed, 32 insertions(+), 2 deletions(-)
 create mode 100644 
lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp

diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp 
b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 85c59a605c675c..ac769ad9fbd52c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -269,8 +269,15 @@ static void PrepareContextToReceiveMembers(TypeSystemClang 
&ast,
   }
 
   // We don't have a type definition and/or the import failed, but we need to
-  // add members to it. Start the definition to make that possible.
-  tag_decl_ctx->startDefinition();
+  // add members to it. Start the definition to make that possible. If the type
+  // has no external storage we also have to complete the definition. 
Otherwise,
+  // that will happen when we are asked to complete the type
+  // (CompleteTypeFromDWARF).
+  ast.StartTagDeclarationDefinition(type);
+  if (!tag_decl_ctx->hasExternalLexicalStorage()) {
+ast.SetDeclIsForcefullyCompleted(tag_decl_ctx);
+ast.CompleteTagDeclarationDefinition(type);
+  }
 }
 
 ParsedDWARFTypeAttributes::ParsedDWARFTypeAttributes(const DWARFDIE &die) {
diff --git 
a/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp 
b/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp
new file mode 100644
index 00..591607784b0a9b
--- /dev/null
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp
@@ -0,0 +1,23 @@
+// RUN: %clangxx --target=x86_64-pc-linux -flimit-debug-info -o %t -c %s -g
+// RUN: %lldb %t -o "target var a" -o "expr -- var" -o exit | FileCheck %s
+
+// This forces lldb to attempt to complete the type A. Since it has no
+// definition it will fail.
+// CHECK: target var a
+// CHECK: (A) a = 
+
+// Now attempt to display the second variable, which will try to add a typedef
+// to the incomplete type. Make sure that succeeds. Use the expression command
+// to make sure the resulting AST can be imported correctly.
+// CHECK: expr -- var
+// CHECK: (A::X) $0 = 0
+
+struct A {
+  // Declare the constructor, but don't define it to avoid emitting the
+  // definition in the debug info.
+  A();
+  using X = int;
+};
+
+A a;
+A::X var;

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [lldb] Fix crash when adding members to an "incomplete" type (#102116) (PR #102895)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102895
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [lldb] Fix crash when adding members to an "incomplete" type (#102116) (PR #102895)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:

@Michael137 What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102895
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [lldb] Fix crash when adding members to an "incomplete" type (#102116) (PR #102895)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-lldb

Author: None (llvmbot)


Changes

Backport 57cd100

Requested by: @labath

---
Full diff: https://github.com/llvm/llvm-project/pull/102895.diff


2 Files Affected:

- (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp 
(+9-2) 
- (added) lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp 
(+23) 


```diff
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp 
b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 85c59a605c675c..ac769ad9fbd52c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -269,8 +269,15 @@ static void PrepareContextToReceiveMembers(TypeSystemClang 
&ast,
   }
 
   // We don't have a type definition and/or the import failed, but we need to
-  // add members to it. Start the definition to make that possible.
-  tag_decl_ctx->startDefinition();
+  // add members to it. Start the definition to make that possible. If the type
+  // has no external storage we also have to complete the definition. 
Otherwise,
+  // that will happen when we are asked to complete the type
+  // (CompleteTypeFromDWARF).
+  ast.StartTagDeclarationDefinition(type);
+  if (!tag_decl_ctx->hasExternalLexicalStorage()) {
+ast.SetDeclIsForcefullyCompleted(tag_decl_ctx);
+ast.CompleteTagDeclarationDefinition(type);
+  }
 }
 
 ParsedDWARFTypeAttributes::ParsedDWARFTypeAttributes(const DWARFDIE &die) {
diff --git 
a/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp 
b/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp
new file mode 100644
index 00..591607784b0a9b
--- /dev/null
+++ b/lldb/test/Shell/SymbolFile/DWARF/x86/typedef-in-incomplete-type.cpp
@@ -0,0 +1,23 @@
+// RUN: %clangxx --target=x86_64-pc-linux -flimit-debug-info -o %t -c %s -g
+// RUN: %lldb %t -o "target var a" -o "expr -- var" -o exit | FileCheck %s
+
+// This forces lldb to attempt to complete the type A. Since it has no
+// definition it will fail.
+// CHECK: target var a
+// CHECK: (A) a = 
+
+// Now attempt to display the second variable, which will try to add a typedef
+// to the incomplete type. Make sure that succeeds. Use the expression command
+// to make sure the resulting AST can be imported correctly.
+// CHECK: expr -- var
+// CHECK: (A::X) $0 = 0
+
+struct A {
+  // Declare the constructor, but don't define it to avoid emitting the
+  // definition in the debug info.
+  A();
+  using X = int;
+};
+
+A a;
+A::X var;

```




https://github.com/llvm/llvm-project/pull/102895
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [lldb] release/19.x: [lldb] Fix crash when adding members to an "incomplete" type (#102116) (PR #102895)

2024-08-12 Thread Michael Buch via llvm-branch-commits

https://github.com/Michael137 approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/102895
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] AMDGPU/NewPM: Start filling out addIRPasses (PR #102884)

2024-08-12 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> +1 for this if backend developers want to test something, although I'm still 
> trying to improve the pass builder in #89708 after investigating asm printer 
> and register allocator.

I'm currently blocked by assorted -enable-new-pm tests failing in future 
patches because of missing pass runs in the new pipeline. Whatever ends up here 
can change with whatever refactoring happens 

https://github.com/llvm/llvm-project/pull/102884
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Statically allocate storage for decoded pseudo probes and function records (PR #102789)

2024-08-12 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov edited 
https://github.com/llvm/llvm-project/pull/102789
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Statically allocate storage for decoded pseudo probes and function records (PR #102789)

2024-08-12 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/102789


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Statically allocate storage for decoded pseudo probes and function records (PR #102789)

2024-08-12 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov updated 
https://github.com/llvm/llvm-project/pull/102789


___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Reduce Address2ProbesMap size (PR #102904)

2024-08-12 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/102904

Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.
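
A rough sketch of the shape of that data structure (simplified names; the real
`AddressProbesMap` interface in the patch may differ):

```cpp
#include "llvm/ADT/iterator_range.h"
#include <algorithm>
#include <cstdint>
#include <vector>

struct ProbeRef {
  uint64_t Address;
  uint32_t Index;
};

// Probes live in one flat vector, sorted by Address once after decoding.
struct AddressProbesVec : std::vector<ProbeRef> {
  // All probes whose address lies in the half-open range [Lo, Hi).
  auto find(uint64_t Lo, uint64_t Hi) const {
    auto Less = [](const ProbeRef &P, uint64_t A) { return P.Address < A; };
    auto Begin = std::lower_bound(begin(), end(), Lo, Less);
    auto End = std::lower_bound(Begin, end(), Hi, Less);
    return llvm::make_range(Begin, End); // usable directly in a range-for
  }
};
```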

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Reduce Address2ProbesMap size (PR #102904)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-bolt

@llvm/pr-subscribers-pgo

Author: Amir Ayupov (aaupov)


Changes

Replace the map from addresses to list of probes with a flat vector
containing probe references sorted by their addresses.

Reduces pseudo probe parsing time from 9.56s to 8.59s and peak RSS from
9.66 GiB to 9.08 GiB as part of perf2bolt processing a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```


---
Full diff: https://github.com/llvm/llvm-project/pull/102904.diff


6 Files Affected:

- (modified) bolt/lib/Profile/DataAggregator.cpp (+6-8) 
- (modified) bolt/lib/Profile/YAMLProfileWriter.cpp (+4-7) 
- (modified) bolt/lib/Rewrite/PseudoProbeRewriter.cpp (+34-49) 
- (modified) llvm/include/llvm/MC/MCPseudoProbe.h (+19-5) 
- (modified) llvm/lib/MC/MCPseudoProbe.cpp (+22-21) 
- (modified) llvm/tools/llvm-profgen/ProfileGenerator.cpp (+3-5) 


``diff
diff --git a/bolt/lib/Profile/DataAggregator.cpp 
b/bolt/lib/Profile/DataAggregator.cpp
index a300e5b2b1dabd..813d825f8b570c 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -2415,17 +2415,15 @@ std::error_code 
DataAggregator::writeBATYAML(BinaryContext &BC,
 Fragments.insert(BF);
 for (const BinaryFunction *F : Fragments) {
   const uint64_t FuncAddr = F->getAddress();
-  const auto &FragmentProbes =
-  llvm::make_range(ProbeMap.lower_bound(FuncAddr),
-   ProbeMap.lower_bound(FuncAddr + F->getSize()));
-  for (const auto &[OutputAddress, Probes] : FragmentProbes) {
+  for (const MCDecodedPseudoProbe &Probe :
+   ProbeMap.find(FuncAddr, FuncAddr + F->getSize())) {
+const uint32_t OutputAddress = Probe.getAddress();
 const uint32_t InputOffset = BAT->translate(
 FuncAddr, OutputAddress - FuncAddr, /*IsBranchSrc=*/true);
 const unsigned BlockIndex = getBlock(InputOffset).second;
-for (const MCDecodedPseudoProbe &Probe : Probes)
-  YamlBF.Blocks[BlockIndex].PseudoProbes.emplace_back(
-  yaml::bolt::PseudoProbeInfo{Probe.getGuid(), 
Probe.getIndex(),
-  Probe.getType()});
+YamlBF.Blocks[BlockIndex].PseudoProbes.emplace_back(
+yaml::bolt::PseudoProbeInfo{Probe.getGuid(), Probe.getIndex(),
+Probe.getType()});
   }
 }
   }
diff --git a/bolt/lib/Profile/YAMLProfileWriter.cpp 
b/bolt/lib/Profile/YAMLProfileWriter.cpp
index 8441d611a3..f74cf60e076d0a 100644
--- a/bolt/lib/Profile/YAMLProfileWriter.cpp
+++ b/bolt/lib/Profile/YAMLProfileWriter.cpp
@@ -193,13 +193,10 @@ YAMLProfileWriter::convert(const BinaryFunction &BF, bool 
UseDFS,
   const uint64_t FuncAddr = BF.getAddress();
   const std::pair &BlockRange =
   BB->getInputAddressRange();
-  const auto &BlockProbes =
-  llvm::make_range(ProbeMap.lower_bound(FuncAddr + BlockRange.first),
-   ProbeMap.lower_bound(FuncAddr + BlockRange.second));
-  for (const auto &[_, Probes] : BlockProbes)
-for (const MCDecodedPseudoProbe &Probe : Probes)
-  YamlBB.PseudoProbes.emplace_back(yaml::bolt::PseudoProbeInfo{
-  Probe.getGuid(), Probe.getIndex(), Probe.getType()});
+  for (const MCDecodedPseudoProbe &Probe : ProbeMap.find(
+   FuncAddr + BlockRange.first, FuncAddr + BlockRange.second))
+YamlBB.PseudoProbes.emplace_back(yaml::bolt::PseudoProbeInfo{
+Probe.getGuid(), Probe.getIndex(), Probe.getType()});
 }
 
 YamlBF.Blocks.emplace_back(YamlBB);
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp 
b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index d99e4280b0f099..95a0d2c1fbe594 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -173,13 +173,13 @@ void PseudoProbeRewriter::updatePseudoProbes() {
   AddressProbesMap &Address2ProbesMap = ProbeDecoder.getAddress2ProbesMap();
   const GUIDProbeFunctionMap &GUID2Func = ProbeDecoder.getGUID2FuncDescMap();
 
-  for (auto &AP : Address2ProbesMap) {
-BinaryFunction *F = BC.getBinaryFunctionContainingAddress(AP.first);
+  for (MCDecodedPseudoProbe &Probe : Address2ProbesMap) {
+uint64_t Address = Probe.getAddress();
+BinaryFunction *F = BC.getBinaryFunctionContainingAddress(Address);
 // If F is removed, eliminate all probes inside it from inline tree
 // Setting probes' addresses as INT64_MAX means elimination
 if (!F) {
-  for (MCDecodedPseudoProbe &Probe : AP.second)
-Probe.setAddress(INT64_MAX);
+  Probe.setAddress(INT64_MAX);
   continue;
 }
 // If F is not emitted, the function will remain in the same address as its
@@ -187,45 +187,36 @@ void PseudoProbeRewriter::updatePseudoProbes() {
 if (!F->isEmitted())
   continue;
 
-uint64_

[llvm-branch-commits] [MC][NFC] Use vector for GUIDProbeFunctionMap (PR #102905)

2024-08-12 Thread Amir Ayupov via llvm-branch-commits

https://github.com/aaupov created 
https://github.com/llvm/llvm-project/pull/102905

Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.

Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as
part of perf2bolt with a large binary.
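
A hedged sketch of that allocation scheme (simplified, assumed names rather
than the actual MCPseudoProbeDecoder members):

```cpp
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Allocator.h"
#include <cstdint>
#include <cstring>
#include <vector>

struct FuncDesc {
  uint64_t GUID = 0;
  llvm::StringRef Name; // points into the bump allocator below
};

struct DescTable {
  llvm::BumpPtrAllocator NameAlloc;
  std::vector<FuncDesc> Descs;

  void add(uint64_t GUID, llvm::StringRef Name) {
    // Copy the parsed name once into bump storage; the descriptor keeps only
    // a StringRef, avoiding one std::string per function.
    char *Mem = NameAlloc.Allocate<char>(Name.size());
    std::memcpy(Mem, Name.data(), Name.size());
    Descs.push_back({GUID, llvm::StringRef(Mem, Name.size())});
  }
};
```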

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [MC][NFC] Use vector for GUIDProbeFunctionMap (PR #102905)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-mc

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)


Changes

Replace unordered_map with a vector. Pre-parse the section to statically
allocate storage. Use BumpPtrAllocator for FuncName strings, keep
StringRef in FuncDesc.

Reduces peak RSS of pseudo probe parsing from 9.08 GiB to 8.89 GiB as
part of perf2bolt with a large binary.

Test Plan:
```
bin/llvm-lit -sv test/tools/llvm-profgen
```


---
Full diff: https://github.com/llvm/llvm-project/pull/102905.diff


3 Files Affected:

- (modified) bolt/lib/Rewrite/PseudoProbeRewriter.cpp (+2-1) 
- (modified) llvm/include/llvm/MC/MCPseudoProbe.h (+13-3) 
- (modified) llvm/lib/MC/MCPseudoProbe.cpp (+26-21) 


``diff
diff --git a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp 
b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
index 95a0d2c1fbe59..77605f1b47b11 100644
--- a/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
+++ b/bolt/lib/Rewrite/PseudoProbeRewriter.cpp
@@ -155,7 +155,8 @@ void PseudoProbeRewriter::parsePseudoProbe() {
 ProbeDecoder.printProbesForAllAddresses(outs());
   }
 
-  for (const auto &[GUID, FuncDesc] : ProbeDecoder.getGUID2FuncDescMap()) {
+  for (const auto &FuncDesc : ProbeDecoder.getGUID2FuncDescMap()) {
+uint64_t GUID = FuncDesc.FuncGUID;
 if (!FuncStartAddrs.contains(GUID))
   continue;
 BinaryFunction *BF = BC.getBinaryFunctionAtAddress(FuncStartAddrs[GUID]);
diff --git a/llvm/include/llvm/MC/MCPseudoProbe.h 
b/llvm/include/llvm/MC/MCPseudoProbe.h
index 6021dd38e9d9c..64b73b5b932e9 100644
--- a/llvm/include/llvm/MC/MCPseudoProbe.h
+++ b/llvm/include/llvm/MC/MCPseudoProbe.h
@@ -61,6 +61,7 @@
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/IR/PseudoProbe.h"
+#include "llvm/Support/Allocator.h"
 #include "llvm/Support/ErrorOr.h"
 #include 
 #include 
@@ -86,7 +87,7 @@ enum class MCPseudoProbeFlag {
 struct MCPseudoProbeFuncDesc {
   uint64_t FuncGUID = 0;
   uint64_t FuncHash = 0;
-  std::string FuncName;
+  StringRef FuncName;
 
   MCPseudoProbeFuncDesc(uint64_t GUID, uint64_t Hash, StringRef Name)
   : FuncGUID(GUID), FuncHash(Hash), FuncName(Name){};
@@ -100,8 +101,15 @@ class MCDecodedPseudoProbe;
 using InlineSite = std::tuple;
 using MCPseudoProbeInlineStack = SmallVector;
 // GUID to PseudoProbeFuncDesc map
-using GUIDProbeFunctionMap =
-std::unordered_map;
+class GUIDProbeFunctionMap : public std::vector {
+public:
+  auto find(uint64_t GUID) const {
+return llvm::lower_bound(
+*this, GUID, [](const MCPseudoProbeFuncDesc &Desc, uint64_t GUID) {
+  return Desc.FuncGUID < GUID;
+});
+  }
+};
 
 class MCDecodedPseudoProbeInlineTree;
 
@@ -382,6 +390,8 @@ class MCPseudoProbeDecoder {
   // GUID to PseudoProbeFuncDesc map.
   GUIDProbeFunctionMap GUID2FuncDescMap;
 
+  BumpPtrAllocator FuncNameAllocator;
+
   // Address to probes map.
   AddressProbesMap Address2ProbesMap;
 
diff --git a/llvm/lib/MC/MCPseudoProbe.cpp b/llvm/lib/MC/MCPseudoProbe.cpp
index 45fe95e176ff2..e5b66200b4d2e 100644
--- a/llvm/lib/MC/MCPseudoProbe.cpp
+++ b/llvm/lib/MC/MCPseudoProbe.cpp
@@ -274,7 +274,7 @@ static StringRef getProbeFNameForGUID(const 
GUIDProbeFunctionMap &GUID2FuncMAP,
   auto It = GUID2FuncMAP.find(GUID);
   assert(It != GUID2FuncMAP.end() &&
  "Probe function must exist for a valid GUID");
-  return It->second.FuncName;
+  return It->FuncName;
 }
 
 void MCPseudoProbeFuncDesc::print(raw_ostream &OS) {
@@ -390,32 +390,41 @@ bool MCPseudoProbeDecoder::buildGUID2FuncDescMap(const 
uint8_t *Start,
   Data = Start;
   End = Data + Size;
 
+  uint32_t FuncDescCount = 0;
   while (Data < End) {
-auto ErrorOrGUID = readUnencodedNumber();
-if (!ErrorOrGUID)
+if (!readUnencodedNumber())
   return false;
-
-auto ErrorOrHash = readUnencodedNumber();
-if (!ErrorOrHash)
+if (!readUnencodedNumber())
   return false;
 
 auto ErrorOrNameSize = readUnsignedNumber();
 if (!ErrorOrNameSize)
   return false;
-uint32_t NameSize = std::move(*ErrorOrNameSize);
-
-auto ErrorOrName = readString(NameSize);
-if (!ErrorOrName)
+if (!readString(*ErrorOrNameSize))
   return false;
+++FuncDescCount;
+  }
+  assert(Data == End && "Have unprocessed data in pseudo_probe_desc section");
+  GUID2FuncDescMap.reserve(FuncDescCount);
 
-uint64_t GUID = std::move(*ErrorOrGUID);
-uint64_t Hash = std::move(*ErrorOrHash);
-StringRef Name = std::move(*ErrorOrName);
+  Data = Start;
+  End = Data + Size;
+  while (Data < End) {
+uint64_t GUID =
+cantFail(errorOrToExpected(readUnencodedNumber()));
+uint64_t Hash =
+cantFail(errorOrToExpected(readUnencodedNumber()));
+uint32_t NameSize =
+cantFail(errorOrToExpected(readUnsignedNumber()));
+StringRef Name = cantFail(errorOrToExpected(readString(NameSize)));
 
 // Initialize PseudoProbeFuncDesc and populate it into GUID2FuncDescMap
-GUID2FuncDescMap.emplace(GUID, MCPseudoProbeFuncDe

[llvm-branch-commits] [flang] [mlir] [OpenMP][MLIR] Set omp.composite attr for composite loop wrappers and add verifier checks (PR #102341)

2024-08-12 Thread Akash Banerjee via llvm-branch-commits

https://github.com/TIFitis updated 
https://github.com/llvm/llvm-project/pull/102341

>From ba8cf358a98cfcce6b1239ac88391bc25bcc7197 Mon Sep 17 00:00:00 2001
From: Akash Banerjee 
Date: Wed, 7 Aug 2024 18:31:08 +0100
Subject: [PATCH 1/3] [OpenMP][MLIR] Set omp.composite attr for composite loop
 wrappers and add verifier checks

This patch sets the omp.composite unit attr for composite wrapper ops and also 
adds appropriate checks to the verifiers of supported ops for the
presence/absence of the attribute.
---
 flang/lib/Lower/OpenMP/OpenMP.cpp|  8 +
 mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp | 32 
 2 files changed, 40 insertions(+)

diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp 
b/flang/lib/Lower/OpenMP/OpenMP.cpp
index bbde77c14f36a1..3b18e7b3ecf80e 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -2063,10 +2063,14 @@ static void genCompositeDistributeSimd(
   // TODO: Populate entry block arguments with private variables.
   auto distributeOp = genWrapperOp(
   converter, loc, distributeClauseOps, /*blockArgTypes=*/{});
+  llvm::cast(distributeOp.getOperation())
+  .setComposite(/*val=*/true);
 
   // TODO: Populate entry block arguments with reduction and private variables.
   auto simdOp = genWrapperOp(converter, loc, simdClauseOps,
 /*blockArgTypes=*/{});
+  llvm::cast(simdOp.getOperation())
+  .setComposite(/*val=*/true);
 
   // Construct wrapper entry block list and associated symbols. It is important
   // that the symbol order and the block argument order match, so that the
@@ -2111,10 +2115,14 @@ static void genCompositeDoSimd(lower::AbstractConverter 
&converter,
   // TODO: Add private variables to entry block arguments.
   auto wsloopOp = genWrapperOp(
   converter, loc, wsloopClauseOps, wsloopReductionTypes);
+  llvm::cast(wsloopOp.getOperation())
+  .setComposite(/*val=*/true);
 
   // TODO: Populate entry block arguments with reduction and private variables.
   auto simdOp = genWrapperOp(converter, loc, simdClauseOps,
 /*blockArgTypes=*/{});
+  llvm::cast(simdOp.getOperation())
+  .setComposite(/*val=*/true);
 
   // Construct wrapper entry block list and associated symbols. It is important
   // that the symbol and block argument order match, so that the symbol-value
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp 
b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 11780f84697b15..641fbb5a1418f6 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1546,6 +1546,9 @@ LogicalResult ParallelOp::verify() {
 if (!isWrapper())
   return emitOpError() << "must take a loop wrapper role if nested inside "
   "of 'omp.distribute'";
+if (!llvm::cast(getOperation()).isComposite())
+  return emitError()
+ << "'omp.composite' attribute missing from composite wrapper";
 
 if (LoopWrapperInterface nested = getNestedWrapper()) {
   // Check for the allowed leaf constructs that may appear in a composite
@@ -1555,6 +1558,9 @@ LogicalResult ParallelOp::verify() {
 } else {
   return emitOpError() << "must not wrap an 'omp.loop_nest' directly";
 }
+  } else if (llvm::cast(getOperation()).isComposite()) {
+return emitError()
+   << "'omp.composite' attribute present in non-composite wrapper";
   }
 
   if (getAllocateVars().size() != getAllocatorVars().size())
@@ -1749,10 +1755,18 @@ LogicalResult WsloopOp::verify() {
 return emitOpError() << "must be a loop wrapper";
 
   if (LoopWrapperInterface nested = getNestedWrapper()) {
+if (!llvm::cast(getOperation()).isComposite())
+  return emitError()
+ << "'omp.composite' attribute missing from composite wrapper";
+
 // Check for the allowed leaf constructs that may appear in a composite
 // construct directly after DO/FOR.
 if (!isa(nested))
   return emitError() << "only supported nested wrapper is 'omp.simd'";
+
+  } else if (llvm::cast(getOperation()).isComposite()) {
+return emitError()
+   << "'omp.composite' attribute present in non-composite wrapper";
   }
 
   return verifyReductionVarList(*this, getReductionSyms(), getReductionVars(),
@@ -1796,6 +1810,10 @@ LogicalResult SimdOp::verify() {
   if (getNestedWrapper())
 return emitOpError() << "must wrap an 'omp.loop_nest' directly";
 
+  if (llvm::cast(getOperation()).isComposite())
+return emitError()
+   << "'omp.composite' attribute present in non-composite wrapper";
+
   return success();
 }
 
@@ -1825,11 +1843,17 @@ LogicalResult DistributeOp::verify() {
 return emitOpError() << "must be a loop wrapper";
 
   if (LoopWrapperInterface nested = getNestedWrapper()) {
+if (!llvm::cast(getOperation()).isComposite())
+  return emitError()
+ << "'omp.composite' at

[llvm-branch-commits] [flang] [flang] Lower omp.workshare to other omp constructs (PR #101446)

2024-08-12 Thread via llvm-branch-commits


@@ -344,6 +345,7 @@ inline void createHLFIRToFIRPassPipeline(
   pm.addPass(hlfir::createLowerHLFIRIntrinsics());
   pm.addPass(hlfir::createBufferizeHLFIR());
   pm.addPass(hlfir::createConvertHLFIRtoFIR());
+  pm.addPass(flangomp::createLowerWorkshare());

agozillon wrote:

Sorry, just seen this ping! The comment is primarily to state that the passes 
should be run immediately after lowering from parse tree to IR (HLFIR/FIR/OMP),
as they make a lot of changes to convert things into a more final form for the 
OMP dialect with respect to target. It was previously a lot more important as 
we had a form of outlining that ripped out target regions from their functions 
into separate functions. That's no longer there, but we do still have some
passes that modify the IR at this stage to a more finalized form for target, in 
particular OMPMapInfoFinalization which will generate some new maps for 
descriptor types, OMPMarkDeclareTarget which will mark functions declare target 
implicitly, and another that removes functions unnecessary for device. There is 
also a pass (or will be) for do concurrent which I believe outlines loops into
target regions as well. 

But TL;DR, there are a lot of things going on in those passes that it would be
preferable to keep happening immediately after lowering from the parse tree, so
later passes can depend on the information being in the "correct" format.
Whether that "immediate" location should now be after this HLFIR lowering or
remain where it currently is, I am unsure!

@skatrak @jsjodin may also have some feedback/input to this.

https://github.com/llvm/llvm-project/pull/101446
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak edited 
https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -267,6 +267,24 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
   return getEntryBlock();
 }
 
+static mlir::ArrayAttr makeI64ArrayAttr(llvm::ArrayRef values,
+mlir::MLIRContext *context) {
+  llvm::SmallVector attrs;
+  for (auto &v : values)
+attrs.push_back(mlir::IntegerAttr::get(mlir::IntegerType::get(context, 64),
+   mlir::APInt(64, v)));
+  return mlir::ArrayAttr::get(context, attrs);
+}
+
+mlir::ArrayAttr fir::FirOpBuilder::create2DIntegerArrayAttr(

skatrak wrote:

```suggestion
mlir::ArrayAttr fir::FirOpBuilder::create2DI64ArrayAttr(
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -175,99 +271,63 @@ getComponentObject(std::optional object,
   return getComponentObject(baseObj.value(), semaCtx);
 }
 
-static void
-generateMemberPlacementIndices(const Object &object,
-   llvm::SmallVectorImpl &indices,
-   semantics::SemanticsContext &semaCtx) {
+void generateMemberPlacementIndices(const Object &object,
+llvm::SmallVectorImpl &indices,
+semantics::SemanticsContext &semaCtx) {

skatrak wrote:

There's an implicit assumption here that `indices` is empty (it reverses it at 
the end of the function, so two calls to this function on the same vector would 
result in some unexpected order), maybe add an assert for `indices.empty()` or 
call `indices.clear()` before filling it.
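
Either option is a one-liner at the top of the function, e.g. (generic sketch,
not the actual flang code):

```cpp
#include "llvm/ADT/SmallVector.h"
#include <cassert>
#include <cstdint>

void fillIndices(llvm::SmallVectorImpl<int64_t> &indices) {
  assert(indices.empty() && "expected caller to pass an empty vector"); // or:
  // indices.clear();
  indices.push_back(0); // ... populate as before ...
}
```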

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -267,6 +267,24 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
   return getEntryBlock();
 }
 
+static mlir::ArrayAttr makeI64ArrayAttr(llvm::ArrayRef values,
+mlir::MLIRContext *context) {
+  llvm::SmallVector attrs;

skatrak wrote:

```suggestion
  llvm::SmallVector attrs;
  attrs.reserve(values.size());
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -175,99 +271,63 @@ getComponentObject(std::optional object,
   return getComponentObject(baseObj.value(), semaCtx);
 }
 
-static void
-generateMemberPlacementIndices(const Object &object,
-   llvm::SmallVectorImpl &indices,
-   semantics::SemanticsContext &semaCtx) {
+void generateMemberPlacementIndices(const Object &object,
+llvm::SmallVectorImpl &indices,
+semantics::SemanticsContext &semaCtx) {
   auto compObj = getComponentObject(object, semaCtx);
   while (compObj) {
-indices.push_back(getComponentPlacementInParent(compObj->sym()));
+int64_t index = getComponentPlacementInParent(compObj->sym());
+assert(index >= 0);
+indices.push_back(index);
 compObj =
 getComponentObject(getBaseObject(compObj.value(), semaCtx), semaCtx);
   }
 
-  indices = llvm::SmallVector{llvm::reverse(indices)};
+  indices = llvm::SmallVector{llvm::reverse(indices)};

skatrak wrote:

Nit: How about doing the reversal in place to avoid allocations? Some 
combination of `llvm::zip_equal(indices, llvm::reverse(indices))` and 
`std::swap` perhaps?
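
For reference, a minimal in-place reversal with no temporary vector (one
possible reading of the suggestion above; std::reverse does the pairwise
swapping internally):

```cpp
#include "llvm/ADT/SmallVector.h"
#include <algorithm>
#include <cstdint>

void reverseInPlace(llvm::SmallVectorImpl<int64_t> &indices) {
  std::reverse(indices.begin(), indices.end()); // swaps ends toward the middle
}
```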

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -280,75 +340,60 @@ void insertChildMapInfoIntoParent(
   // precedes the children. An alternative, may be to do
   // delayed generation of map info operations from the clauses and
   // organize them first before generation.
-  mapOp->moveAfter(indices.second.back().memberMap);
+  mapOp->moveAfter(indices.second.memberMap.back());
 
-  for (auto memberIndicesData : indices.second)
-mapOp.getMembersMutable().append(
-memberIndicesData.memberMap.getResult());
+  for (mlir::omp::MapInfoOp memberMap : indices.second.memberMap)
+mapOp.getMembersMutable().append(memberMap.getResult());
 
-  mapOp.setMembersIndexAttr(createDenseElementsAttrFromIndices(
-  indices.second, converter.getFirOpBuilder()));
+  mapOp.setMembersIndexAttr(
+  converter.getFirOpBuilder().create2DIntegerArrayAttr(
+  indices.second.memberPlacementIndices));
 } else {
   // NOTE: We take the map type of the first child, this may not
   // be the correct thing to do, however, we shall see. For the moment
   // it allows this to work with enter and exit without causing MLIR
   // verification issues. The more appropriate thing may be to take
   // the "main" map type clause from the directive being used.
-  uint64_t mapType = indices.second[0].memberMap.getMapType().value_or(0);
-
-  // create parent to emplace and bind members
-  mlir::Value origSymbol = converter.getSymbolAddress(*indices.first);
+  uint64_t mapType = indices.second.memberMap[0].getMapType().value_or(0);
 
   llvm::SmallVector members;
-  for (OmpMapMemberIndicesData memberIndicesData : indices.second)
-members.push_back((mlir::Value)memberIndicesData.memberMap);
+  for (mlir::omp::MapInfoOp memberMap : indices.second.memberMap)
+members.push_back(memberMap.getResult());
+
+  // create parent to emplace and bind members

skatrak wrote:

```suggestion
  // Create parent to emplace and bind members.
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
 baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+/*membersIndex=*/mlir::ArrayAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
 /*partial_map=*/builder.getBoolAttr(false));
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members
+  /// indices in the list to account for this new addition. This
+  /// requires inserting into the middle of a member index vector
+  /// in some cases (i.e. we could be accessing the member of a
+  /// descriptor type with a subsequent map, so we must be sure to
+  /// adjust any of these cases with the addition of the new base
+  /// address index value).
+  void adjustMemberIndices(
+  llvm::SmallVector> &memberIndices,
+  size_t memberIndex) {
+llvm::SmallVector baseAddrIndex = memberIndices[memberIndex];
+baseAddrIndex.push_back(0);
 
-if (auto mapClauseOwner =
-llvm::dyn_cast(target)) {
-  llvm::SmallVector newMapOps;
-  mlir::OperandRange mapVarsArr = mapClauseOwner.getMapVars();
+// If we find another member that is "derived/a member of" the descriptor
+// that is not the descriptor itself, we must insert a 0 for the new base
+// address we have just added for the descriptor into the list at the
+// appropriate position to maintain correctness of the positional/index 
data
+// for that member.
+size_t insertPosition =
+std::distance(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()));
+for (size_t i = 0; i < memberIndices.size(); ++i) {
+  if (memberIndices[i].size() > insertPosition &&
+  std::equal(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()),
+ memberIndices[i].begin())) {
+memberIndices[i].insert(
+std::next(memberIndices[i].begin(), insertPosition), 0);
+  }
+}
 
-  for (size_t i = 0; i < mapVarsArr.size(); ++i) {
-if (mapVarsArr[i] == op) {
-  // Push new implicit maps generated for the descriptor.
-  newMapOps.push_back(baseAddr);
+// Insert our newly created baseAddrIndex into the larger list of indices 
at
+// the correct location.
+memberIndices.insert(std::next(memberIndices.begin(), memberIndex + 1),
+ baseAddrIndex);
+  }
 
-  // for TargetOp's which have IsolatedFromAbove we must align the
-  // new additional map operand with an appropriate BlockArgument,
-  // as the printing and later processing currently requires a 1:1
-  // mapping of BlockArgs to MapInfoOp's at the same placement in
-  // each array (BlockArgs and MapVars).
-  if (auto targetOp = llvm::dyn_cast(target))
-targetOp.getRegion().insertArgument(i, baseAddr.getType(), loc);
-}
-newMapOps.push_back(mapVarsArr[i]);
+  /// Adjusts the descriptors map type the main alteration that is done
+  /// currently is transforming the map type to OMP_MAP_TO where possible.
+  // This is because we will always need to map the descriptor to device
+  

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -267,6 +267,24 @@ mlir::Block *fir::FirOpBuilder::getAllocaBlock() {
   return getEntryBlock();
 }
 
+static mlir::ArrayAttr makeI64ArrayAttr(llvm::ArrayRef values,
+mlir::MLIRContext *context) {
+  llvm::SmallVector attrs;
+  for (auto &v : values)
+attrs.push_back(mlir::IntegerAttr::get(mlir::IntegerType::get(context, 64),
+   mlir::APInt(64, v)));
+  return mlir::ArrayAttr::get(context, attrs);
+}
+
+mlir::ArrayAttr fir::FirOpBuilder::create2DIntegerArrayAttr(
+llvm::SmallVectorImpl> &intData) {
+  llvm::SmallVector arrayAttr;

skatrak wrote:

```suggestion
  llvm::SmallVector arrayAttr;
  arrayAttr.reserve(intData.size());
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
 baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+/*membersIndex=*/mlir::ArrayAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
 /*partial_map=*/builder.getBoolAttr(false));
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members
+  /// indices in the list to account for this new addition. This
+  /// requires inserting into the middle of a member index vector
+  /// in some cases (i.e. we could be accessing the member of a
+  /// descriptor type with a subsequent map, so we must be sure to
+  /// adjust any of these cases with the addition of the new base
+  /// address index value).
+  void adjustMemberIndices(
+  llvm::SmallVector> &memberIndices,
+  size_t memberIndex) {
+llvm::SmallVector baseAddrIndex = memberIndices[memberIndex];
+baseAddrIndex.push_back(0);
 
-if (auto mapClauseOwner =
-llvm::dyn_cast(target)) {
-  llvm::SmallVector newMapOps;
-  mlir::OperandRange mapVarsArr = mapClauseOwner.getMapVars();
+// If we find another member that is "derived/a member of" the descriptor
+// that is not the descriptor itself, we must insert a 0 for the new base
+// address we have just added for the descriptor into the list at the
+// appropriate position to maintain correctness of the positional/index 
data
+// for that member.
+size_t insertPosition =
+std::distance(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()));
+for (size_t i = 0; i < memberIndices.size(); ++i) {
+  if (memberIndices[i].size() > insertPosition &&
+  std::equal(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()),
+ memberIndices[i].begin())) {
+memberIndices[i].insert(
+std::next(memberIndices[i].begin(), insertPosition), 0);
+  }
+}
 
-  for (size_t i = 0; i < mapVarsArr.size(); ++i) {
-if (mapVarsArr[i] == op) {
-  // Push new implicit maps generated for the descriptor.
-  newMapOps.push_back(baseAddr);
+// Insert our newly created baseAddrIndex into the larger list of indices 
at
+// the correct location.
+memberIndices.insert(std::next(memberIndices.begin(), memberIndex + 1),
+ baseAddrIndex);
+  }
 
-  // for TargetOp's which have IsolatedFromAbove we must align the
-  // new additional map operand with an appropriate BlockArgument,
-  // as the printing and later processing currently requires a 1:1
-  // mapping of BlockArgs to MapInfoOp's at the same placement in
-  // each array (BlockArgs and MapVars).
-  if (auto targetOp = llvm::dyn_cast(target))
-targetOp.getRegion().insertArgument(i, baseAddr.getType(), loc);
-}
-newMapOps.push_back(mapVarsArr[i]);
+  /// Adjusts the descriptors map type the main alteration that is done
+  /// currently is transforming the map type to OMP_MAP_TO where possible.
+  // This is because we will always need to map the descriptor to device
+  
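The index bookkeeping that the adjustMemberIndices comment above describes is easier to see with concrete values. Below is a small standalone sketch (plain C++, made-up index values; an illustration of the described transformation, not the pass code itself):

```cpp
// Toy model of the member-index adjustment described above: the descriptor's
// index path gains a trailing 0 for its new base-address member, and any
// longer path that goes through the descriptor is patched at that position.
#include <algorithm>
#include <cassert>
#include <vector>

using IndexPath = std::vector<int>;

static void adjustToy(std::vector<IndexPath> &paths, size_t descriptorPos) {
  IndexPath baseAddr = paths[descriptorPos];
  baseAddr.push_back(0); // base address becomes member 0 of the descriptor
  size_t insertPos = baseAddr.size() - 1;

  for (IndexPath &p : paths)
    // Same prefix check as the std::equal call in the patch.
    if (p.size() > insertPos &&
        std::equal(baseAddr.begin(), baseAddr.end() - 1, p.begin()))
      p.insert(p.begin() + insertPos, 0);

  // The new base-address path goes right after the descriptor's own entry.
  paths.insert(paths.begin() + descriptorPos + 1, baseAddr);
}

int main() {
  // {1} is the allocatable member's descriptor; {1, 2} is reached through it.
  std::vector<IndexPath> paths = {{0}, {1}, {1, 2}};
  adjustToy(paths, 1);
  assert((paths == std::vector<IndexPath>{{0}, {1}, {1, 0}, {1, 0, 2}}));
  return 0;
}
```

With these values the descriptor at {1} gains a base-address entry {1, 0}, and the dependent path {1, 2} becomes {1, 0, 2} — the situation the comment calls inserting into the middle of a member index vector.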

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
 baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+/*membersIndex=*/mlir::ArrayAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
 /*partial_map=*/builder.getBoolAttr(false));
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members
+  /// indices in the list to account for this new addition. This
+  /// requires inserting into the middle of a member index vector
+  /// in some cases (i.e. we could be accessing the member of a
+  /// descriptor type with a subsequent map, so we must be sure to
+  /// adjust any of these cases with the addition of the new base
+  /// address index value).
+  void adjustMemberIndices(
+  llvm::SmallVector> &memberIndices,
+  size_t memberIndex) {
+llvm::SmallVector baseAddrIndex = memberIndices[memberIndex];
+baseAddrIndex.push_back(0);
 
-if (auto mapClauseOwner =
-llvm::dyn_cast(target)) {
-  llvm::SmallVector newMapOps;
-  mlir::OperandRange mapVarsArr = mapClauseOwner.getMapVars();
+// If we find another member that is "derived/a member of" the descriptor
+// that is not the descriptor itself, we must insert a 0 for the new base
+// address we have just added for the descriptor into the list at the
+// appropriate position to maintain correctness of the positional/index data
+// for that member.
+size_t insertPosition =
+std::distance(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()));
+for (size_t i = 0; i < memberIndices.size(); ++i) {
+  if (memberIndices[i].size() > insertPosition &&
+  std::equal(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()),
+ memberIndices[i].begin())) {
+memberIndices[i].insert(
+std::next(memberIndices[i].begin(), insertPosition), 0);
+  }
+}
 
-  for (size_t i = 0; i < mapVarsArr.size(); ++i) {
-if (mapVarsArr[i] == op) {
-  // Push new implicit maps generated for the descriptor.
-  newMapOps.push_back(baseAddr);
+// Insert our newly created baseAddrIndex into the larger list of indices at
+// the correct location.
+memberIndices.insert(std::next(memberIndices.begin(), memberIndex + 1),
+ baseAddrIndex);
+  }
 
-  // for TargetOp's which have IsolatedFromAbove we must align the
-  // new additional map operand with an appropriate BlockArgument,
-  // as the printing and later processing currently requires a 1:1
-  // mapping of BlockArgs to MapInfoOp's at the same placement in
-  // each array (BlockArgs and MapVars).
-  if (auto targetOp = llvm::dyn_cast(target))
-targetOp.getRegion().insertArgument(i, baseAddr.getType(), loc);
-}
-newMapOps.push_back(mapVarsArr[i]);
+  /// Adjusts the descriptors map type the main alteration that is done
+  /// currently is transforming the map type to OMP_MAP_TO where possible.
+  // This is because we will always need to map the descriptor to device
+  

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -237,26 +436,34 @@ class OMPMapInfoFinalizationPass
 
 getOperation()->walk([&](mlir::omp::MapInfoOp op) {
   // TODO: Currently only supports a single user for the MapInfoOp, this
-  // is fine for the moment as the Fortran Frontend will generate a
-  // new MapInfoOp per Target operation for the moment. However, when/if
-  // we optimise/cleanup the IR, it likely isn't too difficult to
-  // extend this function, it would require some modification to create a
-  // single new MapInfoOp per new MapInfoOp generated and share it across
-  // all users appropriately, making sure to only add a single member link
-  // per new generation for the original originating descriptor MapInfoOp.
+  // is fine for the moment as the Fortran frontend will generate a
+  // new MapInfoOp with at most one user currently, in the case of
+  // members of other objects like derived types, the user would be the
+  // parent, in cases where it's a regular non-member map the user would
+  // be the target operation it is being mapped by.

skatrak wrote:

```suggestion
  // is fine for the moment, as the Fortran frontend will generate a
  // new MapInfoOp with at most one user currently. In the case of
  // members of other objects, like derived types, the user would be the
  // parent. In cases where it's a regular non-member map, the user would
  // be the target operation it is being mapped by.
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
 baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+/*membersIndex=*/mlir::ArrayAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
 /*partial_map=*/builder.getBoolAttr(false));
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members
+  /// indices in the list to account for this new addition. This
+  /// requires inserting into the middle of a member index vector
+  /// in some cases (i.e. we could be accessing the member of a
+  /// descriptor type with a subsequent map, so we must be sure to
+  /// adjust any of these cases with the addition of the new base
+  /// address index value).
+  void adjustMemberIndices(
+  llvm::SmallVector> &memberIndices,
+  size_t memberIndex) {
+llvm::SmallVector baseAddrIndex = memberIndices[memberIndex];
+baseAddrIndex.push_back(0);
 
-if (auto mapClauseOwner =
-llvm::dyn_cast(target)) {
-  llvm::SmallVector newMapOps;
-  mlir::OperandRange mapVarsArr = mapClauseOwner.getMapVars();
+// If we find another member that is "derived/a member of" the descriptor
+// that is not the descriptor itself, we must insert a 0 for the new base
+// address we have just added for the descriptor into the list at the
+// appropriate position to maintain correctness of the positional/index data
+// for that member.
+size_t insertPosition =
+std::distance(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()));
+for (size_t i = 0; i < memberIndices.size(); ++i) {
+  if (memberIndices[i].size() > insertPosition &&
+  std::equal(baseAddrIndex.begin(), std::prev(baseAddrIndex.end()),
+ memberIndices[i].begin())) {
+memberIndices[i].insert(
+std::next(memberIndices[i].begin(), insertPosition), 0);
+  }
+}
 
-  for (size_t i = 0; i < mapVarsArr.size(); ++i) {
-if (mapVarsArr[i] == op) {
-  // Push new implicit maps generated for the descriptor.
-  newMapOps.push_back(baseAddr);
+// Insert our newly created baseAddrIndex into the larger list of indices at
+// the correct location.
+memberIndices.insert(std::next(memberIndices.begin(), memberIndex + 1),
+ baseAddrIndex);
+  }
 
-  // for TargetOp's which have IsolatedFromAbove we must align the
-  // new additional map operand with an appropriate BlockArgument,
-  // as the printing and later processing currently requires a 1:1
-  // mapping of BlockArgs to MapInfoOp's at the same placement in
-  // each array (BlockArgs and MapVars).
-  if (auto targetOp = llvm::dyn_cast(target))
-targetOp.getRegion().insertArgument(i, baseAddr.getType(), loc);
-}
-newMapOps.push_back(mapVarsArr[i]);
+  /// Adjusts the descriptors map type the main alteration that is done
+  /// currently is transforming the map type to OMP_MAP_TO where possible.
+  // This is because we will always need to map the descriptor to device
+  

[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
 baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+/*membersIndex=*/mlir::ArrayAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
 /*name=*/builder.getStringAttr(""),
 /*partial_map=*/builder.getBoolAttr(false));
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members

skatrak wrote:

```suggestion
  /// This function adjusts the member indices vector to include a new
  /// base address member. We take the position of the descriptor in
  /// the member indices list, which is the index data that the base
  /// addresses index will be based off of, as the base address is
  /// a member of the descriptor. We must also alter other member's
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +138,187 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.

skatrak wrote:

```suggestion
  /// Function that generates a FIR operation accessing the descriptor's
  /// base address (BoxOffsetOp) and a MapInfoOp for it. The most
  /// important thing to note is that we normally move the bounds from
  /// the descriptor map onto the base address map.
```

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -216,31 +215,50 @@ bool ClauseProcessor::processMotionClauses(lower::StatementContext &stmtCtx,
   if (origSymbol && fir::isTypeWithDescriptor(origSymbol.getType()))
 symAddr = origSymbol;
 
+  if (object.sym()->owner().IsDerivedType()) {
+omp::ObjectList objectList = gatherObjects(object, semaCtx);
+parentObj = objectList[0];
+parentMemberIndices.insert({parentObj.value(), {}});
+if (Fortran::semantics::IsAllocatableOrObjectPointer(
+object.sym()) ||
+memberHasAllocatableParent(object, semaCtx)) {
+  llvm::SmallVector indices =
+  generateMemberPlacementIndices(object, semaCtx);
+  symAddr = createParentSymAndGenIntermediateMaps(
+  clauseLocation, converter, objectList, indices,
+  parentMemberIndices[parentObj.value()], asFortran.str(),
+  mapTypeBits);
+}
+  }
+

skatrak wrote:

I think PR #101703 is already doing some of this refactoring, just so you're 
aware @agozillon.

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits

https://github.com/skatrak commented:

Thanks Andrew for the updates to this patch. I gave it a fresh look and I've 
got another set of comments, but they should be easy to address.

https://github.com/llvm/llvm-project/pull/96266
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] [Flang][OpenMP] Derived type explicit allocatable member mapping (PR #96266)

2024-08-12 Thread Sergio Afonso via llvm-branch-commits


@@ -85,67 +135,227 @@ class OMPMapInfoFinalizationPass
   descriptor = alloca;
 }
 
+return descriptor;
+  }
+
+  /// Simple function that will generate a FIR operation accessing
+  /// the descriptors base address (BoxOffsetOp) and then generate a
+  /// MapInfoOp for it, the most important thing to note is that
+  /// we normally move the bounds from the descriptor map onto the
+  /// base address map.
+  mlir::omp::MapInfoOp getBaseAddrMap(mlir::Value descriptor,
+  mlir::OperandRange bounds,
+  int64_t mapType,
+  fir::FirOpBuilder &builder) {
+mlir::Location loc = descriptor.getLoc();
 mlir::Value baseAddrAddr = builder.create(
 loc, descriptor, fir::BoxFieldAttr::base_addr);
 
 // Member of the descriptor pointing at the allocated data
-mlir::Value baseAddr = builder.create(
+return builder.create(
 loc, baseAddrAddr.getType(), descriptor,
 mlir::TypeAttr::get(llvm::cast(
 fir::unwrapRefType(baseAddrAddr.getType()))
 .getElementType()),
-baseAddrAddr, /*members=*/mlir::SmallVector{},
-/*member_index=*/mlir::DenseIntElementsAttr{}, op.getBounds(),
-builder.getIntegerAttr(builder.getIntegerType(64, false),
-   op.getMapType().value()),
+baseAddrAddr, mlir::SmallVector{},
+mlir::DenseIntElementsAttr{}, bounds,
+builder.getIntegerAttr(builder.getIntegerType(64, false), mapType),
 builder.getAttr(
 mlir::omp::VariableCaptureKind::ByRef),
-/*name=*/builder.getStringAttr(""),
-/*partial_map=*/builder.getBoolAttr(false));
+builder.getStringAttr("") /*name*/,
+builder.getBoolAttr(false) /*partial_map*/);
+  }
 
-// TODO: map the addendum segment of the descriptor, similarly to the
-// above base address/data pointer member.
+  /// This function adjusts the member indices vector to include a new
+  /// base address member, we take the position of the descriptor in
+  /// the member indices list, which is the index data that the base
+  /// addresses index will be based off of, as the base address is
+  /// a member of the descriptor, we must also alter other members
+  /// indices in the list to account for this new addition. This
+  /// requires extending all members with -1's if the addition of
+  /// the new base address has increased the member vector past the
+  /// original size, as we must make sure all member indices are of
+  /// the same length (think rectangle matrix) due to DenseIntElementsAttr
+  /// requiring this. We also need to be aware that we are inserting
+  /// into the middle of a member index vector in some cases (i.e.
+  /// we could be accessing the member of a descriptor type with a
+  /// subsequent map, so we must be sure to adjust any of these cases
+  /// with the addition of the new base address index value).
+  void adjustMemberIndices(
+  llvm::SmallVector> &memberIndices,
+  size_t memberIndex) {
+// Find if the descriptor member we are basing our new base address index
+// off of has a -1 somewhere, indicating an empty index already exists (due
+// to a larger sized member position elsewhere) which allows us to simplify
+// later steps a little
+auto baseAddrIndex = memberIndices[memberIndex];
+auto *iterPos = std::find(baseAddrIndex.begin(), baseAddrIndex.end(), -1);
 
-if (auto mapClauseOwner =
-llvm::dyn_cast(target)) {
-  llvm::SmallVector newMapOps;
-  mlir::OperandRange mapOperandsArr = mapClauseOwner.getMapOperands();
+// If we aren't at the end, as we found a -1, we can simply modify the -1
+// to the base addresses index in the descriptor (which will always be the
+// first member in the descriptor, so 0). If not, then we're extending the
+// index list and have to push on a 0 and adjust the position to the new
+// end.
+if (iterPos != baseAddrIndex.end()) {
+  *iterPos = 0;
+} else {
+  baseAddrIndex.push_back(0);
+  iterPos = baseAddrIndex.end();
+}
 
-  for (size_t i = 0; i < mapOperandsArr.size(); ++i) {
-if (mapOperandsArr[i] == op) {
-  // Push new implicit maps generated for the descriptor.
-  newMapOps.push_back(baseAddr);
+auto isEqual = [](auto first1, auto last1, auto first2, auto last2) {
+  int v1, v2;
+  for (; first1 != last1; ++first1, ++first2) {
+v1 = (first1 == last1) ? -1 : *first1;
+v2 = (first2 == last2) ? -1 : *first2;
 
-  // for TargetOp's which have IsolatedFromAbove we must align the
-  // new additional map operand with an appropriate BlockArgument,
-  // as the printing and later processing currently requires a 1:1
-  // mapping of BlockArgs to MapInfoOp's at the same placement i

[llvm-branch-commits] [clang] release/19.x: [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) (PR #102924)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/102924

Backport d469794d0cdfd2fea50a6ce0c0e33abb242d744c

Requested by: @llvmbot

>From d6beb484536981c00754e94c99d666f895b56fcd Mon Sep 17 00:00:00 2001
From: Mariya Podchishchaeva 
Date: Mon, 12 Aug 2024 09:08:46 +0200
Subject: [PATCH] [clang] Avoid triggering vtable instantiation for C++23
 constexpr dtor (#102605)

In C++23 anything can be constexpr, including a dtor of a class whose
members and bases don't have constexpr dtors. Avoid early triggering of
vtable instantiation in this case.

Fixes https://github.com/llvm/llvm-project/issues/102293

(cherry picked from commit d469794d0cdfd2fea50a6ce0c0e33abb242d744c)
---
 clang/lib/Sema/SemaDeclCXX.cpp  | 29 -
 clang/test/SemaCXX/gh102293.cpp | 22 ++
 2 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/SemaCXX/gh102293.cpp

diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 04b8d88cae217b..38e1842040fb38 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -7042,11 +7042,38 @@ void Sema::CheckCompletedCXXClass(Scope *S, CXXRecordDecl *Record) {
   }
 }
 
+bool EffectivelyConstexprDestructor = true;
+// Avoid triggering vtable instantiation due to a dtor that is not
+// "effectively constexpr" for better compatibility.
+// See https://github.com/llvm/llvm-project/issues/102293 for more info.
+if (isa(M)) {
+  auto Check = [](QualType T, auto &&Check) -> bool {
+const CXXRecordDecl *RD =
+T->getBaseElementTypeUnsafe()->getAsCXXRecordDecl();
+if (!RD || !RD->isCompleteDefinition())
+  return true;
+
+if (!RD->hasConstexprDestructor())
+  return false;
+
+for (const CXXBaseSpecifier &B : RD->bases())
+  if (!Check(B.getType(), Check))
+return false;
+for (const FieldDecl *FD : RD->fields())
+  if (!Check(FD->getType(), Check))
+return false;
+return true;
+  };
+  EffectivelyConstexprDestructor =
+  Check(QualType(Record->getTypeForDecl(), 0), Check);
+}
+
 // Define defaulted constexpr virtual functions that override a base class
 // function right away.
 // FIXME: We can defer doing this until the vtable is marked as used.
 if (CSM != CXXSpecialMemberKind::Invalid && !M->isDeleted() &&
-M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods())
+M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods() &&
+EffectivelyConstexprDestructor)
   DefineDefaultedFunction(*this, M, M->getLocation());
 
 if (!Incomplete)
diff --git a/clang/test/SemaCXX/gh102293.cpp b/clang/test/SemaCXX/gh102293.cpp
new file mode 100644
index 00..30629fc03bf6a9
--- /dev/null
+++ b/clang/test/SemaCXX/gh102293.cpp
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -std=c++23 -fsyntax-only -verify %s
+// expected-no-diagnostics
+
+template  static void destroy() {
+T t;
+++t;
+}
+
+struct Incomplete;
+
+template  struct HasD {
+  ~HasD() { destroy(); }
+};
+
+struct HasVT {
+  virtual ~HasVT();
+};
+
+struct S : HasVT {
+  HasD<> v;
+};
+

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
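
One part of this patch worth calling out: the `Check` helper is a lambda that receives itself as its last argument (`auto &&Check`) so it can recurse over bases and fields without a named helper. A minimal, self-contained sketch of that idiom in isolation (illustrative only, not tied to Sema):

```cpp
// A lambda cannot refer to itself by name, so it takes itself as an extra
// parameter and the recursion goes through that parameter.
#include <iostream>

int main() {
  auto factorial = [](unsigned n, auto &&self) -> unsigned {
    return n <= 1 ? 1 : n * self(n - 1, self);
  };
  std::cout << factorial(5, factorial) << "\n"; // prints 120
  return 0;
}
```

In the patch the same shape appears as `Check(B.getType(), Check)` and `Check(FD->getType(), Check)` when walking base classes and fields.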


[llvm-branch-commits] [clang] release/19.x: [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) (PR #102924)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102924
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) (PR #102924)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:

@cor3ntin What do you think about merging this PR to the release branch?

https://github.com/llvm/llvm-project/pull/102924
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) (PR #102924)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot updated 
https://github.com/llvm/llvm-project/pull/102924

>From ee828c387981e06204e29effa41032062ddc6cf4 Mon Sep 17 00:00:00 2001
From: Mariya Podchishchaeva 
Date: Mon, 12 Aug 2024 09:08:46 +0200
Subject: [PATCH] [clang] Avoid triggering vtable instantiation for C++23
 constexpr dtor (#102605)

In C++23 anything can be constexpr, including a dtor of a class whose
members and bases don't have constexpr dtors. Avoid early triggering of
vtable instantiation in this case.

Fixes https://github.com/llvm/llvm-project/issues/102293

(cherry picked from commit d469794d0cdfd2fea50a6ce0c0e33abb242d744c)
---
 clang/lib/Sema/SemaDeclCXX.cpp  | 29 -
 clang/test/SemaCXX/gh102293.cpp | 22 ++
 2 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/SemaCXX/gh102293.cpp

diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 04b8d88cae217b..38e1842040fb38 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -7042,11 +7042,38 @@ void Sema::CheckCompletedCXXClass(Scope *S, CXXRecordDecl *Record) {
   }
 }
 
+bool EffectivelyConstexprDestructor = true;
+// Avoid triggering vtable instantiation due to a dtor that is not
+// "effectively constexpr" for better compatibility.
+// See https://github.com/llvm/llvm-project/issues/102293 for more info.
+if (isa(M)) {
+  auto Check = [](QualType T, auto &&Check) -> bool {
+const CXXRecordDecl *RD =
+T->getBaseElementTypeUnsafe()->getAsCXXRecordDecl();
+if (!RD || !RD->isCompleteDefinition())
+  return true;
+
+if (!RD->hasConstexprDestructor())
+  return false;
+
+for (const CXXBaseSpecifier &B : RD->bases())
+  if (!Check(B.getType(), Check))
+return false;
+for (const FieldDecl *FD : RD->fields())
+  if (!Check(FD->getType(), Check))
+return false;
+return true;
+  };
+  EffectivelyConstexprDestructor =
+  Check(QualType(Record->getTypeForDecl(), 0), Check);
+}
+
 // Define defaulted constexpr virtual functions that override a base class
 // function right away.
 // FIXME: We can defer doing this until the vtable is marked as used.
 if (CSM != CXXSpecialMemberKind::Invalid && !M->isDeleted() &&
-M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods())
+M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods() &&
+EffectivelyConstexprDestructor)
   DefineDefaultedFunction(*this, M, M->getLocation());
 
 if (!Incomplete)
diff --git a/clang/test/SemaCXX/gh102293.cpp b/clang/test/SemaCXX/gh102293.cpp
new file mode 100644
index 00..30629fc03bf6a9
--- /dev/null
+++ b/clang/test/SemaCXX/gh102293.cpp
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -std=c++23 -fsyntax-only -verify %s
+// expected-no-diagnostics
+
+template  static void destroy() {
+T t;
+++t;
+}
+
+struct Incomplete;
+
+template  struct HasD {
+  ~HasD() { destroy(); }
+};
+
+struct HasVT {
+  virtual ~HasVT();
+};
+
+struct S : HasVT {
+  HasD<> v;
+};
+

___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang] Avoid triggering vtable instantiation for C++23 constexpr dtor (#102605) (PR #102924)

2024-08-12 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: None (llvmbot)


Changes

Backport d469794d0cdfd2fea50a6ce0c0e33abb242d744c

Requested by: @llvmbot

---
Full diff: https://github.com/llvm/llvm-project/pull/102924.diff


2 Files Affected:

- (modified) clang/lib/Sema/SemaDeclCXX.cpp (+28-1) 
- (added) clang/test/SemaCXX/gh102293.cpp (+22) 


``diff
diff --git a/clang/lib/Sema/SemaDeclCXX.cpp b/clang/lib/Sema/SemaDeclCXX.cpp
index 04b8d88cae217b..38e1842040fb38 100644
--- a/clang/lib/Sema/SemaDeclCXX.cpp
+++ b/clang/lib/Sema/SemaDeclCXX.cpp
@@ -7042,11 +7042,38 @@ void Sema::CheckCompletedCXXClass(Scope *S, CXXRecordDecl *Record) {
   }
 }
 
+bool EffectivelyConstexprDestructor = true;
+// Avoid triggering vtable instantiation due to a dtor that is not
+// "effectively constexpr" for better compatibility.
+// See https://github.com/llvm/llvm-project/issues/102293 for more info.
+if (isa(M)) {
+  auto Check = [](QualType T, auto &&Check) -> bool {
+const CXXRecordDecl *RD =
+T->getBaseElementTypeUnsafe()->getAsCXXRecordDecl();
+if (!RD || !RD->isCompleteDefinition())
+  return true;
+
+if (!RD->hasConstexprDestructor())
+  return false;
+
+for (const CXXBaseSpecifier &B : RD->bases())
+  if (!Check(B.getType(), Check))
+return false;
+for (const FieldDecl *FD : RD->fields())
+  if (!Check(FD->getType(), Check))
+return false;
+return true;
+  };
+  EffectivelyConstexprDestructor =
+  Check(QualType(Record->getTypeForDecl(), 0), Check);
+}
+
 // Define defaulted constexpr virtual functions that override a base class
 // function right away.
 // FIXME: We can defer doing this until the vtable is marked as used.
 if (CSM != CXXSpecialMemberKind::Invalid && !M->isDeleted() &&
-M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods())
+M->isDefaulted() && M->isConstexpr() && M->size_overridden_methods() &&
+EffectivelyConstexprDestructor)
   DefineDefaultedFunction(*this, M, M->getLocation());
 
 if (!Incomplete)
diff --git a/clang/test/SemaCXX/gh102293.cpp b/clang/test/SemaCXX/gh102293.cpp
new file mode 100644
index 00..30629fc03bf6a9
--- /dev/null
+++ b/clang/test/SemaCXX/gh102293.cpp
@@ -0,0 +1,22 @@
+// RUN: %clang_cc1 -std=c++23 -fsyntax-only -verify %s
+// expected-no-diagnostics
+
+template  static void destroy() {
+T t;
+++t;
+}
+
+struct Incomplete;
+
+template  struct HasD {
+  ~HasD() { destroy(); }
+};
+
+struct HasVT {
+  virtual ~HasVT();
+};
+
+struct S : HasVT {
+  HasD<> v;
+};
+

``




https://github.com/llvm/llvm-project/pull/102924
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [flang] 94a1b9e - Revert "[flang] Read the extra field from the in box when doing reboxing (#10…"

2024-08-12 Thread via llvm-branch-commits

Author: Valentin Clement (バレンタイン クレメン)
Date: 2024-08-12T09:35:08-07:00
New Revision: 94a1b9e30a4f7c96474a3e201dcc0f653046d493

URL: 
https://github.com/llvm/llvm-project/commit/94a1b9e30a4f7c96474a3e201dcc0f653046d493
DIFF: 
https://github.com/llvm/llvm-project/commit/94a1b9e30a4f7c96474a3e201dcc0f653046d493.diff

LOG: Revert "[flang] Read the extra field from the in box when doing reboxing (#10…"

This reverts commit dab7e3c30dd690e50858450b658f32a1d1e9cf86.

Added: 


Modified: 
flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
flang/lib/Optimizer/CodeGen/CodeGen.cpp
flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp
flang/test/Fir/convert-to-llvm.fir
flang/test/Fir/rebox.fir
flang/test/Fir/tbaa-codegen2.fir
flang/test/Fir/tbaa.fir

Removed: 




diff --git a/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h b/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
index c820b83834de68..91b2025176770d 100644
--- a/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
+++ b/flang/include/flang/Optimizer/CodeGen/FIROpPatterns.h
@@ -107,10 +107,6 @@ class ConvertFIRToLLVMPattern : public mlir::ConvertToLLVMPattern {
  mlir::Value box,
  mlir::ConversionPatternRewriter &rewriter) const;
 
-  mlir::Value getExtraFromBox(mlir::Location loc, TypePair boxTy,
-  mlir::Value box,
-  mlir::ConversionPatternRewriter &rewriter) const;
-
   // Get the element type given an LLVM type that is of the form
   // (array|struct|vector)+ and the provided indexes.
   mlir::Type getBoxEleTy(mlir::Type type,

diff --git a/flang/lib/Optimizer/CodeGen/CodeGen.cpp b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
index 07f6c83e9f111b..7934f1fdad2a0d 100644
--- a/flang/lib/Optimizer/CodeGen/CodeGen.cpp
+++ b/flang/lib/Optimizer/CodeGen/CodeGen.cpp
@@ -1227,8 +1227,7 @@ struct EmboxCommonConversion : public fir::FIROpConversion {
  mlir::ConversionPatternRewriter &rewriter,
  unsigned rank, mlir::Value eleSize,
  mlir::Value cfiTy, mlir::Value typeDesc,
- int allocatorIdx = kDefaultAllocator,
- mlir::Value extraField = {}) const {
+ int allocatorIdx = kDefaultAllocator) const {
 auto llvmBoxTy = this->lowerTy().convertBoxTypeAsStruct(boxTy, rank);
 bool isUnlimitedPolymorphic = fir::isUnlimitedPolymorphicType(boxTy);
 bool useInputType = fir::isPolymorphicType(boxTy) || isUnlimitedPolymorphic;
@@ -1247,23 +1246,16 @@ struct EmboxCommonConversion : public fir::FIROpConversion {
 
 const bool hasAddendum = fir::boxHasAddendum(boxTy);
 
-if (extraField) {
-  // Extra field value is provided so just use it.
-  descriptor =
-  insertField(rewriter, loc, descriptor, {kExtraPosInBox}, extraField);
-} else {
-  // Compute the value of the extra field based on allocator_idx and
-  // addendum present using a Descriptor object.
-  Fortran::runtime::StaticDescriptor<0> staticDescriptor;
-  Fortran::runtime::Descriptor &desc{staticDescriptor.descriptor()};
-  desc.raw().extra = 0;
-  desc.SetAllocIdx(allocatorIdx);
-  if (hasAddendum)
-desc.SetHasAddendum();
-  descriptor =
-  insertField(rewriter, loc, descriptor, {kExtraPosInBox},
-  this->genI32Constant(loc, rewriter, desc.raw().extra));
-}
+// Descriptor used to set the correct value of the extra field.
+Fortran::runtime::StaticDescriptor<0> staticDescriptor;
+Fortran::runtime::Descriptor &desc{staticDescriptor.descriptor()};
+desc.raw().extra = 0;
+desc.SetAllocIdx(allocatorIdx);
+if (hasAddendum)
+  desc.SetHasAddendum();
+descriptor =
+insertField(rewriter, loc, descriptor, {kExtraPosInBox},
+this->genI32Constant(loc, rewriter, desc.raw().extra));
 
 if (hasAddendum) {
   unsigned typeDescFieldId = getTypeDescFieldId(boxTy);
@@ -1388,14 +1380,10 @@ struct EmboxCommonConversion : public fir::FIROpConversion {
  rewriter);
 }
 
-mlir::Value extraField =
-this->getExtraFromBox(loc, inputBoxTyPair, loweredBox, rewriter);
-
 auto mod = box->template getParentOfType();
 mlir::Value descriptor =
 populateDescriptor(loc, mod, boxTy, box.getBox().getType(), rewriter,
-   rank, eleSize, cfiTy, typeDesc,
-   /*allocatorIdx=*/kDefaultAllocator, extraField);
+   rank, eleSize, cfiTy, typeDesc);
 
 return {boxTy, descriptor, eleSize};
   }

diff --git a/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp b/flang/lib/Optimizer/CodeGen/FIROpPatterns.cpp
index 12021d

[llvm-branch-commits] [llvm] a79d1b9 - Revert "[MemProf] Reduce cloning overhead by sharing nodes when possible (#99…"

2024-08-12 Thread via llvm-branch-commits

Author: Teresa Johnson
Date: 2024-08-12T09:40:56-07:00
New Revision: a79d1b9501ad436f025e10b390e8858780163ab1

URL: 
https://github.com/llvm/llvm-project/commit/a79d1b9501ad436f025e10b390e8858780163ab1
DIFF: 
https://github.com/llvm/llvm-project/commit/a79d1b9501ad436f025e10b390e8858780163ab1.diff

LOG: Revert "[MemProf] Reduce cloning overhead by sharing nodes when possible (#99…"

This reverts commit 055e4319112282354327af9908091fdb25149e9b.

Added: 


Modified: 
llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp

Removed: 




diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index c9de9c964bba0a..66b68d5cd457fb 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -242,16 +242,9 @@ class CallsiteContextGraph {
 // recursion.
 bool Recursive = false;
 
-// The corresponding allocation or interior call. This is the primary call
-// for which we have created this node.
+// The corresponding allocation or interior call.
 CallInfo Call;
 
-// List of other calls that can be treated the same as the primary call
-// through cloning. I.e. located in the same function and have the same
-// (possibly pruned) stack ids. They will be updated the same way as the
-// primary call when assigning to function clones.
-std::vector MatchingCalls;
-
 // For alloc nodes this is a unique id assigned when constructed, and for
 // callsite stack nodes it is the original stack id when the node is
 // constructed from the memprof MIB metadata on the alloc nodes. Note that
@@ -464,9 +457,6 @@ class CallsiteContextGraph {
   /// iteration.
   MapVector> FuncToCallsWithMetadata;
 
-  /// Records the function each call is located in.
-  DenseMap CallToFunc;
-
   /// Map from callsite node to the enclosing caller function.
   std::map NodeToCallingFunc;
 
@@ -484,8 +474,7 @@ class CallsiteContextGraph {
   /// StackIdToMatchingCalls map.
   void assignStackNodesPostOrder(
   ContextNode *Node, DenseSet &Visited,
-  DenseMap> &StackIdToMatchingCalls,
-  DenseMap &CallToMatchingCall);
+  DenseMap> &StackIdToMatchingCalls);
 
   /// Duplicates the given set of context ids, updating the provided
   /// map from each original id with the newly generated context ids,
@@ -1241,11 +1230,10 @@ static void checkNode(const ContextNode *Node,
 
 template 
 void CallsiteContextGraph::
-assignStackNodesPostOrder(
-ContextNode *Node, DenseSet &Visited,
-DenseMap>
-&StackIdToMatchingCalls,
-DenseMap &CallToMatchingCall) {
+assignStackNodesPostOrder(ContextNode *Node,
+  DenseSet &Visited,
+  DenseMap>
+  &StackIdToMatchingCalls) {
   auto Inserted = Visited.insert(Node);
   if (!Inserted.second)
 return;
@@ -1258,8 +1246,7 @@ void CallsiteContextGraph::
 // Skip any that have been removed during the recursion.
 if (!Edge)
   continue;
-assignStackNodesPostOrder(Edge->Caller, Visited, StackIdToMatchingCalls,
-  CallToMatchingCall);
+assignStackNodesPostOrder(Edge->Caller, Visited, StackIdToMatchingCalls);
   }
 
   // If this node's stack id is in the map, update the graph to contain new
@@ -1302,19 +1289,8 @@ void CallsiteContextGraph::
 auto &[Call, Ids, Func, SavedContextIds] = Calls[I];
 // Skip any for which we didn't assign any ids, these don't get a node in
 // the graph.
-if (SavedContextIds.empty()) {
-  // If this call has a matching call (located in the same function and
-  // having the same stack ids), simply add it to the context node created
-  // for its matching call earlier. These can be treated the same through
-  // cloning and get updated at the same time.
-  if (!CallToMatchingCall.contains(Call))
-continue;
-  auto MatchingCall = CallToMatchingCall[Call];
-  assert(NonAllocationCallToContextNodeMap.contains(MatchingCall));
-  NonAllocationCallToContextNodeMap[MatchingCall]->MatchingCalls.push_back(
-  Call);
+if (SavedContextIds.empty())
   continue;
-}
 
 assert(LastId == Ids.back());
 
@@ -1446,10 +1422,6 @@ void CallsiteContextGraph::updateStackNodes() {
   // there is more than one call with the same stack ids. Their (possibly newly
   // duplicated) context ids are saved in the StackIdToMatchingCalls map.
   DenseMap> OldToNewContextIds;
-  // Save a map from each call to any that are found to match it. I.e. located
-  // in the same function and have the same (possibly pruned) stack ids. We use
-  // this to avoid creating extra graph nodes as they can be treated the same.
-  DenseMap CallToMatchingCall;
   for (auto &It : StackIdTo

[llvm-branch-commits] [DXIL][Analysis] Boilerplate for DXILResourceAnalysis pass (PR #100700)

2024-08-12 Thread Cooper Partin via llvm-branch-commits

https://github.com/coopp approved this pull request.

Looks good.

https://github.com/llvm/llvm-project/pull/100700
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [clang] Implement -fptrauth-auth-traps. (#102417) (PR #102938)

2024-08-12 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/102938
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits

