[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s -relocation-model=pic | FileCheck %s

ritter-x2a wrote:

I think -verify-machineinstrs is useful here: Without my patch, the test fails in the MachineVerifier, where the call stack pseudos are checked. Without -verify-machineinstrs, this would only happen in builds with expensive checks enabled, and the test would be ineffective for other builds.

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/106965 >From c332034894c9fa3de26daedb28a977a3580dc4d8 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Mon, 2 Sep 2024 05:37:33 -0400 Subject: [PATCH] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments When a pointer to thread-local storage is passed in a function call, ISel first lowers the call and wraps the resulting code in CALLSEQ markers. Afterwards, to compute the pointer to TLS, a call to retrieve the TLS base address is generated and then wrapped in a set of CALLSEQ markers. If the latter call is inserted into the call sequence of the former call, this leads to nested call frames, which are illegal and lead to errors in the machine verifier. This patch avoids surrounding the call to compute the TLS base address in CALLSEQ markers if it is already surrounded by such markers. It relies on zero-sized call frames being represented in the call frame size info stored in the MachineBBs. Fixes #45574 and #98042. --- llvm/lib/Target/X86/X86ISelLowering.cpp | 7 + .../test/CodeGen/X86/tls-function-argument.ll | 30 +++ 2 files changed, 37 insertions(+) create mode 100644 llvm/test/CodeGen/X86/tls-function-argument.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index bbee0af109c74b..bf9777888df831 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35593,6 +35593,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI, // inside MC, therefore without the two markers shrink-wrapping // may push the prologue/epilogue pass them. const TargetInstrInfo &TII = *Subtarget.getInstrInfo(); + + // Do not introduce CALLSEQ markers if we are already in a call sequence. + // Nested call sequences are not allowed and cause errors in the machine + // verifier. + if (TII.getCallFrameSizeAt(MI).has_value()) +return BB; + const MIMetadata MIMD(MI); MachineFunction &MF = *BB->getParent(); diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll new file mode 100644 index 00..9b6ab529db3ea3 --- /dev/null +++ b/llvm/test/CodeGen/X86/tls-function-argument.ll @@ -0,0 +1,30 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64 -verify-machineinstrs -relocation-model=pic < %s | FileCheck %s + +; Passing a pointer to thread-local storage to a function can be problematic +; since computing such addresses requires a function call that is introduced +; very late in instruction selection. We need to ensure that we don't introduce +; nested call sequence markers if this function call happens in a call sequence. + +@TLS = internal thread_local global i64 zeroinitializer, align 8 +declare void @bar(ptr) +define internal void @foo() { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:pushq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 16 +; CHECK-NEXT:.cfi_offset %rbx, -16 +; CHECK-NEXT:leaq TLS@TLSLD(%rip), %rdi +; CHECK-NEXT:callq __tls_get_addr@PLT +; CHECK-NEXT:leaq TLS@DTPOFF(%rax), %rbx +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:popq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 8 +; CHECK-NEXT:retq + call void @bar(ptr @TLS) + call void @bar(ptr @TLS) + ret void +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
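For readers following along, here is roughly the C++-level shape of the IR test added by the patch. This snippet is an illustration assumed here, not taken from the PR; the PR's actual test is the LLVM IR version quoted above.

```cpp
// Compiled with -fPIC, taking the address of an internal-linkage thread_local
// and passing it as a call argument makes ISel materialize the TLS base
// address (a __tls_get_addr call) while lowering the outer call, i.e. inside
// that call's CALLSEQ markers -- the situation the patch guards against.
namespace {
thread_local long TLS; // internal linkage, like @TLS in the IR test
}

void bar(long *);

void foo() {
  bar(&TLS); // the TLS address is computed for the call argument
  bar(&TLS);
}
```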
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
ritter-x2a wrote:

> This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's _not_ valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the [patch](https://github.com/llvm/llvm-project/commit/228978c0dcfc9a9793f3dc8a69f42471192223bc) introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also hold when the TLSADDR is in another CALLSEQ.

I am, however, by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth created https://github.com/llvm/llvm-project/pull/107362 As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. >From 086147ec9d3428b6abe137f1d7ac7aa17aa8a715 Mon Sep 17 00:00:00 2001 From: Rainer Orth Date: Thu, 5 Sep 2024 09:51:08 +0200 Subject: [PATCH] [profile] Change __llvm_profile_counter_bias type to match llvm As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. 
--- compiler-rt/lib/profile/InstrProfilingFile.c| 6 +++--- compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c | 2 +- .../profile/ContinuousSyncMode/runtime-counter-relocation.c | 2 +- .../test/profile/ContinuousSyncMode/set-file-object.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 1c58584d2d4f73..3bb2ae068305c9 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -198,12 +198,12 @@ static int mmapForContinuousMode(uint64_t CurrentFileOffset, FILE *File) { #define INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR \ INSTR_PROF_CONCAT(INSTR_PROF_PROFILE_COUNTER_BIAS_VAR, _default) -COMPILER_RT_VISIBILITY intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; +COMPILER_RT_VISIBILITY int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; /* This variable is a weak external reference which could be used to detect * whether or not the compiler defined this symbol. */ #if defined(_MSC_VER) -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; #if defined(_M_IX86) || defined(__i386__) #define WIN_SYM_PREFIX "_" #else @@ -214,7 +214,7 @@ COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; INSTR_PROF_PROFILE_COUNTER_BIAS_VAR) "=" WIN_SYM_PREFIX \ INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR)) #else -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR __attribute__((weak, alias(INSTR_PROF_QUOTE(
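The linker warning quoted above comes down to `sizeof(intptr_t)` tracking the pointer width while the symbol emitted by `InstrLowerer::getCounterAddress` is always 64 bits wide. A minimal sketch (not part of the patch) that makes the 32-bit discrepancy visible:

```cpp
// On a 32-bit target such as sparc or i386, intptr_t is 4 bytes while int64_t
// is 8 bytes, so a runtime definition using intptr_t is smaller than the
// 8-byte __llvm_profile_counter_bias the instrumented object references.
#include <cstdint>
#include <cstdio>

int main() {
  std::printf("sizeof(intptr_t) = %zu, sizeof(int64_t) = %zu\n",
              sizeof(intptr_t), sizeof(int64_t));
  return 0;
}
```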
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth milestoned https://github.com/llvm/llvm-project/pull/107362
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth edited https://github.com/llvm/llvm-project/pull/107362
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
llvmbot wrote: @llvm/pr-subscribers-pgo Author: Rainer Orth (rorth) Changes As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. --- Full diff: https://github.com/llvm/llvm-project/pull/107362.diff 4 Files Affected: - (modified) compiler-rt/lib/profile/InstrProfilingFile.c (+3-3) - (modified) compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c (+1-1) - (modified) compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c (+1-1) - (modified) compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c (+1-1) ``diff diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 1c58584d2d4f73..3bb2ae068305c9 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -198,12 +198,12 @@ static int mmapForContinuousMode(uint64_t CurrentFileOffset, FILE *File) { #define INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR \ INSTR_PROF_CONCAT(INSTR_PROF_PROFILE_COUNTER_BIAS_VAR, _default) -COMPILER_RT_VISIBILITY intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; +COMPILER_RT_VISIBILITY int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; /* This variable is a weak external reference which could be used to detect * whether or not the compiler defined this symbol. 
*/ #if defined(_MSC_VER) -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; #if defined(_M_IX86) || defined(__i386__) #define WIN_SYM_PREFIX "_" #else @@ -214,7 +214,7 @@ COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; INSTR_PROF_PROFILE_COUNTER_BIAS_VAR) "=" WIN_SYM_PREFIX \ INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR)) #else -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR __attribute__((weak, alias(INSTR_PROF_QUOTE( INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR; #endif diff --git a/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c b/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c index fdcb82e4d72baf..65b7bdaf403da4 100644 --- a/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c +++ b/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c @@ -35,7 +35,7 @@ #include "InstrProfilingUtil.h" /* This variable is an external reference to symbol defined by the compiler. */ -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; COMPILER_RT_VISIBILITY unsigned lprofProfileDumped(void) { return 1; diff --git a/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c b/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c index 4ca8bf62455371..19a7aae70cb0d3 100644 --- a/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c +++ b/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c @@ -1,4 +1,4 @@ -// REQUIRES: linux || windows +// REQUIRES: target={{.*(linux|solaris|windows-msvc).*}} // RUN: %clang -fprofile-instr-generate -fcoverage-mapping -mllvm -runtime-counter-relocation=true -o %t.exe %s // RUN: echo "garbage" > %t.profraw diff --git a/compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c b/compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c index b52324d7091eb2..53609f5838f753
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document clause-based op representation (PR #107234)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107234
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document loop representation (PR #107235)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107235
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
https://github.com/tblah approved this pull request. LGTM, thank you for writing all of these https://github.com/llvm/llvm-project/pull/107236
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/107236 >From da68c8b8be9588251bb4342e869a52035fc45a8e Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 4 Sep 2024 13:21:22 +0100 Subject: [PATCH 1/2] [MLIR][OpenMP] NFC: Document compound constructs representation This patch documents the MLIR representation of OpenMP compound constructs discussed in [this](https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986) and [this](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972) RFC. --- mlir/docs/Dialects/OpenMPDialect/_index.md | 114 + 1 file changed, 114 insertions(+) diff --git a/mlir/docs/Dialects/OpenMPDialect/_index.md b/mlir/docs/Dialects/OpenMPDialect/_index.md index 65b9c5d79f73e9..28ebb1fe3cf3f8 100644 --- a/mlir/docs/Dialects/OpenMPDialect/_index.md +++ b/mlir/docs/Dialects/OpenMPDialect/_index.md @@ -287,3 +287,117 @@ been implemented, but it is closely linked to the `omp.canonical_loop` work. Nevertheless, loop transformation that the `collapse` clause for loop-associated worksharing constructs defines can be represented by introducing multiple bounds, step and induction variables to the `omp.loop_nest` operation. + +## Compound Construct Representation + +The OpenMP specification defines certain shortcuts that allow specifying +multiple constructs in a single directive, which are referred to as compound +constructs (e.g. `parallel do` contains the `parallel` and `do` constructs). +These can be further classified into [combined](#combined-constructs) and +[composite](#composite-constructs) constructs. This section describes how they +are represented in the dialect. + +When clauses are specified for compound constructs, the OpenMP specification +defines a set of rules to decide to which leaf constructs they apply, as well as +potentially introducing some other implicit clauses. These rules must be taken +into account by those creating the MLIR representation, since it is a per-leaf +representation that expects these rules to have already been followed. + +### Combined Constructs + +Combined constructs are semantically equivalent to specifying one construct +immediately nested inside another. This property is used to simplify the dialect +by representing them through the operations associated to each leaf construct. +For example, `target teams` would be represented as follows: + +```mlir +omp.target ... { + ... + omp.teams ... { +... +omp.terminator + } + ... + omp.terminator +} +``` + +### Composite Constructs + +Composite constructs are similar to combined constructs in that they specify the +effect of one construct being applied immediately after another. However, they +group together constructs that cannot be directly nested into each other. +Specifically, they group together multiple loop-associated constructs that apply +to the same collapsed loop nest. + +As of version 5.2 of the OpenMP specification, the list of composite constructs +is the following: + - `{do,for} simd`; + - `distribute simd`; + - `distribute parallel {do,for}`; + - `distribute parallel {do,for} simd`; and + - `taskloop simd`. + +Even though the list of composite constructs is relatively short and it would +also be possible to create dialect operations for each, it was decided to +allow attaching multiple loop wrappers to a single loop instead. 
This minimizes +redundancy in the dialect and maximizes its modularity, since there is a single +operation for each leaf construct regardless of whether it can be part of a +composite construct. On the other hand, this means the `omp.loop_nest` operation +will have to be interpreted differently depending on how many and which loop +wrappers are attached to it. + +To simplify the detection of operations taking part in the representation of a +composite construct, the `ComposableOpInterface` was introduced. Its purpose is +to handle the `omp.composite` discardable dialect attribute that can optionally +be attached to these operations. Operation verifiers will ensure its presence is +consistent with the context the operation appears in, so that it is valid when +the attribute is present if and only if it represents a leaf of a composite +construct. + +For example, the `distribute simd` composite construct is represented as +follows: + +```mlir +omp.distribute ... { + omp.simd ... { +omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) { + ... + omp.yield +} +omp.terminator + } {omp.composite} + omp.terminator +} {omp.composite} +``` + +One exception to this is the representation of the +`distribute parallel {do,for}` composite construct. The presence of a +block-associated `parallel` leaf construct would introduce many problems if it +was allowed to work as a loop wrapper. In this case, the "hoisted `omp.parallel` +representation" is used instead. This consists in mak
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
@@ -287,3 +287,117 @@ been implemented, but it is closely linked to the `omp.canonical_loop` work. Nevertheless, loop transformation that the `collapse` clause for loop-associated worksharing constructs defines can be represented by introducing multiple bounds, step and induction variables to the `omp.loop_nest` operation. + +## Compound Construct Representation + +The OpenMP specification defines certain shortcuts that allow specifying +multiple constructs in a single directive, which are referred to as compound +constructs (e.g. `parallel do` contains the `parallel` and `do` constructs). +These can be further classified into [combined](#combined-constructs) and +[composite](#composite-constructs) constructs. This section describes how they +are represented in the dialect. + +When clauses are specified for compound constructs, the OpenMP specification +defines a set of rules to decide to which leaf constructs they apply, as well as +potentially introducing some other implicit clauses. These rules must be taken +into account by those creating the MLIR representation, since it is a per-leaf +representation that expects these rules to have already been followed. + +### Combined Constructs + +Combined constructs are semantically equivalent to specifying one construct +immediately nested inside another. This property is used to simplify the dialect +by representing them through the operations associated to each leaf construct. +For example, `target teams` would be represented as follows: + +```mlir +omp.target ... { + ... + omp.teams ... { +... +omp.terminator + } + ... + omp.terminator +} +``` + +### Composite Constructs + +Composite constructs are similar to combined constructs in that they specify the +effect of one construct being applied immediately after another. However, they +group together constructs that cannot be directly nested into each other. +Specifically, they group together multiple loop-associated constructs that apply +to the same collapsed loop nest. + +As of version 5.2 of the OpenMP specification, the list of composite constructs +is the following: + - `{do,for} simd`; + - `distribute simd`; + - `distribute parallel {do,for}`; + - `distribute parallel {do,for} simd`; and + - `taskloop simd`. + +Even though the list of composite constructs is relatively short and it would +also be possible to create dialect operations for each, it was decided to +allow attaching multiple loop wrappers to a single loop instead. This minimizes +redundancy in the dialect and maximizes its modularity, since there is a single +operation for each leaf construct regardless of whether it can be part of a +composite construct. On the other hand, this means the `omp.loop_nest` operation +will have to be interpreted differently depending on how many and which loop +wrappers are attached to it. + +To simplify the detection of operations taking part in the representation of a +composite construct, the `ComposableOpInterface` was introduced. Its purpose is +to handle the `omp.composite` discardable dialect attribute that can optionally +be attached to these operations. Operation verifiers will ensure its presence is +consistent with the context the operation appears in, so that it is valid when +the attribute is present if and only if it represents a leaf of a composite +construct. + +For example, the `distribute simd` composite construct is represented as +follows: + +```mlir +omp.distribute ... { + omp.simd ... { +omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) { + ... 
+ omp.yield +} +omp.terminator + } {omp.composite} + omp.terminator +} {omp.composite} +``` + +One exception to this is the representation of the +`distribute parallel {do,for}` composite construct. The presence of a +block-associated `parallel` leaf construct would introduce many problems if it +was allowed to work as a loop wrapper. In this case, the "hoisted `omp.parallel` +representation" is used instead. This consists in making `omp.parallel` the +parent operation, with a nested `omp.loop_nest` wrapped by `omp.distribute` and +`omp.wsloop` (and `omp.simd`, in the `distribute parallel {do,for} simd` case). + +This approach works because `parallel` is a parallelism-generating construct, +whereas `distribute` is a worksharing construct impacting the higher level +`teams`, making the ordering between these constructs not cause semantic skatrak wrote: Thank you for noticing, done. https://github.com/llvm/llvm-project/pull/107236 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
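For context, a source-level sketch (an illustration assumed here, not part of the documentation patch) of the distinction the document draws: `target teams` is a combined construct, one leaf nested immediately inside the other, while `distribute parallel for simd` is a composite construct whose leaves all apply to the same loop nest.

```cpp
// Built with -fopenmp and offloading enabled; the clause choices are
// illustrative only.
void saxpy(int N, float A, const float *X, float *Y) {
#pragma omp target teams map(to : X[0 : N]) map(tofrom : Y[0 : N])
#pragma omp distribute parallel for simd
  for (int I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```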
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/106965 >From a647e4446cbcc1018c1298ee411c80a855dc4ad9 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Mon, 2 Sep 2024 05:37:33 -0400 Subject: [PATCH] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments When a pointer to thread-local storage is passed in a function call, ISel first lowers the call and wraps the resulting code in CALLSEQ markers. Afterwards, to compute the pointer to TLS, a call to retrieve the TLS base address is generated and then wrapped in a set of CALLSEQ markers. If the latter call is inserted into the call sequence of the former call, this leads to nested call frames, which are illegal and lead to errors in the machine verifier. This patch avoids surrounding the call to compute the TLS base address in CALLSEQ markers if it is already surrounded by such markers. It relies on zero-sized call frames being represented in the call frame size info stored in the MachineBBs. Fixes #45574 and #98042. --- llvm/lib/Target/X86/X86ISelLowering.cpp | 7 + .../test/CodeGen/X86/tls-function-argument.ll | 30 +++ 2 files changed, 37 insertions(+) create mode 100644 llvm/test/CodeGen/X86/tls-function-argument.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index e5ba54c176f07b..6bd9efa4950828 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35603,6 +35603,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI, // inside MC, therefore without the two markers shrink-wrapping // may push the prologue/epilogue pass them. const TargetInstrInfo &TII = *Subtarget.getInstrInfo(); + + // Do not introduce CALLSEQ markers if we are already in a call sequence. + // Nested call sequences are not allowed and cause errors in the machine + // verifier. + if (TII.getCallFrameSizeAt(MI).has_value()) +return BB; + const MIMetadata MIMD(MI); MachineFunction &MF = *BB->getParent(); diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll new file mode 100644 index 00..9b6ab529db3ea3 --- /dev/null +++ b/llvm/test/CodeGen/X86/tls-function-argument.ll @@ -0,0 +1,30 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64 -verify-machineinstrs -relocation-model=pic < %s | FileCheck %s + +; Passing a pointer to thread-local storage to a function can be problematic +; since computing such addresses requires a function call that is introduced +; very late in instruction selection. We need to ensure that we don't introduce +; nested call sequence markers if this function call happens in a call sequence. + +@TLS = internal thread_local global i64 zeroinitializer, align 8 +declare void @bar(ptr) +define internal void @foo() { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:pushq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 16 +; CHECK-NEXT:.cfi_offset %rbx, -16 +; CHECK-NEXT:leaq TLS@TLSLD(%rip), %rdi +; CHECK-NEXT:callq __tls_get_addr@PLT +; CHECK-NEXT:leaq TLS@DTPOFF(%rax), %rbx +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:popq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 8 +; CHECK-NEXT:retq + call void @bar(ptr @TLS) + call void @bar(ptr @TLS) + ret void +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Clang] Workaround dependent source location issues (#106925) (PR #107212)
https://github.com/AaronBallman approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107212
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
jayfoad wrote:

> > This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's _not_ valid, but it just seems really suspicious...)
>
> From what I've gathered from the source comments and the [patch](https://github.com/llvm/llvm-project/commit/228978c0dcfc9a9793f3dc8a69f42471192223bc) introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also hold when the TLSADDR is in another CALLSEQ.
>
> I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

I also wondered about this. Are there other mechanisms that block shrink wrapping from moving the prologue? E.g. what if a regular instruction (not a call) has to come after the prologue, how would that be marked? Maybe adding an implicit use or def of some particular physical register would be enough??

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [llvm] [BOLT] Add pseudo probe inline tree to YAML profile (PR #107137)
WenleiHe wrote:

Didn't realize yaml profile currently doesn't have probe inline tree encoded. This can increase profile size a bit, just making sure that's not a concern for yaml profile.

https://github.com/llvm/llvm-project/pull/107137
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) Changes The functionality to make use of SVE's load/store pair instructions for the callee-saves is broken because the offsets used in the instructions are incorrect. This is addressed by #105518 but given the complexity of this code and the subtleties around calculating the right offsets, we favour disabling the behaviour altogether for LLVM 19. This fix is critical for any programs being compiled with `+sme2`. --- Patch is 304.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107406.diff 5 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (-33) - (modified) llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll (+66-38) - (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-ld1.ll (+944-544) - (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-ldnt1.ll (+944-544) - (modified) llvm/test/CodeGen/AArch64/sve-callee-save-restore-pairs.ll (+82-58) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index ba46ededc63a83..87e057a468afd6 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -2931,16 +2931,6 @@ struct RegPairInfo { } // end anonymous namespace -unsigned findFreePredicateReg(BitVector &SavedRegs) { - for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) { -if (SavedRegs.test(PReg)) { - unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0; - return PNReg; -} - } - return AArch64::NoRegister; -} - static void computeCalleeSaveRegisterPairs( MachineFunction &MF, ArrayRef CSI, const TargetRegisterInfo *TRI, SmallVectorImpl &RegPairs, @@ -3645,7 +3635,6 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, unsigned ExtraCSSpill = 0; bool HasUnpairedGPR64 = false; - bool HasPairZReg = false; // Figure out which callee-saved registers to save/restore. for (unsigned i = 0; CSRegs[i]; ++i) { const unsigned Reg = CSRegs[i]; @@ -3699,28 +3688,6 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, !RegInfo->isReservedReg(MF, PairedReg)) ExtraCSSpill = PairedReg; } -// Check if there is a pair of ZRegs, so it can select PReg for spill/fill -HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) && -SavedRegs.test(CSRegs[i ^ 1])); - } - - if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) { -AArch64FunctionInfo *AFI = MF.getInfo(); -// Find a suitable predicate register for the multi-vector spill/fill -// instructions. -unsigned PnReg = findFreePredicateReg(SavedRegs); -if (PnReg != AArch64::NoRegister) - AFI->setPredicateRegForFillSpill(PnReg); -// If no free callee-save has been found assign one. 
-if (!AFI->getPredicateRegForFillSpill() && -MF.getFunction().getCallingConv() == -CallingConv::AArch64_SVE_VectorCall) { - SavedRegs.set(AArch64::P8); - AFI->setPredicateRegForFillSpill(AArch64::PN8); -} - -assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) && - "Predicate cannot be a reserved register"); } if (MF.getFunction().getCallingConv() == CallingConv::Win64 && diff --git a/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll b/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll index 6264ce0cf4ae6d..fa8f92cb0a2c99 100644 --- a/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll +++ b/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll @@ -329,27 +329,34 @@ define void @vg_unwind_with_sve_args( %x) #0 { ; CHECK-NEXT:.cfi_offset w29, -32 ; CHECK-NEXT:addvl sp, sp, #-18 ; CHECK-NEXT:.cfi_escape 0x0f, 0x0d, 0x8f, 0x00, 0x11, 0x20, 0x22, 0x11, 0x90, 0x01, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 32 + 144 * VG -; CHECK-NEXT:str p8, [sp, #11, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:ptrue pn8.b ; CHECK-NEXT:str p15, [sp, #4, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z22.b, z23.b }, pn8, [sp, #4, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z20.b, z21.b }, pn8, [sp, #8, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p14, [sp, #5, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z18.b, z19.b }, pn8, [sp, #12, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z16.b, z17.b }, pn8, [sp, #16, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p13, [sp, #6, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z14.b, z15.b }, pn8, [sp, #20, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z12.b, z13.b }, pn8, [sp, #24, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p12, [sp, #7, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z10.b, z11.b }, pn8, [sp, #28, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p11, [sp, #8, mul
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm milestoned https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/paulwalker-arm approved this pull request. https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] DAG: Lower fcNormal is.fpclass to compare with inf (PR #100389)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/100389 >From d51a155348284c6fe453190d87d36f3f63e1f456 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 1 Feb 2023 09:06:59 -0400 Subject: [PATCH] DAG: Lower fcNormal is.fpclass to compare with inf Looks worse for x86 without the fabs check. Not sure if this is useful for any targets. --- .../CodeGen/SelectionDAG/TargetLowering.cpp | 25 +++ 1 file changed, 25 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 2b41b8a9a810e5..bd849c996f7ac0 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -8790,6 +8790,31 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op, IsOrdered ? OrderedOp : UnorderedOp); } } + +if (FPTestMask == fcNormal) { + // TODO: Handle unordered + ISD::CondCode IsFiniteOp = IsInvertedFP ? ISD::SETUGE : ISD::SETOLT; + ISD::CondCode IsNormalOp = IsInvertedFP ? ISD::SETOLT : ISD::SETUGE; + + if (isCondCodeLegalOrCustom(IsFiniteOp, + OperandVT.getScalarType().getSimpleVT()) && + isCondCodeLegalOrCustom(IsNormalOp, + OperandVT.getScalarType().getSimpleVT()) && + isFAbsFree(OperandVT)) { +// isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal) +SDValue Inf = +DAG.getConstantFP(APFloat::getInf(Semantics), DL, OperandVT); +SDValue SmallestNormal = DAG.getConstantFP( +APFloat::getSmallestNormalized(Semantics), DL, OperandVT); + +SDValue Abs = DAG.getNode(ISD::FABS, DL, OperandVT, Op); +SDValue IsFinite = DAG.getSetCC(DL, ResultVT, Abs, Inf, IsFiniteOp); +SDValue IsNormal = +DAG.getSetCC(DL, ResultVT, Abs, SmallestNormal, IsNormalOp); +unsigned LogicOp = IsInvertedFP ? ISD::OR : ISD::AND; +return DAG.getNode(LogicOp, DL, ResultVT, IsFinite, IsNormal); + } +} } // Some checks may be represented as inversion of simpler check, for example ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
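The identity the lowering relies on ("isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal)") can be checked with a small standalone program. This is an illustration under the assumption that ordinary C++ double comparisons mirror the SETOLT/SETUGE condition codes chosen by the patch; it is not part of the patch itself.

```cpp
// isnormal(x)  <=>  fabs(x) < infinity  &&  fabs(x) >= smallest normal,
// with NaN falling out naturally because NaN < infinity is false.
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

static bool isNormalViaCompares(double X) {
  double A = std::fabs(X);
  return A < std::numeric_limits<double>::infinity() && A >= DBL_MIN;
}

int main() {
  const double Tests[] = {0.0, 1.0, -2.5, DBL_MIN / 2.0,
                          std::numeric_limits<double>::infinity(),
                          -std::numeric_limits<double>::infinity(),
                          std::numeric_limits<double>::quiet_NaN()};
  for (double X : Tests)
    assert(isNormalViaCompares(X) == (std::isnormal(X) != 0));
  return 0;
}
```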
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/107435 Backport 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c Requested by: @sdesmalen-arm >From a943f987d44647f38ae4c5c2d1f69b8f666e16ab Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Thu, 5 Sep 2024 17:54:57 +0100 Subject: [PATCH] [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) This removes a redundant 'COPY' instruction that #81716 probably forgot to remove. This redundant COPY led to an issue because because code in LiveRangeSplitting expects that the instruction emitted by `loadRegFromStackSlot` is an instruction that accesses memory, which isn't the case for the COPY instruction. (cherry picked from commit 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c) --- llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | 4 --- llvm/test/CodeGen/AArch64/spillfill-sve.mir | 37 +++- 2 files changed, 36 insertions(+), 5 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp index 377bcd5868fb64..805684ef69a592 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp @@ -5144,10 +5144,6 @@ void AArch64InstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB, if (PNRReg.isValid() && !PNRReg.isVirtual()) MI.addDef(PNRReg, RegState::Implicit); MI.addMemOperand(MMO); - - if (PNRReg.isValid() && PNRReg.isVirtual()) -BuildMI(MBB, MBBI, DebugLoc(), get(TargetOpcode::COPY), PNRReg) -.addReg(DestReg); } bool llvm::isNZCVTouchedInInstructionRange(const MachineInstr &DefMI, diff --git a/llvm/test/CodeGen/AArch64/spillfill-sve.mir b/llvm/test/CodeGen/AArch64/spillfill-sve.mir index 11cf388e385312..83c9b73c575708 100644 --- a/llvm/test/CodeGen/AArch64/spillfill-sve.mir +++ b/llvm/test/CodeGen/AArch64/spillfill-sve.mir @@ -11,6 +11,7 @@ define aarch64_sve_vector_pcs void @spills_fills_stack_id_ppr2mul2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_pnr() #1 { entry: unreachable } + define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_ppr_to_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2strided() #0 { entry: unreachable } @@ -216,7 +217,7 @@ body: | ; EXPAND: STR_PXI killed renamable $pn8, $sp, 7 ; ; EXPAND: renamable $pn8 = LDR_PXI $sp, 7 -; EXPAND: $p0 = PEXT_PCI_B killed renamable $pn8, 0 +; EXPAND-NEXT: $p0 = PEXT_PCI_B killed renamable $pn8, 0 %0:pnr_p8to15 = WHILEGE_CXX_B undef $x0, undef $x0, 0, implicit-def dead $nzcv @@ -242,6 +243,40 @@ body: | RET_ReallyLR ... --- +name: spills_fills_stack_id_virtreg_ppr_to_pnr +tracksRegLiveness: true +registers: + - { id: 0, class: ppr } + - { id: 1, class: pnr_p8to15 } +stack: +body: | + bb.0.entry: +liveins: $p0 + +%0:ppr = COPY $p0 + +$pn0 = IMPLICIT_DEF +$pn1 = IMPLICIT_DEF +$pn2 = IMPLICIT_DEF +$pn3 = IMPLICIT_DEF +$pn4 = IMPLICIT_DEF +$pn5 = IMPLICIT_DEF +$pn6 = IMPLICIT_DEF +$pn7 = IMPLICIT_DEF +$pn8 = IMPLICIT_DEF +$pn9 = IMPLICIT_DEF +$pn10 = IMPLICIT_DEF +$pn11 = IMPLICIT_DEF +$pn12 = IMPLICIT_DEF +$pn13 = IMPLICIT_DEF +$pn14 = IMPLICIT_DEF +$pn15 = IMPLICIT_DEF + +%1:pnr_p8to15 = COPY %0 +$p0 = PEXT_PCI_B %1, 0 +RET_ReallyLR +... 
+--- name: spills_fills_stack_id_zpr tracksRegLiveness: true registers: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/107435
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: None (llvmbot) Changes Backport 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c Requested by: @sdesmalen-arm --- Full diff: https://github.com/llvm/llvm-project/pull/107435.diff 2 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (-4) - (modified) llvm/test/CodeGen/AArch64/spillfill-sve.mir (+36-1) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp index 377bcd5868fb64..805684ef69a592 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp @@ -5144,10 +5144,6 @@ void AArch64InstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB, if (PNRReg.isValid() && !PNRReg.isVirtual()) MI.addDef(PNRReg, RegState::Implicit); MI.addMemOperand(MMO); - - if (PNRReg.isValid() && PNRReg.isVirtual()) -BuildMI(MBB, MBBI, DebugLoc(), get(TargetOpcode::COPY), PNRReg) -.addReg(DestReg); } bool llvm::isNZCVTouchedInInstructionRange(const MachineInstr &DefMI, diff --git a/llvm/test/CodeGen/AArch64/spillfill-sve.mir b/llvm/test/CodeGen/AArch64/spillfill-sve.mir index 11cf388e385312..83c9b73c575708 100644 --- a/llvm/test/CodeGen/AArch64/spillfill-sve.mir +++ b/llvm/test/CodeGen/AArch64/spillfill-sve.mir @@ -11,6 +11,7 @@ define aarch64_sve_vector_pcs void @spills_fills_stack_id_ppr2mul2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_pnr() #1 { entry: unreachable } + define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_ppr_to_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2strided() #0 { entry: unreachable } @@ -216,7 +217,7 @@ body: | ; EXPAND: STR_PXI killed renamable $pn8, $sp, 7 ; ; EXPAND: renamable $pn8 = LDR_PXI $sp, 7 -; EXPAND: $p0 = PEXT_PCI_B killed renamable $pn8, 0 +; EXPAND-NEXT: $p0 = PEXT_PCI_B killed renamable $pn8, 0 %0:pnr_p8to15 = WHILEGE_CXX_B undef $x0, undef $x0, 0, implicit-def dead $nzcv @@ -242,6 +243,40 @@ body: | RET_ReallyLR ... --- +name: spills_fills_stack_id_virtreg_ppr_to_pnr +tracksRegLiveness: true +registers: + - { id: 0, class: ppr } + - { id: 1, class: pnr_p8to15 } +stack: +body: | + bb.0.entry: +liveins: $p0 + +%0:ppr = COPY $p0 + +$pn0 = IMPLICIT_DEF +$pn1 = IMPLICIT_DEF +$pn2 = IMPLICIT_DEF +$pn3 = IMPLICIT_DEF +$pn4 = IMPLICIT_DEF +$pn5 = IMPLICIT_DEF +$pn6 = IMPLICIT_DEF +$pn7 = IMPLICIT_DEF +$pn8 = IMPLICIT_DEF +$pn9 = IMPLICIT_DEF +$pn10 = IMPLICIT_DEF +$pn11 = IMPLICIT_DEF +$pn12 = IMPLICIT_DEF +$pn13 = IMPLICIT_DEF +$pn14 = IMPLICIT_DEF +$pn15 = IMPLICIT_DEF + +%1:pnr_p8to15 = COPY %0 +$p0 = PEXT_PCI_B %1, 0 +RET_ReallyLR +... +--- name: spills_fills_stack_id_zpr tracksRegLiveness: true registers: `` https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Add pseudo probe inline tree to YAML profile (PR #107137)
aaupov wrote:

> Didn't realize yaml profile currently doesn't have probe inline tree encoded. This can increase profile size a bit, just making sure that's not a concern for yaml profile.

Good call. Including the probe inline tree does increase the size of the yaml profile by about 2x both pre- and post-compression (221M -> 404M and 20M -> 41M, respectively) for a large binary, which also causes a 2x increase in profile reading time. Let me experiment with reorganizing the data to reduce the size.

https://github.com/llvm/llvm-project/pull/107137
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/107329
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,333 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) + Sum += AssumeAllKnown ? *E->Count : E->Count.value_or(0U); + return Sum; +} + +void takeCountFrom(const SmallVector &Edges) { + assert(!Count.has_value()); + Count = getEdgeSum(Edges, true); +} + +void setSingleUnknownEdgeCount(SmallVector &Edges) { + uint64_t KnownSum = getEdgeSum(Edges, false); + uint64_t EdgeVal = *Count > KnownSum ? 
*Count - KnownSum : 0U; + EdgeInfo *E = nullptr; + for (auto *I : Edges) +if (I && !I->Count.has_value()) { + E = I; +#ifdef NDEBUG + break; +#else + assert((!E || E == I) && + "Expected exactly one edge to have an unknown count, " + "found a second one"); + continue; +#endif +} + assert(E && "Expected exactly one edge to have an unknown count"); + assert(!E->Count.has_value()); + E->Count = EdgeVal; + assert(E->Src->UnknownCountOutEdges > 0); + assert(E->Dest->UnknownCountInEdges > 0); + --E->Src->UnknownCountOutEdges; + --E->Dest->UnknownCountInEdges; +} + + public: +BBInfo(size_t NumInEdges, size_t NumOutEdges, std::optional Count) +: Count(Count) { + InEdges.reserve(NumInEdges); + OutEdges.resize(NumOutEdges); +} + +bool tryTakeCountFromKnownOutEdges(const BasicBlock &BB) { + if (!succ_empty(&BB) && !UnknownCountOutEdges) { +takeCountFrom(OutEdges); +return true; + } + return false; +} + +bool tryTakeCountFromKnownInEdges(const BasicBlock &BB) { + if (!BB.isEntryBlock() && !UnknownCountInEdges) { +takeCountFrom(InEdges); +return true; + } + return false; +} + +void addInEdge(EdgeInfo *Info) { + InEdges.push_back(Info); + ++UnknownCountInEdges; +} + +void addOutEdge(size_t Index, EdgeInfo *Info) { + OutEdges[Index] = Info; + ++UnknownCountOutEdges; +} + +bool hasCount() const { return Count.has_value(); } + +bool trySetSingleUnknownInEdgeCount() { + if (UnknownCountInEdges == 1) { +setSingleUnknownEdgeCount(InEdges); +return true; + } + return false; +} + +bool trySetSingleUnknownOutEdgeCount() { + if (UnknownCountOutEdges == 1) { +setSingleUnknownEdgeCount(OutEdges); +return true; + } + return false; +} +size_t getNumOutEdges() const { return OutEdges.size(); } + +uint64_t getEdgeCount(size_t Index) const { + if (auto *E = OutEdges[Index]) +return *E->Count; + return 0U; +} + }; + + F
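The "flattening" step described in the file header quoted above (summing the counters of all contexts of a function, index by index) boils down to something like the following simplified sketch; this is an assumption for illustration, the real pass operates on its own data structures.

```cpp
#include <cstdint>
#include <vector>

// Accumulate one context's counters into the flat per-function vector.
static void flattenInto(std::vector<uint64_t> &Flat,
                        const std::vector<uint64_t> &CtxCounters) {
  if (Flat.size() < CtxCounters.size())
    Flat.resize(CtxCounters.size(), 0);
  for (size_t I = 0; I < CtxCounters.size(); ++I)
    Flat[I] += CtxCounters[I]; // same counter index across all contexts
}
```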
[llvm-branch-commits] [clang] Backport "[Clang][CodeGen] Fix type for atomic float incdec operators (#107075)" (PR #107184)
https://github.com/RKSimon approved this pull request. https://github.com/llvm/llvm-project/pull/107184 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
aemerson wrote: To justify this for the 19 release: this is easily triggered by small IR, so we should take it. https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/107329 >From 856568c07d924dd59aaa81450cb8bcb64d60d2eb Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 3 Sep 2024 21:28:05 -0700 Subject: [PATCH] [ctx_prof] Flattened profile lowering pass --- llvm/include/llvm/ProfileData/ProfileCommon.h | 6 +- .../Instrumentation/PGOCtxProfFlattening.h| 25 ++ llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/lib/Passes/PassBuilderPipelines.cpp | 1 + llvm/lib/Passes/PassRegistry.def | 1 + .../Transforms/Instrumentation/CMakeLists.txt | 1 + .../Instrumentation/PGOCtxProfFlattening.cpp | 341 ++ .../flatten-always-removes-instrumentation.ll | 12 + .../CtxProfAnalysis/flatten-and-annotate.ll | 112 ++ 9 files changed, 497 insertions(+), 3 deletions(-) create mode 100644 llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h create mode 100644 llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp create mode 100644 llvm/test/Analysis/CtxProfAnalysis/flatten-always-removes-instrumentation.ll create mode 100644 llvm/test/Analysis/CtxProfAnalysis/flatten-and-annotate.ll diff --git a/llvm/include/llvm/ProfileData/ProfileCommon.h b/llvm/include/llvm/ProfileData/ProfileCommon.h index eaab59484c947a..edd8e1f644ad12 100644 --- a/llvm/include/llvm/ProfileData/ProfileCommon.h +++ b/llvm/include/llvm/ProfileData/ProfileCommon.h @@ -79,13 +79,13 @@ class ProfileSummaryBuilder { class InstrProfSummaryBuilder final : public ProfileSummaryBuilder { uint64_t MaxInternalBlockCount = 0; - inline void addEntryCount(uint64_t Count); - inline void addInternalCount(uint64_t Count); - public: InstrProfSummaryBuilder(std::vector Cutoffs) : ProfileSummaryBuilder(std::move(Cutoffs)) {} + void addEntryCount(uint64_t Count); + void addInternalCount(uint64_t Count); + void addRecord(const InstrProfRecord &); std::unique_ptr getSummary(); }; diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h new file mode 100644 index 00..0eab3aaf6fcad3 --- /dev/null +++ b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h @@ -0,0 +1,25 @@ +//===-- PGOCtxProfFlattening.h - Contextual Instr. Flattening ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This file declares the PGOCtxProfFlattening class. 
+// +//===--===// +#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_PGOCTXPROFFLATTENING_H +#define LLVM_TRANSFORMS_INSTRUMENTATION_PGOCTXPROFFLATTENING_H + +#include "llvm/IR/PassManager.h" +namespace llvm { + +class PGOCtxProfFlatteningPass +: public PassInfoMixin { +public: + explicit PGOCtxProfFlatteningPass() = default; + PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); +}; +} // namespace llvm +#endif diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index a22abed8051a11..d87e64eff08966 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -198,6 +198,7 @@ #include "llvm/Transforms/Instrumentation/MemProfiler.h" #include "llvm/Transforms/Instrumentation/MemorySanitizer.h" #include "llvm/Transforms/Instrumentation/NumericalStabilitySanitizer.h" +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h" #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h" #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 1fd7ef929c87d5..38297dc02b8be6 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -76,6 +76,7 @@ #include "llvm/Transforms/Instrumentation/InstrOrderFile.h" #include "llvm/Transforms/Instrumentation/InstrProfiling.h" #include "llvm/Transforms/Instrumentation/MemProfiler.h" +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h" #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h" #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index d6067089c6b5c1..2b0624cb9874da 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -58,6 +58,7 @@ MODULE_PASS("coro-early", CoroEarlyPass()) MODULE_PASS("cross-dso-cfi", CrossDSOCFIPass()) MODULE_PASS("ctx-instr-gen", PGOInstrumentationGen(PGOInstrum
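As the new file's header comment puts it, flattening sums the counter values at the same index across all contexts of a function, and only then are the totals lowered to entry counts and branch weights. A small sketch of just the summation step, using an ad-hoc vector-of-vectors in place of the real contextual-profile data structures (the names and shapes are illustrative, not the pass's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in: each context of one function carries a counter
// vector; the real pass walks the contextual profile via CtxProfAnalysis.
using Counters = std::vector<uint64_t>;

Counters flatten(const std::vector<Counters> &ContextsOfOneFunction) {
  Counters Flat;
  for (const Counters &Ctx : ContextsOfOneFunction) {
    if (Flat.size() < Ctx.size())
      Flat.resize(Ctx.size(), 0);
    for (std::size_t I = 0; I < Ctx.size(); ++I)
      Flat[I] += Ctx[I]; // counters at the same index are summed
  }
  return Flat;
}
```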
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) david-xl wrote: Why can E be null? https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
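Independent of whether it can actually happen in practice (the question raised above), the quoted constructor does make null entries possible: OutEdges is sized with resize(), which value-initializes every slot to nullptr, and a slot only becomes non-null once addOutEdge() is called for that index. A tiny illustration of that behaviour — not code from the patch:

```cpp
#include "llvm/ADT/SmallVector.h"
#include <cassert>

struct EdgeInfo {}; // placeholder for the patch's EdgeInfo

int main() {
  llvm::SmallVector<EdgeInfo *> OutEdges;
  OutEdges.resize(3); // all three slots start out as nullptr
  EdgeInfo E;
  OutEdges[1] = &E;   // the analogue of addOutEdge(1, &E)
  assert(!OutEdges[0] && OutEdges[1] && !OutEdges[2]);
  return 0;
}
```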
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) + Sum += AssumeAllKnown ? *E->Count : E->Count.value_or(0U); + return Sum; +} + +void takeCountFrom(const SmallVector &Edges) { david-xl wrote: nit: take-> compute https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, david-xl wrote: brief document when to assumeAllKnown. https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
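One possible shape for the documentation requested here, derived from the two call sites visible in the quoted code (takeCountFrom passes true, setSingleUnknownEdgeCount passes false); the wording below is a suggestion, not text from the patch:

```cpp
/// Sum the counts of the given edges.
/// With AssumeAllKnown == true, every non-null edge must already carry a
/// count (used by takeCountFrom, which derives a block count from fully
/// resolved edges). With AssumeAllKnown == false, edges without a count are
/// treated as 0 (used by setSingleUnknownEdgeCount to sum the known part
/// before solving for the single remaining unknown edge).
uint64_t getEdgeSum(const SmallVector<EdgeInfo *> &Edges,
                    bool AssumeAllKnown) const;
```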
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/107466 Backport 2e0ded3371f8d42f376bdfd4d70687537e36818e Requested by: @jrtc27 >From 66968ee023596c5674d1e94c98e8d7d2a069a69c Mon Sep 17 00:00:00 2001 From: R-Goc <131907007+r-...@users.noreply.github.com> Date: Wed, 4 Sep 2024 20:10:36 +0200 Subject: [PATCH] [Windows SEH] Fix crash on empty seh block (#107031) Fixes https://github.com/llvm/llvm-project/issues/105813 and https://github.com/llvm/llvm-project/issues/106915. Adds a check for the end of the iterator, which can be a sentinel. The issue was introduced in https://github.com/llvm/llvm-project/commit/0efe111365ae176671e01252d24028047d807a84 from what I can see, so along with the introduction of /EHa support. (cherry picked from commit 2e0ded3371f8d42f376bdfd4d70687537e36818e) --- .../CodeGen/SelectionDAG/SelectionDAGISel.cpp | 4 .../CodeGen/WinEH/wineh-empty-seh-scope.ll | 18 ++ 2 files changed, 22 insertions(+) create mode 100644 llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp index df3d207d85d351..b961d3bb1fec7f 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp @@ -1453,6 +1453,10 @@ void SelectionDAGISel::reportIPToStateForBlocks(MachineFunction *MF) { if (BB->getFirstMayFaultInst()) { // Report IP range only for blocks with Faulty inst auto MBBb = MBB.getFirstNonPHI(); + + if (MBBb == MBB.end()) +continue; + MachineInstr *MIb = &*MBBb; if (MIb->isTerminator()) continue; diff --git a/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll b/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll new file mode 100644 index 00..5f382f10f180bc --- /dev/null +++ b/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll @@ -0,0 +1,18 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64-pc-windows-msvc19.41.34120 < %s | FileCheck %s + +define void @foo() personality ptr @__CxxFrameHandler3 { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:nop # avoids zero-length function + call void @llvm.seh.scope.begin() + unreachable +} + +declare i32 @__CxxFrameHandler3(...) + +declare void @llvm.seh.scope.begin() + +!llvm.module.flags = !{!0} + +!0 = !{i32 2, !"eh-asynch", i32 1} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
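The fix itself is the two-line guard visible in the diff; in isolation, the pattern is simply "treat getFirstNonPHI() as possibly returning the block's end() sentinel before dereferencing it". A small standalone helper showing the same check (illustrative; not additional code from the backport):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Return the first non-PHI instruction, or nullptr when getFirstNonPHI()
// lands on the end() sentinel (e.g. for an empty machine basic block).
static MachineInstr *firstNonPHIOrNull(MachineBasicBlock &MBB) {
  MachineBasicBlock::iterator It = MBB.getFirstNonPHI();
  return It == MBB.end() ? nullptr : &*It;
}
```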
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/107466 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
llvmbot wrote: @arsenm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/107466 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Insert the ctx prof flattener after the module inliner (PR #107499)
https://github.com/mtrofin created https://github.com/llvm/llvm-project/pull/107499 None >From e90265db97747c0b15f81b31f061e299ffd33138 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Thu, 5 Sep 2024 12:52:56 -0700 Subject: [PATCH] [ctx_prof] Insert the ctx prof flattener after the module inliner --- llvm/lib/Passes/PassBuilderPipelines.cpp | 18 +- llvm/test/Analysis/CtxProfAnalysis/inline.ll | 17 + llvm/test/Other/opt-hot-cold-split.ll| 2 +- 3 files changed, 31 insertions(+), 6 deletions(-) diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 38297dc02b8be6..f9b5f584e00c07 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -1017,6 +1017,11 @@ PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level, IP.EnableDeferral = false; MPM.addPass(ModuleInlinerPass(IP, UseInlineAdvisor, Phase)); + if (!UseCtxProfile.empty()) { +MPM.addPass(GlobalOptPass()); +MPM.addPass(GlobalDCEPass()); +MPM.addPass(PGOCtxProfFlatteningPass()); + } MPM.addPass(createModuleToFunctionPassAdaptor( buildFunctionSimplificationPipeline(Level, Phase), @@ -1744,11 +1749,14 @@ ModulePassManager PassBuilder::buildThinLTODefaultPipeline( MPM.addPass(GlobalDCEPass()); return MPM; } - - // Add the core simplification pipeline. - MPM.addPass(buildModuleSimplificationPipeline( - Level, ThinOrFullLTOPhase::ThinLTOPostLink)); - + if (!UseCtxProfile.empty()) { +MPM.addPass( +buildModuleInlinerPipeline(Level, ThinOrFullLTOPhase::ThinLTOPostLink)); + } else { +// Add the core simplification pipeline. +MPM.addPass(buildModuleSimplificationPipeline( +Level, ThinOrFullLTOPhase::ThinLTOPostLink)); + } // Now add the optimization pipeline. MPM.addPass(buildModuleOptimizationPipeline( Level, ThinOrFullLTOPhase::ThinLTOPostLink)); diff --git a/llvm/test/Analysis/CtxProfAnalysis/inline.ll b/llvm/test/Analysis/CtxProfAnalysis/inline.ll index 875bc4938653b9..9381418c4e3f12 100644 --- a/llvm/test/Analysis/CtxProfAnalysis/inline.ll +++ b/llvm/test/Analysis/CtxProfAnalysis/inline.ll @@ -31,6 +31,23 @@ ; CHECK-NEXT:%call2 = call i32 @a(i32 %x) #1 ; CHECK-NEXT:br label %exit +; Make sure the postlink thinlto pipeline is aware of ctxprof +; RUN: opt -passes='thinlto' -use-ctx-profile=%t/profile.ctxprofdata \ +; RUN: %t/module.ll -S -o - | FileCheck %s --check-prefix=PIPELINE + +; PIPELINE-LABEL: define i32 @entrypoint +; PIPELINE-SAME: !prof ![[ENTRYPOINT_COUNT:[0-9]+]] +; PIPELINE-LABEL: loop.i: +; PIPELINE: br i1 %cond.i, label %loop.i, label %exit, !prof ![[LOOP_BW_INL:[0-9]+]] +; PIPELINE-LABEL: define i32 @a +; PIPELINE-LABEL: loop: +; PIPELINE: br i1 %cond, label %loop, label %exit, !prof ![[LOOP_BW_ORIG:[0-9]+]] + +; PIPELINE: ![[ENTRYPOINT_COUNT]] = !{!"function_entry_count", i64 10} +; These are the weights of the inlined @a, where the counters were 2, 100 (2 for entry, 100 for loop) +; PIPELINE: ![[LOOP_BW_INL]] = !{!"branch_weights", i32 98, i32 2} +; These are the weights of the un-inlined @a, where the counters were 8, 500 (8 for entry, 500 for loop) +; PIPELINE: ![[LOOP_BW_ORIG]] = !{!"branch_weights", i32 492, i32 8} ;--- module.ll define i32 @entrypoint(i32 %x) !guid !0 { diff --git a/llvm/test/Other/opt-hot-cold-split.ll b/llvm/test/Other/opt-hot-cold-split.ll index 21c713d35bb746..cd290dcc306570 100644 --- a/llvm/test/Other/opt-hot-cold-split.ll +++ b/llvm/test/Other/opt-hot-cold-split.ll @@ -2,7 +2,7 @@ ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | 
FileCheck %s -check-prefix=LTO-PRELINK-Os ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os -; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -LINK-Os ; REQUIRES: asserts ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
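The branch weights the new PIPELINE checks expect follow directly from the flattened counters mentioned in the test's comments: for this loop shape, the back-edge weight is the loop counter minus the entry counter, and the exit weight is the entry counter. A quick check of that arithmetic (the helper is illustrative, not part of the pass):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Counters: Entry = times the loop was entered, Loop = times the header ran.
// The back edge is taken (Loop - Entry) times, the exit Entry times.
std::pair<uint64_t, uint64_t> loopBranchWeights(uint64_t Entry, uint64_t Loop) {
  return {Loop - Entry, Entry};
}

int main() {
  // Inlined copy of @a: counters 2 (entry) and 100 (loop) -> !{98, 2}.
  auto [BackEdge, Exit] = loopBranchWeights(2, 100);
  assert(BackEdge == 98 && Exit == 2);
  // Un-inlined @a: counters 8 (entry) and 500 (loop) -> !{492, 8}.
  auto [BackEdge2, Exit2] = loopBranchWeights(8, 500);
  assert(BackEdge2 == 492 && Exit2 == 8);
  return 0;
}
```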
[llvm-branch-commits] [llvm] [ctx_prof] Insert the ctx prof flattener after the module inliner (PR #107499)
mtrofin wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/107499). * **#107499** 👈 * **#107329** * **#107463** * `main` This stack of pull requests is managed by Graphite (https://stacking.dev/). https://github.com/llvm/llvm-project/pull/107499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits