[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
@@ -0,0 +1,30 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=x86_64 -verify-machineinstrs < %s -relocation-model=pic | FileCheck %s

ritter-x2a wrote:

I think -verify-machineinstrs is useful here: Without my patch, the test fails in the MachineVerifier, where the call stack pseudos are checked. Without -verify-machineinstrs, this would only happen in builds with expensive checks enabled, and the test would be ineffective for other builds.

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/106965 >From c332034894c9fa3de26daedb28a977a3580dc4d8 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Mon, 2 Sep 2024 05:37:33 -0400 Subject: [PATCH] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments When a pointer to thread-local storage is passed in a function call, ISel first lowers the call and wraps the resulting code in CALLSEQ markers. Afterwards, to compute the pointer to TLS, a call to retrieve the TLS base address is generated and then wrapped in a set of CALLSEQ markers. If the latter call is inserted into the call sequence of the former call, this leads to nested call frames, which are illegal and lead to errors in the machine verifier. This patch avoids surrounding the call to compute the TLS base address in CALLSEQ markers if it is already surrounded by such markers. It relies on zero-sized call frames being represented in the call frame size info stored in the MachineBBs. Fixes #45574 and #98042. --- llvm/lib/Target/X86/X86ISelLowering.cpp | 7 + .../test/CodeGen/X86/tls-function-argument.ll | 30 +++ 2 files changed, 37 insertions(+) create mode 100644 llvm/test/CodeGen/X86/tls-function-argument.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index bbee0af109c74b..bf9777888df831 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35593,6 +35593,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI, // inside MC, therefore without the two markers shrink-wrapping // may push the prologue/epilogue pass them. const TargetInstrInfo &TII = *Subtarget.getInstrInfo(); + + // Do not introduce CALLSEQ markers if we are already in a call sequence. + // Nested call sequences are not allowed and cause errors in the machine + // verifier. + if (TII.getCallFrameSizeAt(MI).has_value()) +return BB; + const MIMetadata MIMD(MI); MachineFunction &MF = *BB->getParent(); diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll new file mode 100644 index 00..9b6ab529db3ea3 --- /dev/null +++ b/llvm/test/CodeGen/X86/tls-function-argument.ll @@ -0,0 +1,30 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64 -verify-machineinstrs -relocation-model=pic < %s | FileCheck %s + +; Passing a pointer to thread-local storage to a function can be problematic +; since computing such addresses requires a function call that is introduced +; very late in instruction selection. We need to ensure that we don't introduce +; nested call sequence markers if this function call happens in a call sequence. + +@TLS = internal thread_local global i64 zeroinitializer, align 8 +declare void @bar(ptr) +define internal void @foo() { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:pushq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 16 +; CHECK-NEXT:.cfi_offset %rbx, -16 +; CHECK-NEXT:leaq TLS@TLSLD(%rip), %rdi +; CHECK-NEXT:callq __tls_get_addr@PLT +; CHECK-NEXT:leaq TLS@DTPOFF(%rax), %rbx +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:popq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 8 +; CHECK-NEXT:retq + call void @bar(ptr @TLS) + call void @bar(ptr @TLS) + ret void +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
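For readers following along, here is roughly the C++-level shape of the IR test added by the patch. This snippet is an illustration assumed here, not taken from the PR; the PR's actual test is the LLVM IR version quoted above.

```cpp
// Compiled with -fPIC, taking the address of an internal-linkage thread_local
// and passing it as a call argument makes ISel materialize the TLS base
// address (a __tls_get_addr call) while lowering the outer call, i.e. inside
// that call's CALLSEQ markers -- the situation the patch guards against.
namespace {
thread_local long TLS; // internal linkage, like @TLS in the IR test
}

void bar(long *);

void foo() {
  bar(&TLS); // the TLS address is computed for the call argument
  bar(&TLS);
}
```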
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
ritter-x2a wrote:

> This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's _not_ valid, but it just seems really suspicious...)

From what I've gathered from the source comments and the [patch](https://github.com/llvm/llvm-project/commit/228978c0dcfc9a9793f3dc8a69f42471192223bc) introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also hold when the TLSADDR is in another CALLSEQ.

I am, however, by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth created https://github.com/llvm/llvm-project/pull/107362 As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. >From 086147ec9d3428b6abe137f1d7ac7aa17aa8a715 Mon Sep 17 00:00:00 2001 From: Rainer Orth Date: Thu, 5 Sep 2024 09:51:08 +0200 Subject: [PATCH] [profile] Change __llvm_profile_counter_bias type to match llvm As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. 
--- compiler-rt/lib/profile/InstrProfilingFile.c| 6 +++--- compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c | 2 +- .../profile/ContinuousSyncMode/runtime-counter-relocation.c | 2 +- .../test/profile/ContinuousSyncMode/set-file-object.c | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 1c58584d2d4f73..3bb2ae068305c9 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -198,12 +198,12 @@ static int mmapForContinuousMode(uint64_t CurrentFileOffset, FILE *File) { #define INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR \ INSTR_PROF_CONCAT(INSTR_PROF_PROFILE_COUNTER_BIAS_VAR, _default) -COMPILER_RT_VISIBILITY intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; +COMPILER_RT_VISIBILITY int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; /* This variable is a weak external reference which could be used to detect * whether or not the compiler defined this symbol. */ #if defined(_MSC_VER) -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; #if defined(_M_IX86) || defined(__i386__) #define WIN_SYM_PREFIX "_" #else @@ -214,7 +214,7 @@ COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; INSTR_PROF_PROFILE_COUNTER_BIAS_VAR) "=" WIN_SYM_PREFIX \ INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR)) #else -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR __attribute__((weak, alias(INSTR_PROF_QUOTE(
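The linker warning quoted above comes down to `sizeof(intptr_t)` tracking the pointer width while the symbol emitted by `InstrLowerer::getCounterAddress` is always 64 bits wide. A minimal sketch (not part of the patch) that makes the 32-bit discrepancy visible:

```cpp
// On a 32-bit target such as sparc or i386, intptr_t is 4 bytes while int64_t
// is 8 bytes, so a runtime definition using intptr_t is smaller than the
// 8-byte __llvm_profile_counter_bias the instrumented object references.
#include <cstdint>
#include <cstdio>

int main() {
  std::printf("sizeof(intptr_t) = %zu, sizeof(int64_t) = %zu\n",
              sizeof(intptr_t), sizeof(int64_t));
  return 0;
}
```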
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth milestoned https://github.com/llvm/llvm-project/pull/107362
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
https://github.com/rorth edited https://github.com/llvm/llvm-project/pull/107362
[llvm-branch-commits] [compiler-rt] [profile] Change __llvm_profile_counter_bias type to match llvm (PR #107362)
llvmbot wrote: @llvm/pr-subscribers-pgo Author: Rainer Orth (rorth) Changes As detailed in Issue #101667, two `profile` tests `FAIL` on 32-bit SPARC, both Linux/sparc64 and Solaris/sparcv9 (where the tests work when enabled): ``` Profile-sparc :: ContinuousSyncMode/runtime-counter-relocation.c Profile-sparc :: ContinuousSyncMode/set-file-object.c ``` The Solaris linker provides the crucial clue as to what's wrong: ``` ld: warning: symbol '__llvm_profile_counter_bias' has differing sizes: (file runtime-counter-relocation-17ff25.o value=0x8; file libclang_rt.profile-sparc.a(InstrProfilingFile.c.o) value=0x4); runtime-counter-relocation-17ff25.o definition taken ``` In fact, the types in `llvm` and `compiler-rt` differ: - `__llvm_profile_counter_bias`/`INSTR_PROF_PROFILE_COUNTER_BIAS_VAR` is created in `llvm/lib/Transforms/Instrumentation/InstrProfiling.cpp` (`InstrLowerer::getCounterAddress`) as `int64_t`, while `compiler-rt/lib/profile/InstrProfilingFile.c` uses `intptr_t`. While this doesn't matter in the 64-bit case, the type sizes differ for 32-bit. This patch changes the `compiler-rt` type to match `llvm`. At the same time, the affected testcases are enabled on Solaris, too, where they now just `PASS`. Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `x86_64-pc-linux-gnu`, and `amd64-pc-solaris2.11. This is a backport of PR #102747, adjusted for the lack of `__llvm_profile_bitmap_bias` on the `release/19.x` branch. --- Full diff: https://github.com/llvm/llvm-project/pull/107362.diff 4 Files Affected: - (modified) compiler-rt/lib/profile/InstrProfilingFile.c (+3-3) - (modified) compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c (+1-1) - (modified) compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c (+1-1) - (modified) compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c (+1-1) ``diff diff --git a/compiler-rt/lib/profile/InstrProfilingFile.c b/compiler-rt/lib/profile/InstrProfilingFile.c index 1c58584d2d4f73..3bb2ae068305c9 100644 --- a/compiler-rt/lib/profile/InstrProfilingFile.c +++ b/compiler-rt/lib/profile/InstrProfilingFile.c @@ -198,12 +198,12 @@ static int mmapForContinuousMode(uint64_t CurrentFileOffset, FILE *File) { #define INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR \ INSTR_PROF_CONCAT(INSTR_PROF_PROFILE_COUNTER_BIAS_VAR, _default) -COMPILER_RT_VISIBILITY intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; +COMPILER_RT_VISIBILITY int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR = 0; /* This variable is a weak external reference which could be used to detect * whether or not the compiler defined this symbol. 
*/ #if defined(_MSC_VER) -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; #if defined(_M_IX86) || defined(__i386__) #define WIN_SYM_PREFIX "_" #else @@ -214,7 +214,7 @@ COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; INSTR_PROF_PROFILE_COUNTER_BIAS_VAR) "=" WIN_SYM_PREFIX \ INSTR_PROF_QUOTE(INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR)) #else -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR __attribute__((weak, alias(INSTR_PROF_QUOTE( INSTR_PROF_PROFILE_COUNTER_BIAS_DEFAULT_VAR; #endif diff --git a/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c b/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c index fdcb82e4d72baf..65b7bdaf403da4 100644 --- a/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c +++ b/compiler-rt/lib/profile/InstrProfilingPlatformFuchsia.c @@ -35,7 +35,7 @@ #include "InstrProfilingUtil.h" /* This variable is an external reference to symbol defined by the compiler. */ -COMPILER_RT_VISIBILITY extern intptr_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; +COMPILER_RT_VISIBILITY extern int64_t INSTR_PROF_PROFILE_COUNTER_BIAS_VAR; COMPILER_RT_VISIBILITY unsigned lprofProfileDumped(void) { return 1; diff --git a/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c b/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c index 4ca8bf62455371..19a7aae70cb0d3 100644 --- a/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c +++ b/compiler-rt/test/profile/ContinuousSyncMode/runtime-counter-relocation.c @@ -1,4 +1,4 @@ -// REQUIRES: linux || windows +// REQUIRES: target={{.*(linux|solaris|windows-msvc).*}} // RUN: %clang -fprofile-instr-generate -fcoverage-mapping -mllvm -runtime-counter-relocation=true -o %t.exe %s // RUN: echo "garbage" > %t.profraw diff --git a/compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c b/compiler-rt/test/profile/ContinuousSyncMode/set-file-object.c index b52324d7091eb2..53609f5838f753
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document clause-based op representation (PR #107234)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107234
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document loop representation (PR #107235)
https://github.com/tblah approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107235
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
https://github.com/tblah approved this pull request. LGTM, thank you for writing all of these https://github.com/llvm/llvm-project/pull/107236
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
https://github.com/skatrak updated https://github.com/llvm/llvm-project/pull/107236 >From da68c8b8be9588251bb4342e869a52035fc45a8e Mon Sep 17 00:00:00 2001 From: Sergio Afonso Date: Wed, 4 Sep 2024 13:21:22 +0100 Subject: [PATCH 1/2] [MLIR][OpenMP] NFC: Document compound constructs representation This patch documents the MLIR representation of OpenMP compound constructs discussed in [this](https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986) and [this](https://discourse.llvm.org/t/rfc-disambiguation-between-loop-and-block-associated-omp-parallelop/79972) RFC. --- mlir/docs/Dialects/OpenMPDialect/_index.md | 114 + 1 file changed, 114 insertions(+) diff --git a/mlir/docs/Dialects/OpenMPDialect/_index.md b/mlir/docs/Dialects/OpenMPDialect/_index.md index 65b9c5d79f73e9..28ebb1fe3cf3f8 100644 --- a/mlir/docs/Dialects/OpenMPDialect/_index.md +++ b/mlir/docs/Dialects/OpenMPDialect/_index.md @@ -287,3 +287,117 @@ been implemented, but it is closely linked to the `omp.canonical_loop` work. Nevertheless, loop transformation that the `collapse` clause for loop-associated worksharing constructs defines can be represented by introducing multiple bounds, step and induction variables to the `omp.loop_nest` operation. + +## Compound Construct Representation + +The OpenMP specification defines certain shortcuts that allow specifying +multiple constructs in a single directive, which are referred to as compound +constructs (e.g. `parallel do` contains the `parallel` and `do` constructs). +These can be further classified into [combined](#combined-constructs) and +[composite](#composite-constructs) constructs. This section describes how they +are represented in the dialect. + +When clauses are specified for compound constructs, the OpenMP specification +defines a set of rules to decide to which leaf constructs they apply, as well as +potentially introducing some other implicit clauses. These rules must be taken +into account by those creating the MLIR representation, since it is a per-leaf +representation that expects these rules to have already been followed. + +### Combined Constructs + +Combined constructs are semantically equivalent to specifying one construct +immediately nested inside another. This property is used to simplify the dialect +by representing them through the operations associated to each leaf construct. +For example, `target teams` would be represented as follows: + +```mlir +omp.target ... { + ... + omp.teams ... { +... +omp.terminator + } + ... + omp.terminator +} +``` + +### Composite Constructs + +Composite constructs are similar to combined constructs in that they specify the +effect of one construct being applied immediately after another. However, they +group together constructs that cannot be directly nested into each other. +Specifically, they group together multiple loop-associated constructs that apply +to the same collapsed loop nest. + +As of version 5.2 of the OpenMP specification, the list of composite constructs +is the following: + - `{do,for} simd`; + - `distribute simd`; + - `distribute parallel {do,for}`; + - `distribute parallel {do,for} simd`; and + - `taskloop simd`. + +Even though the list of composite constructs is relatively short and it would +also be possible to create dialect operations for each, it was decided to +allow attaching multiple loop wrappers to a single loop instead. 
This minimizes +redundancy in the dialect and maximizes its modularity, since there is a single +operation for each leaf construct regardless of whether it can be part of a +composite construct. On the other hand, this means the `omp.loop_nest` operation +will have to be interpreted differently depending on how many and which loop +wrappers are attached to it. + +To simplify the detection of operations taking part in the representation of a +composite construct, the `ComposableOpInterface` was introduced. Its purpose is +to handle the `omp.composite` discardable dialect attribute that can optionally +be attached to these operations. Operation verifiers will ensure its presence is +consistent with the context the operation appears in, so that it is valid when +the attribute is present if and only if it represents a leaf of a composite +construct. + +For example, the `distribute simd` composite construct is represented as +follows: + +```mlir +omp.distribute ... { + omp.simd ... { +omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) { + ... + omp.yield +} +omp.terminator + } {omp.composite} + omp.terminator +} {omp.composite} +``` + +One exception to this is the representation of the +`distribute parallel {do,for}` composite construct. The presence of a +block-associated `parallel` leaf construct would introduce many problems if it +was allowed to work as a loop wrapper. In this case, the "hoisted `omp.parallel` +representation" is used instead. This consists in mak
[llvm-branch-commits] [mlir] [MLIR][OpenMP][Docs] NFC: Document compound constructs representation (PR #107236)
@@ -287,3 +287,117 @@ been implemented, but it is closely linked to the `omp.canonical_loop` work. Nevertheless, loop transformation that the `collapse` clause for loop-associated worksharing constructs defines can be represented by introducing multiple bounds, step and induction variables to the `omp.loop_nest` operation. + +## Compound Construct Representation + +The OpenMP specification defines certain shortcuts that allow specifying +multiple constructs in a single directive, which are referred to as compound +constructs (e.g. `parallel do` contains the `parallel` and `do` constructs). +These can be further classified into [combined](#combined-constructs) and +[composite](#composite-constructs) constructs. This section describes how they +are represented in the dialect. + +When clauses are specified for compound constructs, the OpenMP specification +defines a set of rules to decide to which leaf constructs they apply, as well as +potentially introducing some other implicit clauses. These rules must be taken +into account by those creating the MLIR representation, since it is a per-leaf +representation that expects these rules to have already been followed. + +### Combined Constructs + +Combined constructs are semantically equivalent to specifying one construct +immediately nested inside another. This property is used to simplify the dialect +by representing them through the operations associated to each leaf construct. +For example, `target teams` would be represented as follows: + +```mlir +omp.target ... { + ... + omp.teams ... { +... +omp.terminator + } + ... + omp.terminator +} +``` + +### Composite Constructs + +Composite constructs are similar to combined constructs in that they specify the +effect of one construct being applied immediately after another. However, they +group together constructs that cannot be directly nested into each other. +Specifically, they group together multiple loop-associated constructs that apply +to the same collapsed loop nest. + +As of version 5.2 of the OpenMP specification, the list of composite constructs +is the following: + - `{do,for} simd`; + - `distribute simd`; + - `distribute parallel {do,for}`; + - `distribute parallel {do,for} simd`; and + - `taskloop simd`. + +Even though the list of composite constructs is relatively short and it would +also be possible to create dialect operations for each, it was decided to +allow attaching multiple loop wrappers to a single loop instead. This minimizes +redundancy in the dialect and maximizes its modularity, since there is a single +operation for each leaf construct regardless of whether it can be part of a +composite construct. On the other hand, this means the `omp.loop_nest` operation +will have to be interpreted differently depending on how many and which loop +wrappers are attached to it. + +To simplify the detection of operations taking part in the representation of a +composite construct, the `ComposableOpInterface` was introduced. Its purpose is +to handle the `omp.composite` discardable dialect attribute that can optionally +be attached to these operations. Operation verifiers will ensure its presence is +consistent with the context the operation appears in, so that it is valid when +the attribute is present if and only if it represents a leaf of a composite +construct. + +For example, the `distribute simd` composite construct is represented as +follows: + +```mlir +omp.distribute ... { + omp.simd ... { +omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) { + ... 
+ omp.yield +} +omp.terminator + } {omp.composite} + omp.terminator +} {omp.composite} +``` + +One exception to this is the representation of the +`distribute parallel {do,for}` composite construct. The presence of a +block-associated `parallel` leaf construct would introduce many problems if it +was allowed to work as a loop wrapper. In this case, the "hoisted `omp.parallel` +representation" is used instead. This consists in making `omp.parallel` the +parent operation, with a nested `omp.loop_nest` wrapped by `omp.distribute` and +`omp.wsloop` (and `omp.simd`, in the `distribute parallel {do,for} simd` case). + +This approach works because `parallel` is a parallelism-generating construct, +whereas `distribute` is a worksharing construct impacting the higher level +`teams`, making the ordering between these constructs not cause semantic skatrak wrote: Thank you for noticing, done. https://github.com/llvm/llvm-project/pull/107236 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
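For context, a source-level sketch (an illustration assumed here, not part of the documentation patch) of the distinction the document draws: `target teams` is a combined construct, one leaf nested immediately inside the other, while `distribute parallel for simd` is a composite construct whose leaves all apply to the same loop nest.

```cpp
// Built with -fopenmp and offloading enabled; the clause choices are
// illustrative only.
void saxpy(int N, float A, const float *X, float *Y) {
#pragma omp target teams map(to : X[0 : N]) map(tofrom : Y[0 : N])
#pragma omp distribute parallel for simd
  for (int I = 0; I < N; ++I)
    Y[I] = A * X[I] + Y[I];
}
```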
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
https://github.com/ritter-x2a updated https://github.com/llvm/llvm-project/pull/106965 >From a647e4446cbcc1018c1298ee411c80a855dc4ad9 Mon Sep 17 00:00:00 2001 From: Fabian Ritter Date: Mon, 2 Sep 2024 05:37:33 -0400 Subject: [PATCH] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments When a pointer to thread-local storage is passed in a function call, ISel first lowers the call and wraps the resulting code in CALLSEQ markers. Afterwards, to compute the pointer to TLS, a call to retrieve the TLS base address is generated and then wrapped in a set of CALLSEQ markers. If the latter call is inserted into the call sequence of the former call, this leads to nested call frames, which are illegal and lead to errors in the machine verifier. This patch avoids surrounding the call to compute the TLS base address in CALLSEQ markers if it is already surrounded by such markers. It relies on zero-sized call frames being represented in the call frame size info stored in the MachineBBs. Fixes #45574 and #98042. --- llvm/lib/Target/X86/X86ISelLowering.cpp | 7 + .../test/CodeGen/X86/tls-function-argument.ll | 30 +++ 2 files changed, 37 insertions(+) create mode 100644 llvm/test/CodeGen/X86/tls-function-argument.ll diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp index e5ba54c176f07b..6bd9efa4950828 100644 --- a/llvm/lib/Target/X86/X86ISelLowering.cpp +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp @@ -35603,6 +35603,13 @@ X86TargetLowering::EmitLoweredTLSAddr(MachineInstr &MI, // inside MC, therefore without the two markers shrink-wrapping // may push the prologue/epilogue pass them. const TargetInstrInfo &TII = *Subtarget.getInstrInfo(); + + // Do not introduce CALLSEQ markers if we are already in a call sequence. + // Nested call sequences are not allowed and cause errors in the machine + // verifier. + if (TII.getCallFrameSizeAt(MI).has_value()) +return BB; + const MIMetadata MIMD(MI); MachineFunction &MF = *BB->getParent(); diff --git a/llvm/test/CodeGen/X86/tls-function-argument.ll b/llvm/test/CodeGen/X86/tls-function-argument.ll new file mode 100644 index 00..9b6ab529db3ea3 --- /dev/null +++ b/llvm/test/CodeGen/X86/tls-function-argument.ll @@ -0,0 +1,30 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64 -verify-machineinstrs -relocation-model=pic < %s | FileCheck %s + +; Passing a pointer to thread-local storage to a function can be problematic +; since computing such addresses requires a function call that is introduced +; very late in instruction selection. We need to ensure that we don't introduce +; nested call sequence markers if this function call happens in a call sequence. + +@TLS = internal thread_local global i64 zeroinitializer, align 8 +declare void @bar(ptr) +define internal void @foo() { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:pushq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 16 +; CHECK-NEXT:.cfi_offset %rbx, -16 +; CHECK-NEXT:leaq TLS@TLSLD(%rip), %rdi +; CHECK-NEXT:callq __tls_get_addr@PLT +; CHECK-NEXT:leaq TLS@DTPOFF(%rax), %rbx +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:movq %rbx, %rdi +; CHECK-NEXT:callq bar@PLT +; CHECK-NEXT:popq %rbx +; CHECK-NEXT:.cfi_def_cfa_offset 8 +; CHECK-NEXT:retq + call void @bar(ptr @TLS) + call void @bar(ptr @TLS) + ret void +} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [clang] [Clang] Workaround dependent source location issues (#106925) (PR #107212)
https://github.com/AaronBallman approved this pull request. LGTM https://github.com/llvm/llvm-project/pull/107212
[llvm-branch-commits] [llvm] [X86] Avoid generating nested CALLSEQ for TLS pointer function arguments (PR #106965)
jayfoad wrote:

> > This sounds sketchy to me. Is it really valid to enter a second call inside another call's CALLSEQ markers, but only if we avoid adding a second nested set of markers? It feels like attacking the symptom of the issue, but not the root cause. (I'm not certain it's _not_ valid, but it just seems really suspicious...)
>
> From what I've gathered from the source comments and the [patch](https://github.com/llvm/llvm-project/commit/228978c0dcfc9a9793f3dc8a69f42471192223bc) introducing the code that inserts these CALLSEQ markers for TLSADDRs, their only point here is to stop shrink-wrapping from moving the function prologue/epilogue past the call to get the TLS address. This should also hold when the TLSADDR is in another CALLSEQ.
>
> I am however by no means an expert on this topic; I'd appreciate more insights on which uses of CALLSEQ markers are and are not valid (besides the MachineVerifier checks).

I also wondered about this. Are there other mechanisms that block shrink wrapping from moving the prologue? E.g. what if a regular instruction (not a call) has to come after the prologue, how would that be marked? Maybe adding an implicit use or def of some particular physical register would be enough??

https://github.com/llvm/llvm-project/pull/106965
[llvm-branch-commits] [llvm] [BOLT] Add pseudo probe inline tree to YAML profile (PR #107137)
WenleiHe wrote:

Didn't realize yaml profile currently doesn't have probe inline tree encoded. This can increase profile size a bit, just making sure that's not a concern for yaml profile.

https://github.com/llvm/llvm-project/pull/107137
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: Sander de Smalen (sdesmalen-arm) Changes The functionality to make use of SVE's load/store pair instructions for the callee-saves is broken because the offsets used in the instructions are incorrect. This is addressed by #105518 but given the complexity of this code and the subtleties around calculating the right offsets, we favour disabling the behaviour altogether for LLVM 19. This fix is critical for any programs being compiled with `+sme2`. --- Patch is 304.49 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107406.diff 5 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64FrameLowering.cpp (-33) - (modified) llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll (+66-38) - (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-ld1.ll (+944-544) - (modified) llvm/test/CodeGen/AArch64/sme2-intrinsics-ldnt1.ll (+944-544) - (modified) llvm/test/CodeGen/AArch64/sve-callee-save-restore-pairs.ll (+82-58) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp index ba46ededc63a83..87e057a468afd6 100644 --- a/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp +++ b/llvm/lib/Target/AArch64/AArch64FrameLowering.cpp @@ -2931,16 +2931,6 @@ struct RegPairInfo { } // end anonymous namespace -unsigned findFreePredicateReg(BitVector &SavedRegs) { - for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) { -if (SavedRegs.test(PReg)) { - unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0; - return PNReg; -} - } - return AArch64::NoRegister; -} - static void computeCalleeSaveRegisterPairs( MachineFunction &MF, ArrayRef CSI, const TargetRegisterInfo *TRI, SmallVectorImpl &RegPairs, @@ -3645,7 +3635,6 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, unsigned ExtraCSSpill = 0; bool HasUnpairedGPR64 = false; - bool HasPairZReg = false; // Figure out which callee-saved registers to save/restore. for (unsigned i = 0; CSRegs[i]; ++i) { const unsigned Reg = CSRegs[i]; @@ -3699,28 +3688,6 @@ void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF, !RegInfo->isReservedReg(MF, PairedReg)) ExtraCSSpill = PairedReg; } -// Check if there is a pair of ZRegs, so it can select PReg for spill/fill -HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) && -SavedRegs.test(CSRegs[i ^ 1])); - } - - if (HasPairZReg && (Subtarget.hasSVE2p1() || Subtarget.hasSME2())) { -AArch64FunctionInfo *AFI = MF.getInfo(); -// Find a suitable predicate register for the multi-vector spill/fill -// instructions. -unsigned PnReg = findFreePredicateReg(SavedRegs); -if (PnReg != AArch64::NoRegister) - AFI->setPredicateRegForFillSpill(PnReg); -// If no free callee-save has been found assign one. 
-if (!AFI->getPredicateRegForFillSpill() && -MF.getFunction().getCallingConv() == -CallingConv::AArch64_SVE_VectorCall) { - SavedRegs.set(AArch64::P8); - AFI->setPredicateRegForFillSpill(AArch64::PN8); -} - -assert(!RegInfo->isReservedReg(MF, AFI->getPredicateRegForFillSpill()) && - "Predicate cannot be a reserved register"); } if (MF.getFunction().getCallingConv() == CallingConv::Win64 && diff --git a/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll b/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll index 6264ce0cf4ae6d..fa8f92cb0a2c99 100644 --- a/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll +++ b/llvm/test/CodeGen/AArch64/sme-vg-to-stack.ll @@ -329,27 +329,34 @@ define void @vg_unwind_with_sve_args( %x) #0 { ; CHECK-NEXT:.cfi_offset w29, -32 ; CHECK-NEXT:addvl sp, sp, #-18 ; CHECK-NEXT:.cfi_escape 0x0f, 0x0d, 0x8f, 0x00, 0x11, 0x20, 0x22, 0x11, 0x90, 0x01, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 32 + 144 * VG -; CHECK-NEXT:str p8, [sp, #11, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:ptrue pn8.b ; CHECK-NEXT:str p15, [sp, #4, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z22.b, z23.b }, pn8, [sp, #4, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z20.b, z21.b }, pn8, [sp, #8, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p14, [sp, #5, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z18.b, z19.b }, pn8, [sp, #12, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z16.b, z17.b }, pn8, [sp, #16, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p13, [sp, #6, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z14.b, z15.b }, pn8, [sp, #20, mul vl] // 32-byte Folded Spill -; CHECK-NEXT:st1b { z12.b, z13.b }, pn8, [sp, #24, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p12, [sp, #7, mul vl] // 2-byte Folded Spill -; CHECK-NEXT:st1b { z10.b, z11.b }, pn8, [sp, #28, mul vl] // 32-byte Folded Spill ; CHECK-NEXT:str p11, [sp, #8, mul
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm milestoned https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/sdesmalen-arm edited https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/paulwalker-arm approved this pull request. https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] DAG: Lower fcNormal is.fpclass to compare with inf (PR #100389)
https://github.com/arsenm updated https://github.com/llvm/llvm-project/pull/100389 >From d51a155348284c6fe453190d87d36f3f63e1f456 Mon Sep 17 00:00:00 2001 From: Matt Arsenault Date: Wed, 1 Feb 2023 09:06:59 -0400 Subject: [PATCH] DAG: Lower fcNormal is.fpclass to compare with inf Looks worse for x86 without the fabs check. Not sure if this is useful for any targets. --- .../CodeGen/SelectionDAG/TargetLowering.cpp | 25 +++ 1 file changed, 25 insertions(+) diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp index 2b41b8a9a810e5..bd849c996f7ac0 100644 --- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp @@ -8790,6 +8790,31 @@ SDValue TargetLowering::expandIS_FPCLASS(EVT ResultVT, SDValue Op, IsOrdered ? OrderedOp : UnorderedOp); } } + +if (FPTestMask == fcNormal) { + // TODO: Handle unordered + ISD::CondCode IsFiniteOp = IsInvertedFP ? ISD::SETUGE : ISD::SETOLT; + ISD::CondCode IsNormalOp = IsInvertedFP ? ISD::SETOLT : ISD::SETUGE; + + if (isCondCodeLegalOrCustom(IsFiniteOp, + OperandVT.getScalarType().getSimpleVT()) && + isCondCodeLegalOrCustom(IsNormalOp, + OperandVT.getScalarType().getSimpleVT()) && + isFAbsFree(OperandVT)) { +// isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal) +SDValue Inf = +DAG.getConstantFP(APFloat::getInf(Semantics), DL, OperandVT); +SDValue SmallestNormal = DAG.getConstantFP( +APFloat::getSmallestNormalized(Semantics), DL, OperandVT); + +SDValue Abs = DAG.getNode(ISD::FABS, DL, OperandVT, Op); +SDValue IsFinite = DAG.getSetCC(DL, ResultVT, Abs, Inf, IsFiniteOp); +SDValue IsNormal = +DAG.getSetCC(DL, ResultVT, Abs, SmallestNormal, IsNormalOp); +unsigned LogicOp = IsInvertedFP ? ISD::OR : ISD::AND; +return DAG.getNode(LogicOp, DL, ResultVT, IsFinite, IsNormal); + } +} } // Some checks may be represented as inversion of simpler check, for example ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
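The identity the lowering relies on ("isnormal(x) --> fabs(x) < infinity && !(fabs(x) < smallest_normal)") can be checked with a small standalone program. This is an illustration under the assumption that ordinary C++ double comparisons mirror the SETOLT/SETUGE condition codes chosen by the patch; it is not part of the patch itself.

```cpp
// isnormal(x)  <=>  fabs(x) < infinity  &&  fabs(x) >= smallest normal,
// with NaN falling out naturally because NaN < infinity is false.
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

static bool isNormalViaCompares(double X) {
  double A = std::fabs(X);
  return A < std::numeric_limits<double>::infinity() && A >= DBL_MIN;
}

int main() {
  const double Tests[] = {0.0, 1.0, -2.5, DBL_MIN / 2.0,
                          std::numeric_limits<double>::infinity(),
                          -std::numeric_limits<double>::infinity(),
                          std::numeric_limits<double>::quiet_NaN()};
  for (double X : Tests)
    assert(isNormalViaCompares(X) == (std::isnormal(X) != 0));
  return 0;
}
```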
[llvm-branch-commits] [llvm] [AArch64] Disable SVE paired ld1/st1 for callee-saves. (PR #107406)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107406
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/107435 Backport 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c Requested by: @sdesmalen-arm >From a943f987d44647f38ae4c5c2d1f69b8f666e16ab Mon Sep 17 00:00:00 2001 From: Sander de Smalen Date: Thu, 5 Sep 2024 17:54:57 +0100 Subject: [PATCH] [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) This removes a redundant 'COPY' instruction that #81716 probably forgot to remove. This redundant COPY led to an issue because because code in LiveRangeSplitting expects that the instruction emitted by `loadRegFromStackSlot` is an instruction that accesses memory, which isn't the case for the COPY instruction. (cherry picked from commit 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c) --- llvm/lib/Target/AArch64/AArch64InstrInfo.cpp | 4 --- llvm/test/CodeGen/AArch64/spillfill-sve.mir | 37 +++- 2 files changed, 36 insertions(+), 5 deletions(-) diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp index 377bcd5868fb64..805684ef69a592 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp @@ -5144,10 +5144,6 @@ void AArch64InstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB, if (PNRReg.isValid() && !PNRReg.isVirtual()) MI.addDef(PNRReg, RegState::Implicit); MI.addMemOperand(MMO); - - if (PNRReg.isValid() && PNRReg.isVirtual()) -BuildMI(MBB, MBBI, DebugLoc(), get(TargetOpcode::COPY), PNRReg) -.addReg(DestReg); } bool llvm::isNZCVTouchedInInstructionRange(const MachineInstr &DefMI, diff --git a/llvm/test/CodeGen/AArch64/spillfill-sve.mir b/llvm/test/CodeGen/AArch64/spillfill-sve.mir index 11cf388e385312..83c9b73c575708 100644 --- a/llvm/test/CodeGen/AArch64/spillfill-sve.mir +++ b/llvm/test/CodeGen/AArch64/spillfill-sve.mir @@ -11,6 +11,7 @@ define aarch64_sve_vector_pcs void @spills_fills_stack_id_ppr2mul2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_pnr() #1 { entry: unreachable } + define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_ppr_to_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2strided() #0 { entry: unreachable } @@ -216,7 +217,7 @@ body: | ; EXPAND: STR_PXI killed renamable $pn8, $sp, 7 ; ; EXPAND: renamable $pn8 = LDR_PXI $sp, 7 -; EXPAND: $p0 = PEXT_PCI_B killed renamable $pn8, 0 +; EXPAND-NEXT: $p0 = PEXT_PCI_B killed renamable $pn8, 0 %0:pnr_p8to15 = WHILEGE_CXX_B undef $x0, undef $x0, 0, implicit-def dead $nzcv @@ -242,6 +243,40 @@ body: | RET_ReallyLR ... --- +name: spills_fills_stack_id_virtreg_ppr_to_pnr +tracksRegLiveness: true +registers: + - { id: 0, class: ppr } + - { id: 1, class: pnr_p8to15 } +stack: +body: | + bb.0.entry: +liveins: $p0 + +%0:ppr = COPY $p0 + +$pn0 = IMPLICIT_DEF +$pn1 = IMPLICIT_DEF +$pn2 = IMPLICIT_DEF +$pn3 = IMPLICIT_DEF +$pn4 = IMPLICIT_DEF +$pn5 = IMPLICIT_DEF +$pn6 = IMPLICIT_DEF +$pn7 = IMPLICIT_DEF +$pn8 = IMPLICIT_DEF +$pn9 = IMPLICIT_DEF +$pn10 = IMPLICIT_DEF +$pn11 = IMPLICIT_DEF +$pn12 = IMPLICIT_DEF +$pn13 = IMPLICIT_DEF +$pn14 = IMPLICIT_DEF +$pn15 = IMPLICIT_DEF + +%1:pnr_p8to15 = COPY %0 +$p0 = PEXT_PCI_B %1, 0 +RET_ReallyLR +... 
+--- name: spills_fills_stack_id_zpr tracksRegLiveness: true registers: ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/107435
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
llvmbot wrote: @llvm/pr-subscribers-backend-aarch64 Author: None (llvmbot) Changes Backport 91a3c6f3d66b866bcda8a0f7d4815bc8f2dbd86c Requested by: @sdesmalen-arm --- Full diff: https://github.com/llvm/llvm-project/pull/107435.diff 2 Files Affected: - (modified) llvm/lib/Target/AArch64/AArch64InstrInfo.cpp (-4) - (modified) llvm/test/CodeGen/AArch64/spillfill-sve.mir (+36-1) ``diff diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp index 377bcd5868fb64..805684ef69a592 100644 --- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp +++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp @@ -5144,10 +5144,6 @@ void AArch64InstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB, if (PNRReg.isValid() && !PNRReg.isVirtual()) MI.addDef(PNRReg, RegState::Implicit); MI.addMemOperand(MMO); - - if (PNRReg.isValid() && PNRReg.isVirtual()) -BuildMI(MBB, MBBI, DebugLoc(), get(TargetOpcode::COPY), PNRReg) -.addReg(DestReg); } bool llvm::isNZCVTouchedInInstructionRange(const MachineInstr &DefMI, diff --git a/llvm/test/CodeGen/AArch64/spillfill-sve.mir b/llvm/test/CodeGen/AArch64/spillfill-sve.mir index 11cf388e385312..83c9b73c575708 100644 --- a/llvm/test/CodeGen/AArch64/spillfill-sve.mir +++ b/llvm/test/CodeGen/AArch64/spillfill-sve.mir @@ -11,6 +11,7 @@ define aarch64_sve_vector_pcs void @spills_fills_stack_id_ppr2mul2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_pnr() #1 { entry: unreachable } + define aarch64_sve_vector_pcs void @spills_fills_stack_id_virtreg_ppr_to_pnr() #1 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2() #0 { entry: unreachable } define aarch64_sve_vector_pcs void @spills_fills_stack_id_zpr2strided() #0 { entry: unreachable } @@ -216,7 +217,7 @@ body: | ; EXPAND: STR_PXI killed renamable $pn8, $sp, 7 ; ; EXPAND: renamable $pn8 = LDR_PXI $sp, 7 -; EXPAND: $p0 = PEXT_PCI_B killed renamable $pn8, 0 +; EXPAND-NEXT: $p0 = PEXT_PCI_B killed renamable $pn8, 0 %0:pnr_p8to15 = WHILEGE_CXX_B undef $x0, undef $x0, 0, implicit-def dead $nzcv @@ -242,6 +243,40 @@ body: | RET_ReallyLR ... --- +name: spills_fills_stack_id_virtreg_ppr_to_pnr +tracksRegLiveness: true +registers: + - { id: 0, class: ppr } + - { id: 1, class: pnr_p8to15 } +stack: +body: | + bb.0.entry: +liveins: $p0 + +%0:ppr = COPY $p0 + +$pn0 = IMPLICIT_DEF +$pn1 = IMPLICIT_DEF +$pn2 = IMPLICIT_DEF +$pn3 = IMPLICIT_DEF +$pn4 = IMPLICIT_DEF +$pn5 = IMPLICIT_DEF +$pn6 = IMPLICIT_DEF +$pn7 = IMPLICIT_DEF +$pn8 = IMPLICIT_DEF +$pn9 = IMPLICIT_DEF +$pn10 = IMPLICIT_DEF +$pn11 = IMPLICIT_DEF +$pn12 = IMPLICIT_DEF +$pn13 = IMPLICIT_DEF +$pn14 = IMPLICIT_DEF +$pn15 = IMPLICIT_DEF + +%1:pnr_p8to15 = COPY %0 +$p0 = PEXT_PCI_B %1, 0 +RET_ReallyLR +... +--- name: spills_fills_stack_id_zpr tracksRegLiveness: true registers: `` https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [BOLT] Add pseudo probe inline tree to YAML profile (PR #107137)
aaupov wrote:

> Didn't realize yaml profile currently doesn't have probe inline tree encoded. This can increase profile size a bit, just making sure that's not a concern for yaml profile.

Good call. Including the probe inline tree does increase the size of the yaml profile by about 2x both pre- and post-compression (221M -> 404M and 20M -> 41M, respectively) for a large binary, which also causes a 2x increase in profile reading time. Let me experiment with reorganizing the data to reduce the size.

https://github.com/llvm/llvm-project/pull/107137
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
https://github.com/mtrofin edited https://github.com/llvm/llvm-project/pull/107329
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,333 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) + Sum += AssumeAllKnown ? *E->Count : E->Count.value_or(0U); + return Sum; +} + +void takeCountFrom(const SmallVector &Edges) { + assert(!Count.has_value()); + Count = getEdgeSum(Edges, true); +} + +void setSingleUnknownEdgeCount(SmallVector &Edges) { + uint64_t KnownSum = getEdgeSum(Edges, false); + uint64_t EdgeVal = *Count > KnownSum ? 
*Count - KnownSum : 0U; + EdgeInfo *E = nullptr; + for (auto *I : Edges) +if (I && !I->Count.has_value()) { + E = I; +#ifdef NDEBUG + break; +#else + assert((!E || E == I) && + "Expected exactly one edge to have an unknown count, " + "found a second one"); + continue; +#endif +} + assert(E && "Expected exactly one edge to have an unknown count"); + assert(!E->Count.has_value()); + E->Count = EdgeVal; + assert(E->Src->UnknownCountOutEdges > 0); + assert(E->Dest->UnknownCountInEdges > 0); + --E->Src->UnknownCountOutEdges; + --E->Dest->UnknownCountInEdges; +} + + public: +BBInfo(size_t NumInEdges, size_t NumOutEdges, std::optional Count) +: Count(Count) { + InEdges.reserve(NumInEdges); + OutEdges.resize(NumOutEdges); +} + +bool tryTakeCountFromKnownOutEdges(const BasicBlock &BB) { + if (!succ_empty(&BB) && !UnknownCountOutEdges) { +takeCountFrom(OutEdges); +return true; + } + return false; +} + +bool tryTakeCountFromKnownInEdges(const BasicBlock &BB) { + if (!BB.isEntryBlock() && !UnknownCountInEdges) { +takeCountFrom(InEdges); +return true; + } + return false; +} + +void addInEdge(EdgeInfo *Info) { + InEdges.push_back(Info); + ++UnknownCountInEdges; +} + +void addOutEdge(size_t Index, EdgeInfo *Info) { + OutEdges[Index] = Info; + ++UnknownCountOutEdges; +} + +bool hasCount() const { return Count.has_value(); } + +bool trySetSingleUnknownInEdgeCount() { + if (UnknownCountInEdges == 1) { +setSingleUnknownEdgeCount(InEdges); +return true; + } + return false; +} + +bool trySetSingleUnknownOutEdgeCount() { + if (UnknownCountOutEdges == 1) { +setSingleUnknownEdgeCount(OutEdges); +return true; + } + return false; +} +size_t getNumOutEdges() const { return OutEdges.size(); } + +uint64_t getEdgeCount(size_t Index) const { + if (auto *E = OutEdges[Index]) +return *E->Count; + return 0U; +} + }; + + F
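The "flattening" step described in the file header quoted above (summing the counters of all contexts of a function, index by index) boils down to something like the following simplified sketch; this is an assumption for illustration, the real pass operates on its own data structures.

```cpp
#include <cstdint>
#include <vector>

// Accumulate one context's counters into the flat per-function vector.
static void flattenInto(std::vector<uint64_t> &Flat,
                        const std::vector<uint64_t> &CtxCounters) {
  if (Flat.size() < CtxCounters.size())
    Flat.resize(CtxCounters.size(), 0);
  for (size_t I = 0; I < CtxCounters.size(); ++I)
    Flat[I] += CtxCounters[I]; // same counter index across all contexts
}
```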
[llvm-branch-commits] [clang] Backport "[Clang][CodeGen] Fix type for atomic float incdec operators (#107075)" (PR #107184)
https://github.com/RKSimon approved this pull request. https://github.com/llvm/llvm-project/pull/107184 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
https://github.com/aemerson approved this pull request. https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [AArch64] Remove redundant COPY from loadRegFromStackSlot (#107396) (PR #107435)
aemerson wrote: To justify this for the 19 release: this is easily triggered by small IR, so we should take it. https://github.com/llvm/llvm-project/pull/107435 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
https://github.com/mtrofin updated https://github.com/llvm/llvm-project/pull/107329 >From 856568c07d924dd59aaa81450cb8bcb64d60d2eb Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Tue, 3 Sep 2024 21:28:05 -0700 Subject: [PATCH] [ctx_prof] Flattened profile lowering pass --- llvm/include/llvm/ProfileData/ProfileCommon.h | 6 +- .../Instrumentation/PGOCtxProfFlattening.h| 25 ++ llvm/lib/Passes/PassBuilder.cpp | 1 + llvm/lib/Passes/PassBuilderPipelines.cpp | 1 + llvm/lib/Passes/PassRegistry.def | 1 + .../Transforms/Instrumentation/CMakeLists.txt | 1 + .../Instrumentation/PGOCtxProfFlattening.cpp | 341 ++ .../flatten-always-removes-instrumentation.ll | 12 + .../CtxProfAnalysis/flatten-and-annotate.ll | 112 ++ 9 files changed, 497 insertions(+), 3 deletions(-) create mode 100644 llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h create mode 100644 llvm/lib/Transforms/Instrumentation/PGOCtxProfFlattening.cpp create mode 100644 llvm/test/Analysis/CtxProfAnalysis/flatten-always-removes-instrumentation.ll create mode 100644 llvm/test/Analysis/CtxProfAnalysis/flatten-and-annotate.ll diff --git a/llvm/include/llvm/ProfileData/ProfileCommon.h b/llvm/include/llvm/ProfileData/ProfileCommon.h index eaab59484c947a..edd8e1f644ad12 100644 --- a/llvm/include/llvm/ProfileData/ProfileCommon.h +++ b/llvm/include/llvm/ProfileData/ProfileCommon.h @@ -79,13 +79,13 @@ class ProfileSummaryBuilder { class InstrProfSummaryBuilder final : public ProfileSummaryBuilder { uint64_t MaxInternalBlockCount = 0; - inline void addEntryCount(uint64_t Count); - inline void addInternalCount(uint64_t Count); - public: InstrProfSummaryBuilder(std::vector Cutoffs) : ProfileSummaryBuilder(std::move(Cutoffs)) {} + void addEntryCount(uint64_t Count); + void addInternalCount(uint64_t Count); + void addRecord(const InstrProfRecord &); std::unique_ptr getSummary(); }; diff --git a/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h new file mode 100644 index 00..0eab3aaf6fcad3 --- /dev/null +++ b/llvm/include/llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h @@ -0,0 +1,25 @@ +//===-- PGOCtxProfFlattening.h - Contextual Instr. Flattening ---*- C++ -*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// This file declares the PGOCtxProfFlattening class. 
+// +//===--===// +#ifndef LLVM_TRANSFORMS_INSTRUMENTATION_PGOCTXPROFFLATTENING_H +#define LLVM_TRANSFORMS_INSTRUMENTATION_PGOCTXPROFFLATTENING_H + +#include "llvm/IR/PassManager.h" +namespace llvm { + +class PGOCtxProfFlatteningPass +: public PassInfoMixin { +public: + explicit PGOCtxProfFlatteningPass() = default; + PreservedAnalyses run(Module &M, ModuleAnalysisManager &MAM); +}; +} // namespace llvm +#endif diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp index a22abed8051a11..d87e64eff08966 100644 --- a/llvm/lib/Passes/PassBuilder.cpp +++ b/llvm/lib/Passes/PassBuilder.cpp @@ -198,6 +198,7 @@ #include "llvm/Transforms/Instrumentation/MemProfiler.h" #include "llvm/Transforms/Instrumentation/MemorySanitizer.h" #include "llvm/Transforms/Instrumentation/NumericalStabilitySanitizer.h" +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h" #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h" #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 1fd7ef929c87d5..38297dc02b8be6 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -76,6 +76,7 @@ #include "llvm/Transforms/Instrumentation/InstrOrderFile.h" #include "llvm/Transforms/Instrumentation/InstrProfiling.h" #include "llvm/Transforms/Instrumentation/MemProfiler.h" +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" #include "llvm/Transforms/Instrumentation/PGOCtxProfLowering.h" #include "llvm/Transforms/Instrumentation/PGOForceFunctionAttrs.h" #include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def index d6067089c6b5c1..2b0624cb9874da 100644 --- a/llvm/lib/Passes/PassRegistry.def +++ b/llvm/lib/Passes/PassRegistry.def @@ -58,6 +58,7 @@ MODULE_PASS("coro-early", CoroEarlyPass()) MODULE_PASS("cross-dso-cfi", CrossDSOCFIPass()) MODULE_PASS("ctx-instr-gen", PGOInstrumentationGen(PGOInstrum
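As the new file's header comment puts it, flattening sums the counter values at the same index across all contexts of a function, and only then are the totals lowered to entry counts and branch weights. A small sketch of just the summation step, using an ad-hoc vector-of-vectors in place of the real contextual-profile data structures (the names and shapes are illustrative, not the pass's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-in: each context of one function carries a counter
// vector; the real pass walks the contextual profile via CtxProfAnalysis.
using Counters = std::vector<uint64_t>;

Counters flatten(const std::vector<Counters> &ContextsOfOneFunction) {
  Counters Flat;
  for (const Counters &Ctx : ContextsOfOneFunction) {
    if (Flat.size() < Ctx.size())
      Flat.resize(Ctx.size(), 0);
    for (std::size_t I = 0; I < Ctx.size(); ++I)
      Flat[I] += Ctx[I]; // counters at the same index are summed
  }
  return Flat;
}
```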
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) david-xl wrote: Why can E be null? https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
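Independent of whether it can actually happen in practice (the question raised above), the quoted constructor does make null entries possible: OutEdges is sized with resize(), which value-initializes every slot to nullptr, and a slot only becomes non-null once addOutEdge() is called for that index. A tiny illustration of that behaviour — not code from the patch:

```cpp
#include "llvm/ADT/SmallVector.h"
#include <cassert>

struct EdgeInfo {}; // placeholder for the patch's EdgeInfo

int main() {
  llvm::SmallVector<EdgeInfo *> OutEdges;
  OutEdges.resize(3); // all three slots start out as nullptr
  EdgeInfo E;
  OutEdges[1] = &E;   // the analogue of addOutEdge(1, &E)
  assert(!OutEdges[0] && OutEdges[1] && !OutEdges[2]);
  return 0;
}
```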
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, +bool AssumeAllKnown) const { + uint64_t Sum = 0; + for (const auto *E : Edges) +if (E) + Sum += AssumeAllKnown ? *E->Count : E->Count.value_or(0U); + return Sum; +} + +void takeCountFrom(const SmallVector &Edges) { david-xl wrote: nit: take-> compute https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Flattened profile lowering pass (PR #107329)
@@ -0,0 +1,341 @@ +//===- PGOCtxProfFlattening.cpp - Contextual Instr. Flattening ===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===--===// +// +// Flattens the contextual profile and lowers it to MD_prof. +// This should happen after all IPO (which is assumed to have maintained the +// contextual profile) happened. Flattening consists of summing the values at +// the same index of the counters belonging to all the contexts of a function. +// The lowering consists of materializing the counter values to function +// entrypoint counts and branch probabilities. +// +// This pass also removes contextual instrumentation, which has been kept around +// to facilitate its functionality. +// +//===--===// + +#include "llvm/Transforms/Instrumentation/PGOCtxProfFlattening.h" +#include "llvm/ADT/STLExtras.h" +#include "llvm/ADT/ScopeExit.h" +#include "llvm/Analysis/CtxProfAnalysis.h" +#include "llvm/Analysis/OptimizationRemarkEmitter.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" +#include "llvm/CodeGen/MachineBasicBlock.h" +#include "llvm/IR/Analysis.h" +#include "llvm/IR/CFG.h" +#include "llvm/IR/Dominators.h" +#include "llvm/IR/IntrinsicInst.h" +#include "llvm/IR/Module.h" +#include "llvm/IR/PassManager.h" +#include "llvm/IR/ProfileSummary.h" +#include "llvm/ProfileData/ProfileCommon.h" +#include "llvm/Transforms/Instrumentation/PGOInstrumentation.h" +#include "llvm/Transforms/Scalar/DCE.h" +#include "llvm/Transforms/Utils/BasicBlockUtils.h" + +using namespace llvm; + +namespace { + +class ProfileAnnotator final { + class BBInfo; + struct EdgeInfo { +BBInfo *const Src; +BBInfo *const Dest; +std::optional Count; + +explicit EdgeInfo(BBInfo &Src, BBInfo &Dest) : Src(&Src), Dest(&Dest) {} + }; + + class BBInfo { +std::optional Count; +SmallVector OutEdges; +SmallVector InEdges; +size_t UnknownCountOutEdges = 0; +size_t UnknownCountInEdges = 0; + +uint64_t getEdgeSum(const SmallVector &Edges, david-xl wrote: brief document when to assumeAllKnown. https://github.com/llvm/llvm-project/pull/107329 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
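One possible shape for the documentation requested here, derived from the two call sites visible in the quoted code (takeCountFrom passes true, setSingleUnknownEdgeCount passes false); the wording below is a suggestion, not text from the patch:

```cpp
/// Sum the counts of the given edges.
/// With AssumeAllKnown == true, every non-null edge must already carry a
/// count (used by takeCountFrom, which derives a block count from fully
/// resolved edges). With AssumeAllKnown == false, edges without a count are
/// treated as 0 (used by setSingleUnknownEdgeCount to sum the known part
/// before solving for the single remaining unknown edge).
uint64_t getEdgeSum(const SmallVector<EdgeInfo *> &Edges,
                    bool AssumeAllKnown) const;
```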
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
https://github.com/llvmbot created https://github.com/llvm/llvm-project/pull/107466 Backport 2e0ded3371f8d42f376bdfd4d70687537e36818e Requested by: @jrtc27 >From 66968ee023596c5674d1e94c98e8d7d2a069a69c Mon Sep 17 00:00:00 2001 From: R-Goc <131907007+r-...@users.noreply.github.com> Date: Wed, 4 Sep 2024 20:10:36 +0200 Subject: [PATCH] [Windows SEH] Fix crash on empty seh block (#107031) Fixes https://github.com/llvm/llvm-project/issues/105813 and https://github.com/llvm/llvm-project/issues/106915. Adds a check for the end of the iterator, which can be a sentinel. The issue was introduced in https://github.com/llvm/llvm-project/commit/0efe111365ae176671e01252d24028047d807a84 from what I can see, so along with the introduction of /EHa support. (cherry picked from commit 2e0ded3371f8d42f376bdfd4d70687537e36818e) --- .../CodeGen/SelectionDAG/SelectionDAGISel.cpp | 4 .../CodeGen/WinEH/wineh-empty-seh-scope.ll | 18 ++ 2 files changed, 22 insertions(+) create mode 100644 llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp index df3d207d85d351..b961d3bb1fec7f 100644 --- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp +++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp @@ -1453,6 +1453,10 @@ void SelectionDAGISel::reportIPToStateForBlocks(MachineFunction *MF) { if (BB->getFirstMayFaultInst()) { // Report IP range only for blocks with Faulty inst auto MBBb = MBB.getFirstNonPHI(); + + if (MBBb == MBB.end()) +continue; + MachineInstr *MIb = &*MBBb; if (MIb->isTerminator()) continue; diff --git a/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll b/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll new file mode 100644 index 00..5f382f10f180bc --- /dev/null +++ b/llvm/test/CodeGen/WinEH/wineh-empty-seh-scope.ll @@ -0,0 +1,18 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=x86_64-pc-windows-msvc19.41.34120 < %s | FileCheck %s + +define void @foo() personality ptr @__CxxFrameHandler3 { +; CHECK-LABEL: foo: +; CHECK: # %bb.0: +; CHECK-NEXT:nop # avoids zero-length function + call void @llvm.seh.scope.begin() + unreachable +} + +declare i32 @__CxxFrameHandler3(...) + +declare void @llvm.seh.scope.begin() + +!llvm.module.flags = !{!0} + +!0 = !{i32 2, !"eh-asynch", i32 1} ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
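The fix itself is the two-line guard visible in the diff; in isolation, the pattern is simply "treat getFirstNonPHI() as possibly returning the block's end() sentinel before dereferencing it". A small standalone helper showing the same check (illustrative; not additional code from the backport):

```cpp
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"

using namespace llvm;

// Return the first non-PHI instruction, or nullptr when getFirstNonPHI()
// lands on the end() sentinel (e.g. for an empty machine basic block).
static MachineInstr *firstNonPHIOrNull(MachineBasicBlock &MBB) {
  MachineBasicBlock::iterator It = MBB.getFirstNonPHI();
  return It == MBB.end() ? nullptr : &*It;
}
```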
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
https://github.com/llvmbot milestoned https://github.com/llvm/llvm-project/pull/107466 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] release/19.x: [Windows SEH] Fix crash on empty seh block (#107031) (PR #107466)
llvmbot wrote: @arsenm What do you think about merging this PR to the release branch? https://github.com/llvm/llvm-project/pull/107466 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
[llvm-branch-commits] [llvm] [ctx_prof] Insert the ctx prof flattener after the module inliner (PR #107499)
https://github.com/mtrofin created https://github.com/llvm/llvm-project/pull/107499 None >From e90265db97747c0b15f81b31f061e299ffd33138 Mon Sep 17 00:00:00 2001 From: Mircea Trofin Date: Thu, 5 Sep 2024 12:52:56 -0700 Subject: [PATCH] [ctx_prof] Insert the ctx prof flattener after the module inliner --- llvm/lib/Passes/PassBuilderPipelines.cpp | 18 +- llvm/test/Analysis/CtxProfAnalysis/inline.ll | 17 + llvm/test/Other/opt-hot-cold-split.ll| 2 +- 3 files changed, 31 insertions(+), 6 deletions(-) diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp index 38297dc02b8be6..f9b5f584e00c07 100644 --- a/llvm/lib/Passes/PassBuilderPipelines.cpp +++ b/llvm/lib/Passes/PassBuilderPipelines.cpp @@ -1017,6 +1017,11 @@ PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level, IP.EnableDeferral = false; MPM.addPass(ModuleInlinerPass(IP, UseInlineAdvisor, Phase)); + if (!UseCtxProfile.empty()) { +MPM.addPass(GlobalOptPass()); +MPM.addPass(GlobalDCEPass()); +MPM.addPass(PGOCtxProfFlatteningPass()); + } MPM.addPass(createModuleToFunctionPassAdaptor( buildFunctionSimplificationPipeline(Level, Phase), @@ -1744,11 +1749,14 @@ ModulePassManager PassBuilder::buildThinLTODefaultPipeline( MPM.addPass(GlobalDCEPass()); return MPM; } - - // Add the core simplification pipeline. - MPM.addPass(buildModuleSimplificationPipeline( - Level, ThinOrFullLTOPhase::ThinLTOPostLink)); - + if (!UseCtxProfile.empty()) { +MPM.addPass( +buildModuleInlinerPipeline(Level, ThinOrFullLTOPhase::ThinLTOPostLink)); + } else { +// Add the core simplification pipeline. +MPM.addPass(buildModuleSimplificationPipeline( +Level, ThinOrFullLTOPhase::ThinLTOPostLink)); + } // Now add the optimization pipeline. MPM.addPass(buildModuleOptimizationPipeline( Level, ThinOrFullLTOPhase::ThinLTOPostLink)); diff --git a/llvm/test/Analysis/CtxProfAnalysis/inline.ll b/llvm/test/Analysis/CtxProfAnalysis/inline.ll index 875bc4938653b9..9381418c4e3f12 100644 --- a/llvm/test/Analysis/CtxProfAnalysis/inline.ll +++ b/llvm/test/Analysis/CtxProfAnalysis/inline.ll @@ -31,6 +31,23 @@ ; CHECK-NEXT:%call2 = call i32 @a(i32 %x) #1 ; CHECK-NEXT:br label %exit +; Make sure the postlink thinlto pipeline is aware of ctxprof +; RUN: opt -passes='thinlto' -use-ctx-profile=%t/profile.ctxprofdata \ +; RUN: %t/module.ll -S -o - | FileCheck %s --check-prefix=PIPELINE + +; PIPELINE-LABEL: define i32 @entrypoint +; PIPELINE-SAME: !prof ![[ENTRYPOINT_COUNT:[0-9]+]] +; PIPELINE-LABEL: loop.i: +; PIPELINE: br i1 %cond.i, label %loop.i, label %exit, !prof ![[LOOP_BW_INL:[0-9]+]] +; PIPELINE-LABEL: define i32 @a +; PIPELINE-LABEL: loop: +; PIPELINE: br i1 %cond, label %loop, label %exit, !prof ![[LOOP_BW_ORIG:[0-9]+]] + +; PIPELINE: ![[ENTRYPOINT_COUNT]] = !{!"function_entry_count", i64 10} +; These are the weights of the inlined @a, where the counters were 2, 100 (2 for entry, 100 for loop) +; PIPELINE: ![[LOOP_BW_INL]] = !{!"branch_weights", i32 98, i32 2} +; These are the weights of the un-inlined @a, where the counters were 8, 500 (8 for entry, 500 for loop) +; PIPELINE: ![[LOOP_BW_ORIG]] = !{!"branch_weights", i32 492, i32 8} ;--- module.ll define i32 @entrypoint(i32 %x) !guid !0 { diff --git a/llvm/test/Other/opt-hot-cold-split.ll b/llvm/test/Other/opt-hot-cold-split.ll index 21c713d35bb746..cd290dcc306570 100644 --- a/llvm/test/Other/opt-hot-cold-split.ll +++ b/llvm/test/Other/opt-hot-cold-split.ll @@ -2,7 +2,7 @@ ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | 
FileCheck %s -check-prefix=LTO-PRELINK-Os ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto-pre-link' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-PRELINK-Os ; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='lto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=LTO-POSTLINK-Os -; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -passes='thinlto' -debug-pass-manager < %s -o /dev/null 2>&1 | FileCheck %s -check-prefix=THINLTO-POSTLINK-Os +; RUN: opt -mtriple=x86_64-- -hot-cold-split=true -LINK-Os ; REQUIRES: asserts ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits
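The branch weights the new PIPELINE checks expect follow directly from the flattened counters mentioned in the test's comments: for this loop shape, the back-edge weight is the loop counter minus the entry counter, and the exit weight is the entry counter. A quick check of that arithmetic (the helper is illustrative, not part of the pass):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Counters: Entry = times the loop was entered, Loop = times the header ran.
// The back edge is taken (Loop - Entry) times, the exit Entry times.
std::pair<uint64_t, uint64_t> loopBranchWeights(uint64_t Entry, uint64_t Loop) {
  return {Loop - Entry, Entry};
}

int main() {
  // Inlined copy of @a: counters 2 (entry) and 100 (loop) -> !{98, 2}.
  auto [BackEdge, Exit] = loopBranchWeights(2, 100);
  assert(BackEdge == 98 && Exit == 2);
  // Un-inlined @a: counters 8 (entry) and 500 (loop) -> !{492, 8}.
  auto [BackEdge2, Exit2] = loopBranchWeights(8, 500);
  assert(BackEdge2 == 492 && Exit2 == 8);
  return 0;
}
```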
[llvm-branch-commits] [llvm] [ctx_prof] Insert the ctx prof flattener after the module inliner (PR #107499)
mtrofin wrote: > [!WARNING] > This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite (https://app.graphite.dev/github/pr/llvm/llvm-project/107499). * **#107499** 👈 * **#107329** * **#107463** * `main` This stack of pull requests is managed by Graphite (https://stacking.dev/). https://github.com/llvm/llvm-project/pull/107499 ___ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits