[llvm-branch-commits] [llvm] Update correct dependency (PR #109937)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

arsenm wrote:

> Description needs to be more specific

Bump 

https://github.com/llvm/llvm-project/pull/109937
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] release/19.x: [SystemZ] Fix codegen for _[u]128 intrinsics (PR #111376)

2024-10-07 Thread Ulrich Weigand via llvm-branch-commits

https://github.com/uweigand approved this pull request.

I think this should be a candidate for backport, as the problem is a 
miscompilation regression and the fix is straightforward.

https://github.com/llvm/llvm-project/pull/111376


[llvm-branch-commits] [llvm] [AMDGPU] Add tests for SIPreAllocateWWMRegs (PR #109963)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits


@@ -0,0 +1,43 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs -run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s  | FileCheck %s
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs -amdgpu-prealloc-sgpr-spill-vgprs -run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s | FileCheck %s --check-prefix=CHECK2
+
+---
+
+name: pre_allocate_wwm_regs_strict
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr1
+    ; CHECK-LABEL: name: pre_allocate_wwm_regs_strict
+    ; CHECK: liveins: $sgpr1
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; CHECK-NEXT: renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
+    ; CHECK-NEXT: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+    ; CHECK-NEXT: dead $vgpr0 = V_MOV_B32_dpp $vgpr0, [[DEF]], 323, 12, 15, 0, implicit $exec
+    ; CHECK-NEXT: $exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+    ; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+    %0:vgpr_32 = IMPLICIT_DEF
+    renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, implicit-def $scc, implicit $exec
+    %1:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+    %2:vgpr_32 = V_MOV_B32_dpp %1, %0, 323, 12, 15, 0, implicit $exec
+    $exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+    %3:vgpr_32 = COPY %0
+...
+---
+
+name: pre_allocate_wwm_spill_to_vgpr
+tracksRegLiveness: true
+body: |
+  bb.0:
+    liveins: $sgpr1
+    ; CHECK2-LABEL: name: pre_allocate_wwm_spill_to_vgpr
+    ; CHECK2: liveins: $sgpr1
+    ; CHECK2-NEXT: {{  $}}
+    ; CHECK2-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+    ; CHECK2-NEXT: dead $vgpr0 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, [[DEF]]
+    ; CHECK2-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+    %0:vgpr_32 = IMPLICIT_DEF
+    %1:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, %0
+    %2:vgpr_32 = COPY %0

arsenm wrote:

The closing `...` is missing at the end of this function.

https://github.com/llvm/llvm-project/pull/109963


[llvm-branch-commits] [clang] [MSVC] work-around for compile time issue 102513 (PR #111314)

2024-10-07 Thread via llvm-branch-commits

bd1976bris wrote:

> > @tru I'm unable to merge this even with approval. Perhaps this has been 
> > made against the wrong branch or 19.x now closed?
> 
> No it's correct. Only the release manager can merge to the release branch, I 
> will do that before the next release.

Thanks very much!

https://github.com/llvm/llvm-project/pull/111314


[llvm-branch-commits] [llvm] [NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (PR #109939)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/109939


[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/110298


[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

Meinersbur wrote:

The formatting violations in 
https://github.com/llvm/llvm-project/pull/110298#issuecomment-2379683497 are 
those already present in the current trunk. `git clang-format` checks all 
formatting violations in a moved file, even if the content itself has not 
changed.

https://github.com/llvm/llvm-project/pull/110298


[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur ready_for_review 
https://github.com/llvm/llvm-project/pull/110298


[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-semantics

Author: Michael Kruse (Meinersbur)


Changes

Mostly mechanical changes in preparation for extracting the FortranRuntime 
"subproject" in #110217. This PR is intended only to move pre-existing 
files to the new folder structure, with no behavioral change.

`Common` and `Testing` are the directories shared by FortranRuntime and Flang. 
`Runtime` and `module` are going to be used by FortranRuntime only. Files in 
`Common` that are used only by Flang are moved into `Support`.

Some cosmetic changes and file path changes were necessary:
 * Update relative paths to point to the new location of the source files and 
`add_subdirectory`.
 * Add the new location's include directory to `include_directories`.
 * The unittest/Evaluate directory has unit tests for FortranRuntime and Flang. A 
new `CMakeLists.txt` was introduced for the FortranRuntime tests.
 * Change the `#include` paths relative to the include directory.
 * clang-format on the `#include` directives.
 * Since the path is part of the copyright header and include guards, a script 
was used to canonicalize those.
 * `test/Runtime` and runtime tests in `test/Driver` are moved, but the 
lit.cfg.py mechanism to execute them will only be added in #110217.

---

Patch is 334.25 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/110298.diff


418 Files Affected:

- (added) FortranRuntime/.clang-format (+21) 
- (renamed) FortranRuntime/cmake/config.h.cmake.in () 
- (renamed) FortranRuntime/include/flang/Common/Fortran-consts.h () 
- (renamed) FortranRuntime/include/flang/Common/ISO_Fortran_binding_wrapper.h 
(+7-8) 
- (renamed) FortranRuntime/include/flang/Common/api-attrs.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/binary-floating-point.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Common/bit-population-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/constexpr-bitset.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/decimal.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/enum-class.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/enum-set.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/fast-int-set.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/float128.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/format.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/idioms.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/leading-zero-bit-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/magic-numbers.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/optional.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/real.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/reference-wrapper.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/restorer.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/target-rounding.h () 
- (renamed) FortranRuntime/include/flang/Common/uint128.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/variant.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/visit.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/windows-include.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/allocator.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/descriptor.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocatable.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocator-registry.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Runtime/array-constructor.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Runtime/assign.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/c-or-cpp.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/character.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/command.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/cpp-type.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/derived-api.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/descriptor.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/entry-names.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Runtime/exceptions.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/execute.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/extensions.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/freestanding-tools.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/inquiry.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/io-api.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/iostat.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/main.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-instances.inc () 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-transpose.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/memory.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/m

[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-parser

Author: Michael Kruse (Meinersbur)


Changes

Mostly mechanical changes in preparation for extracting the FortranRuntime 
"subproject" in #110217. This PR is intended only to move pre-existing 
files to the new folder structure, with no behavioral change.

`Common` and `Testing` are the directories shared by FortranRuntime and Flang. 
`Runtime` and `module` are going to be used by FortranRuntime only. Files in 
`Common` that are used only by Flang are moved into `Support`.

Some cosmetic changes and file path changes were necessary:
 * Update relative paths to point to the new location of the source files and 
`add_subdirectory`.
 * Add the new location's include directory to `include_directories`.
 * The unittest/Evaluate directory has unit tests for FortranRuntime and Flang. A 
new `CMakeLists.txt` was introduced for the FortranRuntime tests.
 * Change the `#include` paths relative to the include directory.
 * clang-format on the `#include` directives.
 * Since the path is part of the copyright header and include guards, a script 
was used to canonicalize those.
 * `test/Runtime` and runtime tests in `test/Driver` are moved, but the 
lit.cfg.py mechanism to execute them will only be added in #110217.

---

Patch is 334.25 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/110298.diff


418 Files Affected:

- (added) FortranRuntime/.clang-format (+21) 
- (renamed) FortranRuntime/cmake/config.h.cmake.in () 
- (renamed) FortranRuntime/include/flang/Common/Fortran-consts.h () 
- (renamed) FortranRuntime/include/flang/Common/ISO_Fortran_binding_wrapper.h 
(+7-8) 
- (renamed) FortranRuntime/include/flang/Common/api-attrs.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/binary-floating-point.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Common/bit-population-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/constexpr-bitset.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/decimal.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/enum-class.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/enum-set.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/fast-int-set.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/float128.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/format.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/idioms.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/leading-zero-bit-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/magic-numbers.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/optional.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/real.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/reference-wrapper.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/restorer.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/target-rounding.h () 
- (renamed) FortranRuntime/include/flang/Common/uint128.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/variant.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/visit.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/windows-include.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/allocator.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/descriptor.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocatable.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocator-registry.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Runtime/array-constructor.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Runtime/assign.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/c-or-cpp.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/character.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/command.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/cpp-type.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/derived-api.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/descriptor.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/entry-names.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Runtime/exceptions.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/execute.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/extensions.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/freestanding-tools.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/inquiry.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/io-api.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/iostat.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/main.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-instances.inc () 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-transpose.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/memory.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/misc

[llvm-branch-commits] [clang] [flang] [lld] [llvm] [Flang][DRAFT] LLVM_ENABLE_RUNTIMES=FortranRuntime (PR #110217)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/110217


[llvm-branch-commits] [flang] [llvm] [Flang] Move runtime library files to FortranRuntime. NFC (PR #110298)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-flang-fir-hlfir

Author: Michael Kruse (Meinersbur)


Changes

Mostly mechanical changes in preparation for extracting the FortranRuntime 
"subproject" in #110217. This PR is intended only to move pre-existing 
files to the new folder structure, with no behavioral change.

`Common` and `Testing` are the directories shared by FortranRuntime and Flang. 
`Runtime` and `module` are going to be used by FortranRuntime only. Files in 
`Common` that are used only by Flang are moved into `Support`.

Some cosmetic changes and file path changes were necessary:
 * Update relative paths to point to the new location of the source files and 
`add_subdirectory`.
 * Add the new location's include directory to `include_directories`.
 * The unittest/Evaluate directory has unit tests for FortranRuntime and Flang. A 
new `CMakeLists.txt` was introduced for the FortranRuntime tests.
 * Change the `#include` paths relative to the include directory.
 * clang-format on the `#include` directives.
 * Since the path is part of the copyright header and include guards, a script 
was used to canonicalize those.
 * `test/Runtime` and runtime tests in `test/Driver` are moved, but the 
lit.cfg.py mechanism to execute them will only be added in #110217.

---

Patch is 334.25 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/110298.diff


418 Files Affected:

- (added) FortranRuntime/.clang-format (+21) 
- (renamed) FortranRuntime/cmake/config.h.cmake.in () 
- (renamed) FortranRuntime/include/flang/Common/Fortran-consts.h () 
- (renamed) FortranRuntime/include/flang/Common/ISO_Fortran_binding_wrapper.h 
(+7-8) 
- (renamed) FortranRuntime/include/flang/Common/api-attrs.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/binary-floating-point.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Common/bit-population-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/constexpr-bitset.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/decimal.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Common/enum-class.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/enum-set.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/fast-int-set.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/float128.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/format.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/idioms.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/leading-zero-bit-count.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/magic-numbers.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/optional.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/real.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Common/reference-wrapper.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/restorer.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/target-rounding.h () 
- (renamed) FortranRuntime/include/flang/Common/uint128.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/variant.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Common/visit.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Common/windows-include.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/allocator.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/CUDA/descriptor.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocatable.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/allocator-registry.h (+4-4) 
- (renamed) FortranRuntime/include/flang/Runtime/array-constructor.h (+3-3) 
- (renamed) FortranRuntime/include/flang/Runtime/assign.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/c-or-cpp.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/character.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/command.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/cpp-type.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/derived-api.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/descriptor.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/entry-names.h (+5-6) 
- (renamed) FortranRuntime/include/flang/Runtime/exceptions.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/execute.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/extensions.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/freestanding-tools.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/inquiry.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/io-api.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/iostat.h (+2-2) 
- (renamed) FortranRuntime/include/flang/Runtime/main.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-instances.inc () 
- (renamed) FortranRuntime/include/flang/Runtime/matmul-transpose.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/matmul.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/memory.h (+1-1) 
- (renamed) FortranRuntime/include/flang/Runtime/m

[llvm-branch-commits] [clang] [flang] [lld] [llvm] [Flang] LLVM_ENABLE_RUNTIMES=FortranRuntime (PR #110217)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur edited 
https://github.com/llvm/llvm-project/pull/110217


[llvm-branch-commits] [clang] [flang] [lld] [llvm] [Flang][DRAFT] LLVM_ENABLE_RUNTIMES=FortranRuntime (PR #110217)

2024-10-07 Thread Michael Kruse via llvm-branch-commits

https://github.com/Meinersbur ready_for_review 
https://github.com/llvm/llvm-project/pull/110217


[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for cmpxchg custom expansion (PR #109408)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/109408

From b69ead1780d39a41821fd0cd65326fba5c58673f Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 12 Sep 2024 12:44:04 +0400
Subject: [PATCH] AMDGPU: Add baseline tests for cmpxchg custom expansion

We need a non-atomic path if flat may access private.
---
 .../AMDGPU/flat_atomics_i64_noprivate.ll  |  34 +--
 .../AtomicExpand/AMDGPU/expand-atomic-mmra.ll |  12 +-
 ...and-atomic-rmw-fadd-flat-specialization.ll |   4 +-
 ...expand-atomicrmw-flat-noalias-addrspace.ll | 149 -
 .../expand-cmpxchg-flat-maybe-private.ll  | 208 ++
 5 files changed, 382 insertions(+), 25 deletions(-)
 create mode 100644 llvm/test/Transforms/AtomicExpand/AMDGPU/expand-cmpxchg-flat-maybe-private.ll

diff --git a/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll b/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
index e73841e0987800..4072b36e96cf40 100644
--- a/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
@@ -5005,7 +5005,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_offset(ptr %out, i64 %in, i64 %old
 ; GFX12-NEXT:    s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5061,7 +5061,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_soffset(ptr %out, i64 %in, i64 %ol
 ; GFX12-NEXT:    s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 9000
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5121,7 +5121,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_ret_offset(ptr %out, ptr %out2, i6
 ; GFX12-NEXT:    s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5184,7 +5184,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_addr64_offset(ptr %out, i64 %in, i
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
   %gep = getelementptr i64, ptr %ptr, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5257,7 +5257,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_ret_addr64_offset(ptr %out, ptr %o
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
   %gep = getelementptr i64, ptr %ptr, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5310,7 +5310,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64(ptr %out, i64 %in, i64 %old) {
 ; GFX12-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX12-NEXT:    s_endpgm
 entry:
-  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5365,7 +5365,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_ret(ptr %out, ptr %out2, i64 %in,
 ; GFX12-NEXT:    flat_store_b64 v[2:3], v[0:1]
 ; GFX12-NEXT:    s_endpgm
 entry:
-  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5423,7 +5423,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_addr64(ptr %out, i64 %in, i64 %ind
 ; GFX12-NEXT:    s_endpgm
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
-  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5491,7 +5491,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_ret_addr64(ptr %out, ptr %out2, i6
 ; GFX12-NEXT:    s_endpgm
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
-  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") seq_cst seq_cst
+  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") seq_cst se

[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/109410

From 6c633b7f9acbc000dac2df4ac2eab630eb16e188 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 14 Aug 2024 13:57:14 +0400
Subject: [PATCH 1/2] AMDGPU: Custom expand flat cmpxchg which may access
 private

64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.

Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
---
 llvm/include/llvm/CodeGen/TargetLowering.h|5 +
 .../llvm/Transforms/Utils/LowerAtomic.h   |7 +
 llvm/lib/CodeGen/AtomicExpandPass.cpp |4 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  146 ++-
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |3 +
 llvm/lib/Transforms/Utils/LowerAtomic.cpp |   21 +-
 llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll  | 1019 +++--
 ...expand-atomicrmw-flat-noalias-addrspace.ll |6 +-
 ...expand-atomicrmw-integer-ops-0-to-add-0.ll |6 +-
 .../expand-cmpxchg-flat-maybe-private.ll  |  104 +-
 10 files changed, 1157 insertions(+), 164 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 4c76592c42e1eb..a4be39d43e38e7 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2204,6 +2204,11 @@ class TargetLoweringBase {
         "Generic atomicrmw expansion unimplemented on this target");
   }
 
+  /// Perform a cmpxchg expansion using a target-specific method.
+  virtual void emitExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) const {
+    llvm_unreachable("Generic cmpxchg expansion unimplemented on this target");
+  }
+
   /// Perform a bit test atomicrmw using a target-specific intrinsic. This
   /// represents the combined bit test intrinsic which will be lowered at a late
   /// stage by the backend.
diff --git a/llvm/include/llvm/Transforms/Utils/LowerAtomic.h b/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
index b25b281667f9cb..295c2bd2b4b47e 100644
--- a/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
+++ b/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
@@ -23,6 +23,13 @@ class IRBuilderBase;
 /// Convert the given Cmpxchg into primitive load and compare.
 bool lowerAtomicCmpXchgInst(AtomicCmpXchgInst *CXI);
 
+/// Emit IR to implement the given cmpxchg operation on values in registers,
+/// returning the new value.
+std::pair<Value *, Value *> buildAtomicCmpXchgValue(IRBuilderBase &Builder,
+                                                    Value *Ptr, Value *Cmp,
+                                                    Value *Val,
+                                                    Align Alignment);
+
 /// Convert the given RMWI into primitive load and stores,
 /// assuming that doing so is legal. Return true if the lowering
 /// succeeds.
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index b5eca44cb611a3..71e0fd2b7167a2 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -1672,6 +1672,10 @@ bool AtomicExpandImpl::tryExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) {
     return true;
   case TargetLoweringBase::AtomicExpansionKind::NotAtomic:
     return lowerAtomicCmpXchgInst(CI);
+  case TargetLoweringBase::AtomicExpansionKind::Expand: {
+    TLI->emitExpandAtomicCmpXchg(CI);
+    return true;
+  }
   }
 }
 
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5e72c1bb82be63..9d1e807da1b233 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16588,9 +16588,21 @@ SITargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
 
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *CmpX) const {
-  return CmpX->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
-             ? AtomicExpansionKind::NotAtomic
-             : AtomicExpansionKind::None;
+  unsigned AddrSpace = CmpX->getPointerAddressSpace();
+  if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS)
+    return AtomicExpansionKind::NotAtomic;
+
+  if (AddrSpace != AMDGPUAS::FLAT_ADDRESS || !flatInstrMayAccessPrivate(CmpX))
+    return AtomicExpansionKind::None;
+
+  const DataLayout &DL = CmpX->getDataLayout();
+
+  Type *ValTy = CmpX->getNewValOperand()->getType();
+
+  // If a 64-bit flat atomic may alias private, we need to avoid using the
+  // atomic in the private case.
+  return DL.getTypeSizeInBits(ValTy) == 64 ? AtomicExpansionKind::Expand
+                                           : AtomicExpansionKind::None;
 }
 
 const TargetRegisterClass *
@@ -16754,40 +16766,8 @@ bool SITargetLowering::checkForPhysRegDependency(
   return false;
 }
 
-void SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
-  AtomicRMWInst::BinOp Op = AI->getOperation();
-
-  if (Op == AtomicRMWInst::Sub || Op ==

[llvm-branch-commits] [clang] [clang] WIP: Implement TTP 'reversed' pack matching for deduced function template calls. (PR #111457)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-clang

Author: Matheus Izvekov (mizvekov)


Changes

Clang previously missed implementing the historical rule 
https://eel.is/c++draft/temp.arg.template#3.sentence-3 for deduced 
function template calls.

This patch implements this rule, but only on the
'frelaxed-template-template-args' mode, which is
currently the default.

As its negation is deprecated and will be removed soon, this patch does not 
change the implementation in that case.

WIP, as it's missing some changes which will help in not breaking compatibility 
in overload resolution.
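
For orientation, here is a minimal sketch of the cited rule. It is not taken from the patch or its tests. It shows a template template parameter whose own parameter list is a single pack accepting class templates with one parameter, two parameters, or a pack, first with explicit arguments and then in a simple deduced call that compilers already accept. The names `A`, `B`, `C`, `Y`, `arity`, and `checkExamples` are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Three class templates with differently shaped template-parameter lists.
template <class T> struct A {};
template <class T, class U = T> struct B {};
template <class... Ts> struct C {};

// Y's template template parameter Q declares a single pack, so by the cited
// rule it can be bound to A, B, or C.
template <template <class...> class Q> struct Y {
  static constexpr bool accepted = true;
};

static_assert(Y<A>::accepted, "pack matches one parameter");
static_assert(Y<B>::accepted, "pack matches two parameters");
static_assert(Y<C>::accepted, "pack matches a pack");

// Deduced call: TT and Ts are both deduced from the argument type.
// arity is a hypothetical helper, not something from the patch.
template <template <class...> class TT, class... Ts>
constexpr std::size_t arity(TT<Ts...>) {
  return sizeof...(Ts);
}

inline void checkExamples() {
  assert(arity(A<int>{}) == 1);
  assert(arity(B<int>{}) == 2); // B<int> is B<int, int>
  assert(arity(C<>{}) == 0);
  assert(arity(C<int, char, long>{}) == 3);
}
```

The cases the patch actually changes are the ones exercised in clang/test/SemaTemplate/cwg2398.cpp; the sketch above only illustrates the general shape of pack-to-non-pack matching.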

---

Patch is 20.92 KiB, truncated to 20.00 KiB below, full version: 
https://github.com/llvm/llvm-project/pull/111457.diff


6 Files Affected:

- (modified) clang/include/clang/Sema/Sema.h (+8-6) 
- (modified) clang/lib/Sema/SemaLookup.cpp (+1) 
- (modified) clang/lib/Sema/SemaOverload.cpp (+3-2) 
- (modified) clang/lib/Sema/SemaTemplate.cpp (+12-11) 
- (modified) clang/lib/Sema/SemaTemplateDeduction.cpp (+44-32) 
- (modified) clang/test/SemaTemplate/cwg2398.cpp (+34) 


``diff
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 5d38862ce59f0c..05857884fdc2e1 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -11636,7 +11636,7 @@ class Sema final : public SemaBase {
 SourceLocation RAngleLoc, unsigned ArgumentPackIndex,
 SmallVectorImpl &SugaredConverted,
 SmallVectorImpl &CanonicalConverted,
-CheckTemplateArgumentKind CTAK,
+CheckTemplateArgumentKind CTAK, bool PartialOrdering,
 bool *MatchedPackOnParmToNonPackOnArg);
 
   /// Check that the given template arguments can be provided to
@@ -11719,7 +11719,8 @@ class Sema final : public SemaBase {
   /// It returns true if an error occurred, and false otherwise.
   bool CheckTemplateTemplateArgument(TemplateTemplateParmDecl *Param,
  TemplateParameterList *Params,
- TemplateArgumentLoc &Arg, bool IsDeduced,
+ TemplateArgumentLoc &Arg,
+ bool PartialOrdering,
  bool *MatchedPackOnParmToNonPackOnArg);
 
   void NoteTemplateLocation(const NamedDecl &Decl,
@@ -12231,8 +12232,8 @@ class Sema final : public SemaBase {
   SmallVectorImpl &Deduced,
   unsigned NumExplicitlySpecified, FunctionDecl *&Specialization,
   sema::TemplateDeductionInfo &Info,
-  SmallVectorImpl const *OriginalCallArgs = nullptr,
-  bool PartialOverloading = false,
+  SmallVectorImpl const *OriginalCallArgs,
+  bool PartialOverloading, bool PartialOrdering,
   llvm::function_ref CheckNonDependent = [] { return false; });
 
   /// Perform template argument deduction from a function call
@@ -12266,7 +12267,8 @@ class Sema final : public SemaBase {
   TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef Args,
   FunctionDecl *&Specialization, sema::TemplateDeductionInfo &Info,
   bool PartialOverloading, bool AggregateDeductionCandidate,
-  QualType ObjectType, Expr::Classification ObjectClassification,
+  bool PartialOrdering, QualType ObjectType,
+  Expr::Classification ObjectClassification,
   llvm::function_ref)> CheckNonDependent);
 
   /// Deduce template arguments when taking the address of a function
@@ -12421,7 +12423,7 @@ class Sema final : public SemaBase {
   bool isTemplateTemplateParameterAtLeastAsSpecializedAs(
   TemplateParameterList *PParam, TemplateDecl *PArg, TemplateDecl *AArg,
   const DefaultArguments &DefaultArgs, SourceLocation ArgLoc,
-  bool IsDeduced, bool *MatchedPackOnParmToNonPackOnArg);
+  bool PartialOrdering, bool *MatchedPackOnParmToNonPackOnArg);
 
   /// Mark which template parameters are used in a given expression.
   ///
diff --git a/clang/lib/Sema/SemaLookup.cpp b/clang/lib/Sema/SemaLookup.cpp
index 31422c213ac249..60fa195221c938 100644
--- a/clang/lib/Sema/SemaLookup.cpp
+++ b/clang/lib/Sema/SemaLookup.cpp
@@ -3667,6 +3667,7 @@ Sema::LookupLiteralOperator(Scope *S, LookupResult &R,
   if (CheckTemplateArgument(
   Params->getParam(0), Arg, FD, R.getNameLoc(), R.getNameLoc(),
   0, SugaredChecked, CanonicalChecked, CTAK_Specified,
+  /*PartialOrdering=*/false,
   /*MatchedPackOnParmToNonPackOnArg=*/nullptr) ||
   Trap.hasErrorOccurred())
 IsTemplate = false;
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 2cde8131108fbe..a8c3dd05a9a3c5 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -7663,8 +7663,8 @@ void Sema::AddMethodTemplateCandidate(
   ConversionSequenceList Conversions;
   if (TemplateDeductionResult Result = DeduceTemplateArgu

[llvm-branch-commits] [clang] [clang] WIP: Implement TTP 'reversed' pack matching for deduced function template calls. (PR #111457)

2024-10-07 Thread Matheus Izvekov via llvm-branch-commits

https://github.com/mizvekov created 
https://github.com/llvm/llvm-project/pull/111457

Clang previously missed implementing the historical rule 
https://eel.is/c++draft/temp.arg.template#3.sentence-3 for deduced function 
template calls.

This patch implements this rule, but only on the
'frelaxed-template-template-args' mode, which is
currently the default.

As its negation is deprecated and will be removed soon, this patch does not 
change the implementation in that case.

WIP, as it's missing some changes which will help in not breaking compatibility 
in overload resolution.

>From 1e8fbca963bb13d38ee6f9907ca8e5039bd08a8d Mon Sep 17 00:00:00 2001
From: Matheus Izvekov 
Date: Sat, 5 Oct 2024 21:56:51 -0300
Subject: [PATCH] [clang] WIP: Implement TTP 'reversed' pack matching for
 deduced function template calls.

Clang previously missed implementing the historical rule
https://eel.is/c++draft/temp.arg.template#3.sentence-3
for deduced function template calls.

This patch implements this rule, but only in the
'frelaxed-template-template-args' mode, which is
currently the default.

As its negation is deprecated and will be removed soon,
this patch does not change the implementation there.

WIP, as it's missing some changes which will help in not breaking
compatibility in overload resolution.
---
 clang/include/clang/Sema/Sema.h  | 14 +++--
 clang/lib/Sema/SemaLookup.cpp|  1 +
 clang/lib/Sema/SemaOverload.cpp  |  5 +-
 clang/lib/Sema/SemaTemplate.cpp  | 23 +++
 clang/lib/Sema/SemaTemplateDeduction.cpp | 76 ++--
 clang/test/SemaTemplate/cwg2398.cpp  | 34 +++
 6 files changed, 102 insertions(+), 51 deletions(-)

diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 5d38862ce59f0c..05857884fdc2e1 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -11636,7 +11636,7 @@ class Sema final : public SemaBase {
 SourceLocation RAngleLoc, unsigned ArgumentPackIndex,
 SmallVectorImpl &SugaredConverted,
 SmallVectorImpl &CanonicalConverted,
-CheckTemplateArgumentKind CTAK,
+CheckTemplateArgumentKind CTAK, bool PartialOrdering,
 bool *MatchedPackOnParmToNonPackOnArg);
 
   /// Check that the given template arguments can be provided to
@@ -11719,7 +11719,8 @@ class Sema final : public SemaBase {
   /// It returns true if an error occurred, and false otherwise.
   bool CheckTemplateTemplateArgument(TemplateTemplateParmDecl *Param,
  TemplateParameterList *Params,
- TemplateArgumentLoc &Arg, bool IsDeduced,
+ TemplateArgumentLoc &Arg,
+ bool PartialOrdering,
  bool *MatchedPackOnParmToNonPackOnArg);
 
   void NoteTemplateLocation(const NamedDecl &Decl,
@@ -12231,8 +12232,8 @@ class Sema final : public SemaBase {
   SmallVectorImpl &Deduced,
   unsigned NumExplicitlySpecified, FunctionDecl *&Specialization,
   sema::TemplateDeductionInfo &Info,
-  SmallVectorImpl const *OriginalCallArgs = nullptr,
-  bool PartialOverloading = false,
+  SmallVectorImpl const *OriginalCallArgs,
+  bool PartialOverloading, bool PartialOrdering,
   llvm::function_ref CheckNonDependent = [] { return false; });
 
   /// Perform template argument deduction from a function call
@@ -12266,7 +12267,8 @@ class Sema final : public SemaBase {
   TemplateArgumentListInfo *ExplicitTemplateArgs, ArrayRef Args,
   FunctionDecl *&Specialization, sema::TemplateDeductionInfo &Info,
   bool PartialOverloading, bool AggregateDeductionCandidate,
-  QualType ObjectType, Expr::Classification ObjectClassification,
+  bool PartialOrdering, QualType ObjectType,
+  Expr::Classification ObjectClassification,
   llvm::function_ref)> CheckNonDependent);
 
   /// Deduce template arguments when taking the address of a function
@@ -12421,7 +12423,7 @@ class Sema final : public SemaBase {
   bool isTemplateTemplateParameterAtLeastAsSpecializedAs(
   TemplateParameterList *PParam, TemplateDecl *PArg, TemplateDecl *AArg,
   const DefaultArguments &DefaultArgs, SourceLocation ArgLoc,
-  bool IsDeduced, bool *MatchedPackOnParmToNonPackOnArg);
+  bool PartialOrdering, bool *MatchedPackOnParmToNonPackOnArg);
 
   /// Mark which template parameters are used in a given expression.
   ///
diff --git a/clang/lib/Sema/SemaLookup.cpp b/clang/lib/Sema/SemaLookup.cpp
index 31422c213ac249..60fa195221c938 100644
--- a/clang/lib/Sema/SemaLookup.cpp
+++ b/clang/lib/Sema/SemaLookup.cpp
@@ -3667,6 +3667,7 @@ Sema::LookupLiteralOperator(Scope *S, LookupResult &R,
   if (CheckTemplateArgument(
   Params->getParam(0), Arg, FD, R.getNameLoc(), 

[llvm-branch-commits] [clang] [clang] WIP: Implement TTP 'reversed' pack matching for deduced function template calls. (PR #111457)

2024-10-07 Thread Matheus Izvekov via llvm-branch-commits

https://github.com/mizvekov edited 
https://github.com/llvm/llvm-project/pull/111457


[llvm-branch-commits] [clang] [MSVC] work-around for compile time issue 102513 (PR #111314)

2024-10-07 Thread Tobias Hieta via llvm-branch-commits

tru wrote:

> @tru I'm unable to merge this even with approval. Perhaps this has been made 
> against the wrong branch, or is 19.x now closed?

No, it's correct. Only the release manager can merge to the release branch; I 
will do that before the next release.

https://github.com/llvm/llvm-project/pull/111314


[llvm-branch-commits] [llvm] [x86][Windows] Fix chromium build break (PR #111218)

2024-10-07 Thread Tobias Hieta via llvm-branch-commits

tru wrote:

I will merge it before the next release. Only the release managers can merge 
to the release branch.

https://github.com/llvm/llvm-project/pull/111218


[llvm-branch-commits] [clang] [MSVC] work-around for compile time issue 102513 (PR #111314)

2024-10-07 Thread via llvm-branch-commits

bd1976bris wrote:

@tru I'm unable to merge this even with approval. Perhaps this has been made 
against the wrong branch, or is 19.x now closed?

https://github.com/llvm/llvm-project/pull/111314


[llvm-branch-commits] [llvm] AMDGPU: Custom expand flat cmpxchg which may access private (PR #109410)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/109410

>From d1c280fc9477f6dc9d8d88b223523cc941f504e6 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 14 Aug 2024 13:57:14 +0400
Subject: [PATCH 1/2] AMDGPU: Custom expand flat cmpxchg which may access
 private

64-bit flat cmpxchg instructions do not work correctly for scratch
addresses, and need to be expanded as non-atomic.

Allow custom expansion of cmpxchg in AtomicExpand, as is
already the case for atomicrmw.
---
 llvm/include/llvm/CodeGen/TargetLowering.h|5 +
 .../llvm/Transforms/Utils/LowerAtomic.h   |7 +
 llvm/lib/CodeGen/AtomicExpandPass.cpp |4 +
 llvm/lib/Target/AMDGPU/SIISelLowering.cpp |  146 ++-
 llvm/lib/Target/AMDGPU/SIISelLowering.h   |3 +
 llvm/lib/Transforms/Utils/LowerAtomic.cpp |   21 +-
 llvm/test/CodeGen/AMDGPU/flat_atomics_i64.ll  | 1019 +++--
 ...expand-atomicrmw-flat-noalias-addrspace.ll |6 +-
 ...expand-atomicrmw-integer-ops-0-to-add-0.ll |6 +-
 .../expand-cmpxchg-flat-maybe-private.ll  |  104 +-
 10 files changed, 1157 insertions(+), 164 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h 
b/llvm/include/llvm/CodeGen/TargetLowering.h
index 3842af56e6b3d7..678b169568afcf 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -2204,6 +2204,11 @@ class TargetLoweringBase {
 "Generic atomicrmw expansion unimplemented on this target");
   }
 
+  /// Perform a cmpxchg expansion using a target-specific method.
+  virtual void emitExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) const {
+llvm_unreachable("Generic cmpxchg expansion unimplemented on this target");
+  }
+
   /// Perform a bit test atomicrmw using a target-specific intrinsic. This
   /// represents the combined bit test intrinsic which will be lowered at a 
late
   /// stage by the backend.
diff --git a/llvm/include/llvm/Transforms/Utils/LowerAtomic.h 
b/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
index b25b281667f9cb..295c2bd2b4b47e 100644
--- a/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
+++ b/llvm/include/llvm/Transforms/Utils/LowerAtomic.h
@@ -23,6 +23,13 @@ class IRBuilderBase;
 /// Convert the given Cmpxchg into primitive load and compare.
 bool lowerAtomicCmpXchgInst(AtomicCmpXchgInst *CXI);
 
+/// Emit IR to implement the given cmpxchg operation on values in registers,
+/// returning the new value.
+std::pair buildAtomicCmpXchgValue(IRBuilderBase &Builder,
+Value *Ptr, Value *Cmp,
+Value *Val,
+Align Alignment);
+
 /// Convert the given RMWI into primitive load and stores,
 /// assuming that doing so is legal. Return true if the lowering
 /// succeeds.
diff --git a/llvm/lib/CodeGen/AtomicExpandPass.cpp 
b/llvm/lib/CodeGen/AtomicExpandPass.cpp
index b5eca44cb611a3..71e0fd2b7167a2 100644
--- a/llvm/lib/CodeGen/AtomicExpandPass.cpp
+++ b/llvm/lib/CodeGen/AtomicExpandPass.cpp
@@ -1672,6 +1672,10 @@ bool 
AtomicExpandImpl::tryExpandAtomicCmpXchg(AtomicCmpXchgInst *CI) {
 return true;
   case TargetLoweringBase::AtomicExpansionKind::NotAtomic:
 return lowerAtomicCmpXchgInst(CI);
+  case TargetLoweringBase::AtomicExpansionKind::Expand: {
+TLI->emitExpandAtomicCmpXchg(CI);
+return true;
+  }
   }
 }
 
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp 
b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 5e72c1bb82be63..9d1e807da1b233 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16588,9 +16588,21 @@ 
SITargetLowering::shouldExpandAtomicStoreInIR(StoreInst *SI) const {
 
 TargetLowering::AtomicExpansionKind
 SITargetLowering::shouldExpandAtomicCmpXchgInIR(AtomicCmpXchgInst *CmpX) const 
{
-  return CmpX->getPointerAddressSpace() == AMDGPUAS::PRIVATE_ADDRESS
- ? AtomicExpansionKind::NotAtomic
- : AtomicExpansionKind::None;
+  unsigned AddrSpace = CmpX->getPointerAddressSpace();
+  if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS)
+return AtomicExpansionKind::NotAtomic;
+
+  if (AddrSpace != AMDGPUAS::FLAT_ADDRESS || !flatInstrMayAccessPrivate(CmpX))
+return AtomicExpansionKind::None;
+
+  const DataLayout &DL = CmpX->getDataLayout();
+
+  Type *ValTy = CmpX->getNewValOperand()->getType();
+
+  // If a 64-bit flat atomic may alias private, we need to avoid using the
+  // atomic in the private case.
+  return DL.getTypeSizeInBits(ValTy) == 64 ? AtomicExpansionKind::Expand
+   : AtomicExpansionKind::None;
 }
 
 const TargetRegisterClass *
@@ -16754,40 +16766,8 @@ bool SITargetLowering::checkForPhysRegDependency(
   return false;
 }
 
-void SITargetLowering::emitExpandAtomicRMW(AtomicRMWInst *AI) const {
-  AtomicRMWInst::BinOp Op = AI->getOperation();
-
-  if (Op == AtomicRMWInst::Sub || Op ==
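
(The patch text is truncated at this point in the archive.) The address-space decision made by `shouldExpandAtomicCmpXchgInIR` in the hunk above can be sketched as a standalone function. This is a simplified model rather than the backend code: `ExpansionKind` and `classifyCmpXchg` are hypothetical names, and the `MayAccessPrivate` flag stands in for the result of `flatInstrMayAccessPrivate(CmpX)`.

```cpp
#include <cassert>

// Address-space numbers as used by the AMDGPU backend (AMDGPUAS).
enum AddrSpace : unsigned { FLAT = 0, GLOBAL = 1, LOCAL = 3, PRIVATE = 5 };

// Hypothetical stand-in for TargetLowering::AtomicExpansionKind.
enum class ExpansionKind { None, NotAtomic, Expand };

// Mirrors the order of checks in the patched function:
//  1. private pointers are lowered as non-atomic;
//  2. non-flat pointers, or flat pointers known not to alias private,
//     keep the native instruction;
//  3. a 64-bit flat cmpxchg that may alias private gets the custom expansion.
inline ExpansionKind classifyCmpXchg(unsigned AS, unsigned SizeInBits,
                                     bool MayAccessPrivate) {
  if (AS == PRIVATE)
    return ExpansionKind::NotAtomic;
  if (AS != FLAT || !MayAccessPrivate)
    return ExpansionKind::None;
  return SizeInBits == 64 ? ExpansionKind::Expand : ExpansionKind::None;
}
```

In this model, a 32-bit flat cmpxchg is left alone even when it may alias private, which matches the size check in the hunk.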

[llvm-branch-commits] [llvm] AMDGPU: Add baseline tests for cmpxchg custom expansion (PR #109408)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/109408

>From 4934c7dfffd890420a433c30e6375f42bfa76596 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Thu, 12 Sep 2024 12:44:04 +0400
Subject: [PATCH] AMDGPU: Add baseline tests for cmpxchg custom expansion

We need a non-atomic path if flat may access private.
---
 .../AMDGPU/flat_atomics_i64_noprivate.ll  |  34 +--
 .../AtomicExpand/AMDGPU/expand-atomic-mmra.ll |  12 +-
 ...and-atomic-rmw-fadd-flat-specialization.ll |   4 +-
 ...expand-atomicrmw-flat-noalias-addrspace.ll | 149 -
 .../expand-cmpxchg-flat-maybe-private.ll  | 208 ++
 5 files changed, 382 insertions(+), 25 deletions(-)
 create mode 100644 
llvm/test/Transforms/AtomicExpand/AMDGPU/expand-cmpxchg-flat-maybe-private.ll

diff --git a/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll 
b/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
index e73841e0987800..4072b36e96cf40 100644
--- a/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat_atomics_i64_noprivate.ll
@@ -5005,7 +5005,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_offset(ptr 
%out, i64 %in, i64 %old
 ; GFX12-NEXT:s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5061,7 +5061,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_soffset(ptr 
%out, i64 %in, i64 %ol
 ; GFX12-NEXT:s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 9000
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5121,7 +5121,7 @@ define amdgpu_kernel void 
@atomic_cmpxchg_i64_ret_offset(ptr %out, ptr %out2, i6
 ; GFX12-NEXT:s_endpgm
 entry:
   %gep = getelementptr i64, ptr %out, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5184,7 +5184,7 @@ define amdgpu_kernel void 
@atomic_cmpxchg_i64_addr64_offset(ptr %out, i64 %in, i
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
   %gep = getelementptr i64, ptr %ptr, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5257,7 +5257,7 @@ define amdgpu_kernel void 
@atomic_cmpxchg_i64_ret_addr64_offset(ptr %out, ptr %o
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
   %gep = getelementptr i64, ptr %ptr, i64 4
-  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %gep, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5310,7 +5310,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64(ptr %out, 
i64 %in, i64 %old) {
 ; GFX12-NEXT:global_inv scope:SCOPE_DEV
 ; GFX12-NEXT:s_endpgm
 entry:
-  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5365,7 +5365,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_ret(ptr 
%out, ptr %out2, i64 %in,
 ; GFX12-NEXT:flat_store_b64 v[2:3], v[0:1]
 ; GFX12-NEXT:s_endpgm
 entry:
-  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %out, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   %extract0 = extractvalue { i64, i1 } %val, 0
   store i64 %extract0, ptr %out2
   ret void
@@ -5423,7 +5423,7 @@ define amdgpu_kernel void @atomic_cmpxchg_i64_addr64(ptr 
%out, i64 %in, i64 %ind
 ; GFX12-NEXT:s_endpgm
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
-  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst, !noalias.addrspace !0
   ret void
 }
 
@@ -5491,7 +5491,7 @@ define amdgpu_kernel void 
@atomic_cmpxchg_i64_ret_addr64(ptr %out, ptr %out2, i6
 ; GFX12-NEXT:s_endpgm
 entry:
   %ptr = getelementptr i64, ptr %out, i64 %index
-  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") 
seq_cst seq_cst
+  %val = cmpxchg volatile ptr %ptr, i64 %old, i64 %in syncscope("agent") 
seq_cst se

[llvm-branch-commits] [llvm] AMDGPU: Add noalias.addrspace metadata when autoupgrading atomic intrinsics (PR #102599)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/102599

>From 3415e01b8a510a3750ea55767c3f206c4e1a3e61 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Fri, 9 Aug 2024 14:51:41 +0400
Subject: [PATCH] AMDGPU: Add noalias.addrspace metadata when autoupgrading
 atomic intrinsics

This will be needed to continue generating the raw instruction in the flat case.
---
 llvm/lib/IR/AutoUpgrade.cpp| 13 -
 llvm/test/Bitcode/amdgcn-atomic.ll | 45 --
 2 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index e469c2ae52eb72..3753509f9aa718 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -34,9 +34,11 @@
 #include "llvm/IR/IntrinsicsWebAssembly.h"
 #include "llvm/IR/IntrinsicsX86.h"
 #include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/MDBuilder.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/Verifier.h"
+#include "llvm/Support/AMDGPUAddrSpace.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Regex.h"
@@ -4270,13 +4272,22 @@ static Value *upgradeAMDGCNIntrinsicCall(StringRef 
Name, CallBase *CI,
   AtomicRMWInst *RMW =
   Builder.CreateAtomicRMW(RMWOp, Ptr, Val, std::nullopt, Order, SSID);
 
-  if (PtrTy->getAddressSpace() != 3) {
+  unsigned AddrSpace = PtrTy->getAddressSpace();
+  if (AddrSpace != AMDGPUAS::LOCAL_ADDRESS) {
 MDNode *EmptyMD = MDNode::get(F->getContext(), {});
 RMW->setMetadata("amdgpu.no.fine.grained.memory", EmptyMD);
 if (RMWOp == AtomicRMWInst::FAdd && RetTy->isFloatTy())
   RMW->setMetadata("amdgpu.ignore.denormal.mode", EmptyMD);
   }
 
+  if (AddrSpace == AMDGPUAS::FLAT_ADDRESS) {
+MDBuilder MDB(F->getContext());
+MDNode *RangeNotPrivate =
+MDB.createRange(APInt(32, AMDGPUAS::PRIVATE_ADDRESS),
+APInt(32, AMDGPUAS::PRIVATE_ADDRESS + 1));
+RMW->setMetadata(LLVMContext::MD_noalias_addrspace, RangeNotPrivate);
+  }
+
   if (IsVolatile)
 RMW->setVolatile(true);
 
diff --git a/llvm/test/Bitcode/amdgcn-atomic.ll 
b/llvm/test/Bitcode/amdgcn-atomic.ll
index d642372799f56b..87ca1e3a617ed9 100644
--- a/llvm/test/Bitcode/amdgcn-atomic.ll
+++ b/llvm/test/Bitcode/amdgcn-atomic.ll
@@ -2,10 +2,10 @@
 
 
 define void @atomic_inc(ptr %ptr0, ptr addrspace(1) %ptr1, ptr addrspace(3) 
%ptr3) {
-  ; CHECK: atomicrmw uinc_wrap ptr %ptr0, i32 42 syncscope("agent") seq_cst, 
align 4, !amdgpu.no.fine.grained.memory !0
+  ; CHECK: atomicrmw uinc_wrap ptr %ptr0, i32 42 syncscope("agent") seq_cst, 
align 4, !noalias.addrspace !0, !amdgpu.no.fine.grained.memory !1{{$}}
   %result0 = call i32 @llvm.amdgcn.atomic.inc.i32.p0(ptr %ptr0, i32 42, i32 0, 
i32 0, i1 false)
 
-  ; CHECK: atomicrmw uinc_wrap ptr addrspace(1) %ptr1, i32 43 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !0
+  ; CHECK: atomicrmw uinc_wrap ptr addrspace(1) %ptr1, i32 43 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !1
   %result1 = call i32 @llvm.amdgcn.atomic.inc.i32.p1(ptr addrspace(1) %ptr1, 
i32 43, i32 0, i32 0, i1 false)
 
   ; CHECK: atomicrmw uinc_wrap ptr addrspace(3) %ptr3, i32 46 
syncscope("agent") seq_cst, align 4{{$}}
@@ -26,10 +26,10 @@ define void @atomic_inc(ptr %ptr0, ptr addrspace(1) %ptr1, 
ptr addrspace(3) %ptr
 }
 
 define void @atomic_dec(ptr %ptr0, ptr addrspace(1) %ptr1, ptr addrspace(3) 
%ptr3) {
-  ; CHECK: atomicrmw udec_wrap ptr %ptr0, i32 42 syncscope("agent") seq_cst, 
align 4, !amdgpu.no.fine.grained.memory !0
+  ; CHECK: atomicrmw udec_wrap ptr %ptr0, i32 42 syncscope("agent") seq_cst, 
align 4, !noalias.addrspace !0, !amdgpu.no.fine.grained.memory !1{{$}}
   %result0 = call i32 @llvm.amdgcn.atomic.dec.i32.p0(ptr %ptr0, i32 42, i32 0, 
i32 0, i1 false)
 
-  ; CHECK: atomicrmw udec_wrap ptr addrspace(1) %ptr1, i32 43 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !0
+  ; CHECK: atomicrmw udec_wrap ptr addrspace(1) %ptr1, i32 43 
syncscope("agent") seq_cst, align 4, !amdgpu.no.fine.grained.memory !1
   %result1 = call i32 @llvm.amdgcn.atomic.dec.i32.p1(ptr addrspace(1) %ptr1, 
i32 43, i32 0, i32 0, i1 false)
 
   ; CHECK: atomicrmw udec_wrap ptr addrspace(3) %ptr3, i32 46 
syncscope("agent") seq_cst, align 4{{$}}
@@ -51,49 +51,49 @@ define void @atomic_dec(ptr %ptr0, ptr addrspace(1) %ptr1, 
ptr addrspace(3) %ptr
 
 ; Test some invalid ordering handling
 define void @ordering(ptr %ptr0, ptr addrspace(1) %ptr1, ptr addrspace(3) 
%ptr3) {
-  ; CHECK: atomicrmw volatile uinc_wrap ptr %ptr0, i32 42 syncscope("agent") 
seq_cst, align 4, !amdgpu.no.fine.grained.memory !0
+  ; CHECK: atomicrmw volatile uinc_wrap ptr %ptr0, i32 42 syncscope("agent") 
seq_cst, align 4, !noalias.addrspace !0, !amdgpu.no.fine.grained.memory !1{{$}}
   %result0 = call i32 @llvm.amdgcn.atomic.inc.i32.p0(ptr %ptr0, i32 42, i32 
-1, i32 0, i1 true)
 
-  ; CHECK:

[llvm-branch-commits] [llvm] [NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (PR #109939)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109939

>From 786fb970b7b1d12a6c6c6888d2b5cfe51363287d Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 11:41:18 +
Subject: [PATCH 1/2] [NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  6 +-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  7 ++-
 .../Target/AMDGPU/SIPreAllocateWWMRegs.cpp| 60 ---
 llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h | 25 
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 20 +++
 6 files changed, 92 insertions(+), 27 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..95d0ad0f9dc96a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -49,7 +49,7 @@ FunctionPass *createSIFixSGPRCopiesLegacyPass();
 FunctionPass *createLowerWWMCopiesPass();
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIInsertWaitcntsPass();
-FunctionPass *createSIPreAllocateWWMRegsPass();
+FunctionPass *createSIPreAllocateWWMRegsLegacyPass();
 FunctionPass *createSIFormMemoryClausesPass();
 
 FunctionPass *createSIPostRABundlerPass();
@@ -212,8 +212,8 @@ extern char &SILateBranchLoweringPassID;
 void initializeSIOptimizeExecMaskingPass(PassRegistry &);
 extern char &SIOptimizeExecMaskingID;
 
-void initializeSIPreAllocateWWMRegsPass(PassRegistry &);
-extern char &SIPreAllocateWWMRegsID;
+void initializeSIPreAllocateWWMRegsLegacyPass(PassRegistry &);
+extern char &SIPreAllocateWWMRegsLegacyID;
 
 void initializeAMDGPUImageIntrinsicOptimizerPass(PassRegistry &);
 extern char &AMDGPUImageIntrinsicOptimizerID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 0ebf34c901c142..174a90f0aa419d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,5 +102,6 @@ MACHINE_FUNCTION_PASS("gcn-dpp-combine", 
GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
+MACHINE_FUNCTION_PASS("si-pre-allocate-wwm-regs", SIPreAllocateWWMRegsPass())
 MACHINE_FUNCTION_PASS("si-shrink-instructions", SIShrinkInstructionsPass())
 #undef MACHINE_FUNCTION_PASS
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 1f2148c2922de9..dc5330740f4a6b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -41,6 +41,7 @@
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
 #include "SIPeepholeSDWA.h"
+#include "SIPreAllocateWWMRegs.h"
 #include "SIShrinkInstructions.h"
 #include "TargetInfo/AMDGPUTargetInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
@@ -506,7 +507,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void 
LLVMInitializeAMDGPUTarget() {
   initializeSILateBranchLoweringPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
   initializeSIOptimizeExecMaskingPass(*PR);
-  initializeSIPreAllocateWWMRegsPass(*PR);
+  initializeSIPreAllocateWWMRegsLegacyPass(*PR);
   initializeSIFormMemoryClausesPass(*PR);
   initializeSIPostRABundlerPass(*PR);
   initializeGCNCreateVOPDPass(*PR);
@@ -1505,7 +1506,7 @@ bool GCNPassConfig::addRegAssignAndRewriteFast() {
   addPass(&SILowerSGPRSpillsLegacyID);
 
   // To Allocate wwm registers used in whole quad mode operations (for 
shaders).
-  addPass(&SIPreAllocateWWMRegsID);
+  addPass(&SIPreAllocateWWMRegsLegacyID);
 
   // For allocating other wwm register operands.
   addPass(createWWMRegAllocPass(false));
@@ -1537,7 +1538,7 @@ bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
   addPass(&SILowerSGPRSpillsLegacyID);
 
   // To Allocate wwm registers used in whole quad mode operations (for 
shaders).
-  addPass(&SIPreAllocateWWMRegsID);
+  addPass(&SIPreAllocateWWMRegsLegacyID);
 
   // For allocating other whole wave mode registers.
   addPass(createWWMRegAllocPass(true));
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp 
b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 07303e2aa726c5..f9109c01c8085b 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -11,6 +11,7 @@
 //
 
//===--===//
 
+#include "SIPreAllocateWWMRegs.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -34,7 +35,7 @@ static cl::opt
 
 namespace {
 
-class SIPreAllocateWWMRegs : public MachineFunctionPass {
+class SIPreAllocateWWMRegs {
 private:
   const SIInstrInfo *TII;
   const SIRegisterInfo *TRI;

[llvm-branch-commits] [llvm] [NewPM][CodeGen] Port LiveRegMatrix to NPM (PR #109938)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke edited 
https://github.com/llvm/llvm-project/pull/109938


[llvm-branch-commits] [llvm] [AMDGPU] Add tests for SIPreAllocateWWMRegs (PR #109963)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109963

>From 58fd5012dabc79c87b2b69a2a4d32d655215f144 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Wed, 25 Sep 2024 11:21:04 +
Subject: [PATCH 1/2] [AMDGPU] Add tests for SIPreAllocateWWMRegs

---
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 26 +++
 .../si-pre-allocate-wwm-sgpr-spills.mir   | 21 +++
 2 files changed, 47 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
new file mode 100644
index 00..f2db299f575f5e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
@@ -0,0 +1,26 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s  | FileCheck %s
+
+---
+
+name: pre_allocate_wwm_regs_strict
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr1
+; CHECK-LABEL: name: pre_allocate_wwm_regs_strict
+; CHECK: liveins: $sgpr1
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+; CHECK-NEXT: renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def 
$exec, implicit-def $scc, implicit $exec
+; CHECK-NEXT: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+; CHECK-NEXT: dead $vgpr0 = V_MOV_B32_dpp $vgpr0, [[DEF]], 323, 12, 15, 0, 
implicit $exec
+; CHECK-NEXT: $exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+%0:vgpr_32 = IMPLICIT_DEF
+renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, 
implicit-def $scc, implicit $exec
+%24:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+%25:vgpr_32 = V_MOV_B32_dpp %24:vgpr_32(tied-def 0), %0:vgpr_32, 323, 12, 
15, 0, implicit $exec
+$exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+%2:vgpr_32 = COPY %0:vgpr_32
+...
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir
new file mode 100644
index 00..f0efe74878d831
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir
@@ -0,0 +1,21 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-amdgpu-prealloc-sgpr-spill-vgprs -run-pass=si-pre-allocate-wwm-regs -o - 
-mcpu=tahiti %s | FileCheck %s
+
+---
+
+name: pre_allocate_wwm_spill_to_vgpr
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr1
+; CHECK-LABEL: name: pre_allocate_wwm_spill_to_vgpr
+; CHECK: liveins: $sgpr1
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+; CHECK-NEXT: dead $vgpr0 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, [[DEF]]
+; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+%0:vgpr_32 = IMPLICIT_DEF
+%23:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, %0:vgpr_32
+%2:vgpr_32 = COPY %0:vgpr_32
+...
+

>From 8da3a2c25229b43a0710211d9763733e42bf15d3 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 7 Oct 2024 09:13:04 +
Subject: [PATCH 2/2] Keep tests in one file

---
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 23 ---
 .../si-pre-allocate-wwm-sgpr-spills.mir   | 21 -
 2 files changed, 20 insertions(+), 24 deletions(-)
 delete mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
index f2db299f575f5e..4dcad87a985c0b 100644
--- a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
 # RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s  | FileCheck %s
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-amdgpu-prealloc-sgpr-spill-vgprs -run-pass=si-pre-allocate-wwm-regs -o - 
-mcpu=tahiti %s | FileCheck %s --check-prefix=CHECK2
 
 ---
 
@@ -19,8 +20,24 @@ body: |
 ; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
 %0:vgpr_32 = IMPLICIT_DEF
 renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, 
implicit-def $scc, implicit $exec
-%24:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
-%25:vgpr_32 = V_MOV_B32_dpp %24:vgpr_32(tied-def 0), %0:vgpr_32, 323, 12, 
15, 0, implicit $exec
+%1:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+%2:vgpr_32 = V_MOV_B32_dpp %1, %0, 323, 12, 15, 0, implicit $exec
 $exec = EXIT_STRIC

[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke edited 
https://github.com/llvm/llvm-project/pull/111357


[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke ready_for_review 
https://github.com/llvm/llvm-project/pull/111357


[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-llvm-regalloc

Author: Akshat Oke (Akshat-Oke)


Changes

Solves the double free error.

---
Full diff: https://github.com/llvm/llvm-project/pull/111357.diff


1 Files Affected:

- (modified) llvm/include/llvm/CodeGen/LiveIntervalUnion.h (+7) 


```diff
diff --git a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h 
b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
index 81003455da4241..cc0f2a45bb182c 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
@@ -176,6 +176,13 @@ class LiveIntervalUnion {
 Array() = default;
 ~Array() { clear(); }
 
+Array(Array &&Other) : Size(Other.Size), LIUs(Other.LIUs) {
+  Other.Size = 0;
+  Other.LIUs = nullptr;
+}
+
+Array(const Array &) = delete;
+
 // Initialize the array to have Size entries.
 // Reuse an existing allocation if the size matches.
 void init(LiveIntervalUnion::Allocator&, unsigned Size);

```
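To illustrate why a move constructor of this shape avoids a double free, here is a minimal sketch. `OwnedArray` is a hypothetical stand-in, not LLVM's `LiveIntervalUnion::Array`, and `calloc`/`free` stand in for whatever allocation the real class uses. The point is only that a class with a user-declared destructor gets no implicit move constructor, so without an explicit one a "move" falls back to a copy of the raw pointer and both objects free it.

```cpp
#include <cstdlib>
#include <utility>

// Hypothetical stand-in for an array-owning class; not LLVM's actual type.
class OwnedArray {
  unsigned Size = 0;
  int *Elems = nullptr;

public:
  OwnedArray() = default;
  explicit OwnedArray(unsigned N)
      : Size(N), Elems(static_cast<int *>(std::calloc(N, sizeof(int)))) {}
  ~OwnedArray() { clear(); }

  // Take over the allocation and leave the source empty, so the source's
  // destructor has nothing left to free.
  OwnedArray(OwnedArray &&Other) noexcept
      : Size(Other.Size), Elems(Other.Elems) {
    Other.Size = 0;
    Other.Elems = nullptr;
  }

  // Copying would duplicate the raw pointer, so forbid it outright.
  OwnedArray(const OwnedArray &) = delete;
  OwnedArray &operator=(const OwnedArray &) = delete;

  // Release the allocation; safe to call on an already-empty object.
  void clear() {
    std::free(Elems);
    Elems = nullptr;
    Size = 0;
  }

  unsigned size() const { return Size; }
  bool empty() const { return Elems == nullptr; }
};
```

After `OwnedArray B(std::move(A));`, `A` is empty and only `B` releases the storage when both go out of scope.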




https://github.com/llvm/llvm-project/pull/111357


[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/111357


[llvm-branch-commits] [llvm] [NewPM][CodeGen] Port LiveRegMatrix to NPM (PR #109938)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109938

>From 15692bd09ad90b2bedb7383a9acdb2b3b12453c6 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 7 Oct 2024 08:42:24 +
Subject: [PATCH 1/5] [CodeGen] LiveIntervalUnions::Array  Implement move
 constructor

---
 llvm/include/llvm/CodeGen/LiveIntervalUnion.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h 
b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
index 81003455da4241..cc0f2a45bb182c 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
@@ -176,6 +176,13 @@ class LiveIntervalUnion {
 Array() = default;
 ~Array() { clear(); }
 
+Array(Array &&Other) : Size(Other.Size), LIUs(Other.LIUs) {
+  Other.Size = 0;
+  Other.LIUs = nullptr;
+}
+
+Array(const Array &) = delete;
+
 // Initialize the array to have Size entries.
 // Reuse an existing allocation if the size matches.
 void init(LiveIntervalUnion::Allocator&, unsigned Size);

>From 3cf8e938b76984c86ede00cc85b8b2aa9d84b0fe Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 09:07:04 +
Subject: [PATCH 2/5] [NewPM][CodeGen] Port LiveRegMatrix to NPM

---
 llvm/include/llvm/CodeGen/LiveRegMatrix.h | 50 ---
 llvm/include/llvm/InitializePasses.h  |  2 +-
 .../llvm/Passes/MachinePassRegistry.def   |  4 +-
 llvm/lib/CodeGen/LiveRegMatrix.cpp| 38 ++
 llvm/lib/CodeGen/RegAllocBasic.cpp|  8 +--
 llvm/lib/CodeGen/RegAllocGreedy.cpp   |  8 +--
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 llvm/lib/Target/AMDGPU/GCNNSAReassign.cpp |  6 +--
 .../Target/AMDGPU/SIPreAllocateWWMRegs.cpp|  6 +--
 9 files changed, 88 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/LiveRegMatrix.h 
b/llvm/include/llvm/CodeGen/LiveRegMatrix.h
index 2b32308c7c075e..c024ca9c1dc38d 100644
--- a/llvm/include/llvm/CodeGen/LiveRegMatrix.h
+++ b/llvm/include/llvm/CodeGen/LiveRegMatrix.h
@@ -37,7 +37,9 @@ class MachineFunction;
 class TargetRegisterInfo;
 class VirtRegMap;
 
-class LiveRegMatrix : public MachineFunctionPass {
+class LiveRegMatrix {
+  friend class LiveRegMatrixWrapperPass;
+  friend class LiveRegMatrixAnalysis;
   const TargetRegisterInfo *TRI = nullptr;
   LiveIntervals *LIS = nullptr;
   VirtRegMap *VRM = nullptr;
@@ -57,15 +59,21 @@ class LiveRegMatrix : public MachineFunctionPass {
   unsigned RegMaskVirtReg = 0;
   BitVector RegMaskUsable;
 
-  // MachineFunctionPass boilerplate.
-  void getAnalysisUsage(AnalysisUsage &) const override;
-  bool runOnMachineFunction(MachineFunction &) override;
-  void releaseMemory() override;
+  LiveRegMatrix() = default;
+  void releaseMemory();
 
 public:
-  static char ID;
-
-  LiveRegMatrix();
+  LiveRegMatrix(LiveRegMatrix &&Other)
+  : TRI(Other.TRI), LIS(Other.LIS), VRM(Other.VRM), UserTag(Other.UserTag),
+Matrix(std::move(Other.Matrix)), Queries(std::move(Other.Queries)),
+RegMaskTag(Other.RegMaskTag), RegMaskVirtReg(Other.RegMaskVirtReg),
+RegMaskUsable(std::move(Other.RegMaskUsable)) {
+Other.TRI = nullptr;
+Other.LIS = nullptr;
+Other.VRM = nullptr;
+  }
+
+  void init(MachineFunction &MF, LiveIntervals *LIS, VirtRegMap *VRM);
 
   
//======//
   // High-level interface.
@@ -159,6 +167,32 @@ class LiveRegMatrix : public MachineFunctionPass {
   Register getOneVReg(unsigned PhysReg) const;
 };
 
+class LiveRegMatrixWrapperPass : public MachineFunctionPass {
+  LiveRegMatrix LRM;
+
+public:
+  static char ID;
+
+  LiveRegMatrixWrapperPass() : MachineFunctionPass(ID) {}
+
+  LiveRegMatrix &getLRM() { return LRM; }
+  const LiveRegMatrix &getLRM() const { return LRM; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+  bool runOnMachineFunction(MachineFunction &MF) override;
+  void releaseMemory() override;
+};
+
+class LiveRegMatrixAnalysis : public AnalysisInfoMixin {
+  friend AnalysisInfoMixin;
+  static AnalysisKey Key;
+
+public:
+  using Result = LiveRegMatrix;
+
+  LiveRegMatrix run(MachineFunction &MF, MachineFunctionAnalysisManager &MFAM);
+};
+
 } // end namespace llvm
 
 #endif // LLVM_CODEGEN_LIVEREGMATRIX_H
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index d89a5538b46975..3fee8c40a6607e 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -156,7 +156,7 @@ void initializeLiveDebugValuesPass(PassRegistry &);
 void initializeLiveDebugVariablesPass(PassRegistry &);
 void initializeLiveIntervalsWrapperPassPass(PassRegistry &);
 void initializeLiveRangeShrinkPass(PassRegistry &);
-void initializeLiveRegMatrixPass(PassRegistry &);
+void initializeLiveRegMatrixWrapperPassPass(PassRegistry &);
 void initializeLiveStacksPass(PassRegistry &);
 voi

[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

Akshat-Oke wrote:

> [!WARNING]
> This pull request is not mergeable via GitHub because a downstack PR is 
> open. Once all requirements are satisfied, merge this PR as a stack on 
> Graphite: https://app.graphite.dev/github/pr/llvm/llvm-project/111357?utm_source=stack-comment-downstack-mergeability-warning
> Learn more: https://graphite.dev/docs/merge-pull-requests

* **#111357** 👈
* **#109937**: 1 other dependent PR 
([#109938](https://github.com/llvm/llvm-project/pull/109938))
* **#109936**
* `main`

This stack of pull requests is managed by Graphite. Learn more about 
stacking: https://stacking.dev/?utm_source=stack-comment


https://github.com/llvm/llvm-project/pull/111357


[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke created 
https://github.com/llvm/llvm-project/pull/111357

None

>From 15692bd09ad90b2bedb7383a9acdb2b3b12453c6 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 7 Oct 2024 08:42:24 +
Subject: [PATCH] [CodeGen] LiveIntervalUnions::Array  Implement move
 constructor

---
 llvm/include/llvm/CodeGen/LiveIntervalUnion.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h 
b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
index 81003455da4241..cc0f2a45bb182c 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
@@ -176,6 +176,13 @@ class LiveIntervalUnion {
 Array() = default;
 ~Array() { clear(); }
 
+Array(Array &&Other) : Size(Other.Size), LIUs(Other.LIUs) {
+  Other.Size = 0;
+  Other.LIUs = nullptr;
+}
+
+Array(const Array &) = delete;
+
 // Initialize the array to have Size entries.
 // Reuse an existing allocation if the size matches.
 void init(LiveIntervalUnion::Allocator&, unsigned Size);



[llvm-branch-commits] [llvm] Update correct dependency (PR #109937)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109937

>From 7b68d9fb711d73319d97abb2c03dac31956f1fa5 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 06:35:43 +
Subject: [PATCH] Update correct dependency

---
 llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp 
b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index 4afefa3d9b245c..d8697aa2ffe1cd 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -95,8 +95,8 @@ char SILowerSGPRSpillsLegacy::ID = 0;
 INITIALIZE_PASS_BEGIN(SILowerSGPRSpillsLegacy, DEBUG_TYPE,
   "SI lower SGPR spill instructions", false, false)
 INITIALIZE_PASS_DEPENDENCY(LiveIntervalsWrapperPass)
-INITIALIZE_PASS_DEPENDENCY(VirtRegMapWrapperLegacy)
 INITIALIZE_PASS_DEPENDENCY(MachineDominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(SlotIndexesWrapperPass)
 INITIALIZE_PASS_END(SILowerSGPRSpillsLegacy, DEBUG_TYPE,
 "SI lower SGPR spill instructions", false, false)
 



[llvm-branch-commits] [libcxx] [libc++] Implement std::move_only_function (P0288R9) (PR #94670)

2024-10-07 Thread via llvm-branch-commits

Curve wrote:

Is there something one could help with in this PR? It'd be great to see 
`move_only_function` coming to libc++ in the foreseeable future :)

https://github.com/llvm/llvm-project/pull/94670


[llvm-branch-commits] [llvm] [NewPM][CodeGen] Port LiveRegMatrix to NPM (PR #109938)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm approved this pull request.


https://github.com/llvm/llvm-project/pull/109938


[llvm-branch-commits] [llvm] [AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (PR #110705)

2024-10-07 Thread Anatoly Trosinenko via llvm-branch-commits

atrosinenko wrote:

@kovdan01 Thank you for the reproducer!

I restored the original check `TI->readsRegister(AArch64::X16, TRI) ? 
AArch64::X17 : AArch64::X16` when computing the scratch register, as checking 
just the specific operand did not handle the `AUTH_TCRETURN*` pseudos which 
actually have two register operands. Additionally, the register class of the 
second register operand of these pseudo instructions was restricted, as it 
seems to be just a coincidence that at least one scratch register was available 
for authenticated tail calls.

https://github.com/llvm/llvm-project/pull/110705


[llvm-branch-commits] [llvm] [AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (PR #110705)

2024-10-07 Thread Anatoly Trosinenko via llvm-branch-commits

atrosinenko wrote:

For the reproducer, the generated code is now a bit longer but correct:


```
_ZN7myshape4moveEv:                     // @_ZN7myshape4moveEv
        .cfi_startproc
// %bb.0:                               // %entry
        .cfi_b_key_frame
        pacibsp
        stp     x30, x19, [sp, #-16]!   // 16-byte Folded Spill
        .cfi_negate_ra_state
        .cfi_def_cfa_offset 16
        .cfi_offset w19, -8
        .cfi_offset w30, -16
        mov     x19, x0
        ldr     x0, [x0]
        ldr     x16, [x0]
        mov     x17, x0
        movk    x17, #50297, lsl #48
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_0
        brk     #0xc472
.Lauth_success_0:
        ldr     x9, [x16]
        mov     x8, x16
        mov     x17, x8
        movk    x17, #36564, lsl #48
        blraa   x9, x17
        ldr     x0, [x19, #8]
        ldr     x16, [x0]
        mov     x17, x0
        movk    x17, #50297, lsl #48
        autda   x16, x17
        mov     x17, x16
        xpacd   x17
        cmp     x16, x17
        b.eq    .Lauth_success_1
        brk     #0xc472
.Lauth_success_1:
        ldr     x2, [x16]
        mov     x1, x16
        ldp     x30, x19, [sp], #16     // 16-byte Folded Reload
        autibsp
        eor     x16, x30, x30, lsl #1
        tbz     x16, #62, .Lauth_success_2
        brk     #0xc471
.Lauth_success_2:
        mov     x16, x1
        movk    x16, #36564, lsl #48
        braa    x2, x16
```
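As a reading aid for the LR check near the end of the listing (`eor x16, x30, x30, lsl #1` followed by `tbz x16, #62, .Lauth_success_2`), the arithmetic can be sketched in C++. This is my own model of those two instructions, not code from the PR, and `lrCheckPasses` is a hypothetical name: XOR-ing LR with itself shifted left by one puts `LR[62] ^ LR[61]` into bit 62, so the branch to the success label is taken exactly when bits 62 and 61 of LR agree.

```cpp
#include <cstdint>

// Models `eor x16, x30, x30, lsl #1` + `tbz x16, #62, <success>`.
// Returns true when the tbz would branch to the success label, i.e. when
// bits 62 and 61 of LR are equal.
constexpr bool lrCheckPasses(std::uint64_t LR) {
  // Bit 62 of Mixed is LR[62] ^ LR[61].
  std::uint64_t Mixed = LR ^ (LR << 1);
  // tbz branches when the tested bit is zero.
  return ((Mixed >> 62) & 1) == 0;
}
```

A value with only one of the two bits set falls through to the `brk #0xc471` in the listing, while a value where both are clear or both are set proceeds to the tail call.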


https://github.com/llvm/llvm-project/pull/110705


[llvm-branch-commits] [clang] release/19.x: [SystemZ] Fix codegen for _[u]128 intrinsics (PR #111376)

2024-10-07 Thread via llvm-branch-commits

https://github.com/llvmbot milestoned 
https://github.com/llvm/llvm-project/pull/111376


[llvm-branch-commits] [clang] release/19.x: [SystemZ] Fix codegen for _[u]128 intrinsics (PR #111376)

2024-10-07 Thread via llvm-branch-commits

https://github.com/llvmbot created 
https://github.com/llvm/llvm-project/pull/111376

Backport baf9b7da81025c1e3b0704d7ecf667e06f95642b

Requested by: @uweigand

>From 6cd6d8fb90c95649d8dae0c993be5a3b9dbd494e Mon Sep 17 00:00:00 2001
From: Ulrich Weigand 
Date: Thu, 19 Sep 2024 13:18:43 +0200
Subject: [PATCH] [SystemZ] Fix codegen for _[u]128 intrinsics

PR #74625 introduced a regression in the code generated for the
following set of intrinsics:
  vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
  vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
  vec_sum_u128, vec_msum_u128
  vec_gfmsum_128, vec_gfmsum_accum_128

This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.

Fixed by adding an intermediate cast via a single-element
128-bit integer vector.

Fixes: https://github.com/llvm/llvm-project/issues/109113
(cherry picked from commit baf9b7da81025c1e3b0704d7ecf667e06f95642b)
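
The difference between the two cast behaviours described above can be modelled in portable C++. This is only a simulation under stated assumptions: `U128`, `truncateAndSplat`, and `reinterpretBytes` are hypothetical names, the 128-bit value is represented as two 64-bit halves, and bytes are laid out big-endian as on SystemZ. It does not use the actual vector types or builtins from vecintrin.h.

```cpp
#include <array>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical model of a 128-bit integer as two 64-bit halves.
struct U128 {
  std::uint64_t Hi;
  std::uint64_t Lo;
};

// Models the behaviour the commit message attributes to the direct
// scalar-to-vector cast: truncate to one char, then copy it to every lane.
inline Bytes16 truncateAndSplat(U128 V) {
  Bytes16 Out;
  Out.fill(static_cast<std::uint8_t>(V.Lo));
  return Out;
}

// Models the intended behaviour: reinterpret all 16 bytes of the value,
// most significant byte first (big-endian order).
inline Bytes16 reinterpretBytes(U128 V) {
  Bytes16 Out{};
  for (int I = 0; I < 8; ++I) {
    Out[I] = static_cast<std::uint8_t>(V.Hi >> (56 - 8 * I));
    Out[8 + I] = static_cast<std::uint8_t>(V.Lo >> (56 - 8 * I));
  }
  return Out;
}
```

With `U128{0x0102030405060708, 0x090a0b0c0d0e0f10}`, the first function yields sixteen copies of `0x10`, while the second preserves every byte from `0x01` through `0x10`.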
---
 clang/lib/Headers/vecintrin.h |  28 ++-
 .../CodeGen/SystemZ/builtins-systemz-i128.c   | 165 ++
 2 files changed, 188 insertions(+), 5 deletions(-)
 create mode 100644 clang/test/CodeGen/SystemZ/builtins-systemz-i128.c

diff --git a/clang/lib/Headers/vecintrin.h b/clang/lib/Headers/vecintrin.h
index 1f51e32c0d136d..609c7cf0b7a6f9 100644
--- a/clang/lib/Headers/vecintrin.h
+++ b/clang/lib/Headers/vecintrin.h
@@ -8359,7 +8359,9 @@ vec_min(__vector double __a, __vector double __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_add_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)((__int128)__a + (__int128)__b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ ((__int128)__a + (__int128)__b);
 }
 
 /*-- vec_addc ---*/
@@ -8389,6 +8391,7 @@ vec_addc(__vector unsigned long long __a, __vector 
unsigned long long __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vaccq((unsigned __int128)__a, (unsigned __int128)__b);
 }
 
@@ -8398,6 +8401,7 @@ static inline __ATTRS_ai __vector unsigned char
 vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
   __vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vacq((unsigned __int128)__a, (unsigned __int128)__b,
  (unsigned __int128)__c);
 }
@@ -8408,6 +8412,7 @@ static inline __ATTRS_ai __vector unsigned char
 vec_addec_u128(__vector unsigned char __a, __vector unsigned char __b,
__vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vacccq((unsigned __int128)__a, (unsigned __int128)__b,
(unsigned __int128)__c);
 }
@@ -8483,7 +8488,9 @@ vec_gfmsum(__vector unsigned int __a, __vector unsigned 
int __b) {
 static inline __ATTRS_o_ai __vector unsigned char
 vec_gfmsum_128(__vector unsigned long long __a,
__vector unsigned long long __b) {
-  return (__vector unsigned char)__builtin_s390_vgfmg(__a, __b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ __builtin_s390_vgfmg(__a, __b);
 }
 
 /*-- vec_gfmsum_accum ---*/
@@ -8513,6 +8520,7 @@ vec_gfmsum_accum_128(__vector unsigned long long __a,
  __vector unsigned long long __b,
  __vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
 }
 
@@ -8810,6 +8818,7 @@ vec_msum_u128(__vector unsigned long long __a, __vector 
unsigned long long __b,
 
 #define vec_msum_u128(X, Y, Z, W) \
   ((__typeof__((vec_msum_u128)((X), (Y), (Z), (W \
+   (unsigned __int128 __attribute__((__vector_size__(16 \
__builtin_s390_vmslg((X), (Y), (unsigned __int128)(Z), (W)))
 #endif
 
@@ -8817,7 +8826,9 @@ vec_msum_u128(__vector unsigned long long __a, __vector 
unsigned long long __b,
 
 static inline __ATTRS_ai __vector unsigned char
 vec_sub_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)((__int128)__a - (__int128)__b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ ((__int128)__a - (__int128)__b);
 }
 
 /*-- vec_subc -

[llvm-branch-commits] [clang] release/19.x: [SystemZ] Fix codegen for _[u]128 intrinsics (PR #111376)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:




@llvm/pr-subscribers-backend-systemz

Author: None (llvmbot)


Changes

Backport baf9b7da81025c1e3b0704d7ecf667e06f95642b

Requested by: @uweigand

---
Full diff: https://github.com/llvm/llvm-project/pull/111376.diff


2 Files Affected:

- (modified) clang/lib/Headers/vecintrin.h (+23-5) 
- (added) clang/test/CodeGen/SystemZ/builtins-systemz-i128.c (+165) 


``diff
diff --git a/clang/lib/Headers/vecintrin.h b/clang/lib/Headers/vecintrin.h
index 1f51e32c0d136d..609c7cf0b7a6f9 100644
--- a/clang/lib/Headers/vecintrin.h
+++ b/clang/lib/Headers/vecintrin.h
@@ -8359,7 +8359,9 @@ vec_min(__vector double __a, __vector double __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_add_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)((__int128)__a + (__int128)__b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ ((__int128)__a + (__int128)__b);
 }
 
 /*-- vec_addc ---*/
@@ -8389,6 +8391,7 @@ vec_addc(__vector unsigned long long __a, __vector 
unsigned long long __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vaccq((unsigned __int128)__a, (unsigned __int128)__b);
 }
 
@@ -8398,6 +8401,7 @@ static inline __ATTRS_ai __vector unsigned char
 vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
   __vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vacq((unsigned __int128)__a, (unsigned __int128)__b,
  (unsigned __int128)__c);
 }
@@ -8408,6 +8412,7 @@ static inline __ATTRS_ai __vector unsigned char
 vec_addec_u128(__vector unsigned char __a, __vector unsigned char __b,
__vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vacccq((unsigned __int128)__a, (unsigned __int128)__b,
(unsigned __int128)__c);
 }
@@ -8483,7 +8488,9 @@ vec_gfmsum(__vector unsigned int __a, __vector unsigned 
int __b) {
 static inline __ATTRS_o_ai __vector unsigned char
 vec_gfmsum_128(__vector unsigned long long __a,
__vector unsigned long long __b) {
-  return (__vector unsigned char)__builtin_s390_vgfmg(__a, __b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ __builtin_s390_vgfmg(__a, __b);
 }
 
 /*-- vec_gfmsum_accum ---*/
@@ -8513,6 +8520,7 @@ vec_gfmsum_accum_128(__vector unsigned long long __a,
  __vector unsigned long long __b,
  __vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
 }
 
@@ -8810,6 +8818,7 @@ vec_msum_u128(__vector unsigned long long __a, __vector 
unsigned long long __b,
 
 #define vec_msum_u128(X, Y, Z, W) \
   ((__typeof__((vec_msum_u128)((X), (Y), (Z), (W \
+   (unsigned __int128 __attribute__((__vector_size__(16 \
__builtin_s390_vmslg((X), (Y), (unsigned __int128)(Z), (W)))
 #endif
 
@@ -8817,7 +8826,9 @@ vec_msum_u128(__vector unsigned long long __a, __vector 
unsigned long long __b,
 
 static inline __ATTRS_ai __vector unsigned char
 vec_sub_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)((__int128)__a - (__int128)__b);
+  return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
+ ((__int128)__a - (__int128)__b);
 }
 
 /*-- vec_subc ---*/
@@ -8847,6 +8858,7 @@ vec_subc(__vector unsigned long long __a, __vector 
unsigned long long __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vscbiq((unsigned __int128)__a, (unsigned __int128)__b);
 }
 
@@ -8856,6 +8868,7 @@ static inline __ATTRS_ai __vector unsigned char
 vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b,
   __vector unsigned char __c) {
   return (__vector unsigned char)
+ (unsigned __int128 __attribute__((__vector_size__(16
  __builtin_s390_vsbiq((unsigned __int128)__a, (unsigned __int128)__b,
   (unsigned __int128)__c);
 }
@@ -8866,6 +8879,7 @@ static inline __ATTRS_ai __vector unsigned

[llvm-branch-commits] [llvm] 66c986a - Revert "[SPIRV] Add radians intrinsic (#110800)"

2024-10-07 Thread via llvm-branch-commits

Author: Justin Bogner
Date: 2024-10-07T09:21:36-07:00
New Revision: 66c986aac6a28c130eff261228995989a044f6bc

URL: 
https://github.com/llvm/llvm-project/commit/66c986aac6a28c130eff261228995989a044f6bc
DIFF: 
https://github.com/llvm/llvm-project/commit/66c986aac6a28c130eff261228995989a044f6bc.diff

LOG: Revert "[SPIRV] Add radians intrinsic (#110800)"

This reverts commit c0f8889774ce4926ed58e2bf379d8ba70adf79ae.

Added: 


Modified: 
llvm/include/llvm/IR/IntrinsicsSPIRV.td
llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp

Removed: 
llvm/test/CodeGen/SPIRV/hlsl-intrinsics/radians.ll
llvm/test/CodeGen/SPIRV/opencl/radians.ll



diff  --git a/llvm/include/llvm/IR/IntrinsicsSPIRV.td 
b/llvm/include/llvm/IR/IntrinsicsSPIRV.td
index 88059aa8378140..0567efd8a5d7af 100644
--- a/llvm/include/llvm/IR/IntrinsicsSPIRV.td
+++ b/llvm/include/llvm/IR/IntrinsicsSPIRV.td
@@ -84,5 +84,4 @@ let TargetPrefix = "spv" in {
 [IntrNoMem, Commutative] >;
   def int_spv_wave_is_first_lane : DefaultAttrsIntrinsic<[llvm_i1_ty], [], 
[IntrConvergent]>;
   def int_spv_sign : DefaultAttrsIntrinsic<[LLVMScalarOrSameVectorWidth<0, 
llvm_i32_ty>], [llvm_any_ty], [IntrNoMem]>;
-  def int_spv_radians : DefaultAttrsIntrinsic<[LLVMMatchType<0>], 
[llvm_anyfloat_ty], [IntrNoMem]>;
 }

diff  --git a/llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp 
b/llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
index 468e34a365826a..3917ad180b87fc 100644
--- a/llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
+++ b/llvm/lib/Target/SPIRV/SPIRVInstructionSelector.cpp
@@ -2537,8 +2537,6 @@ bool SPIRVInstructionSelector::selectIntrinsic(Register 
ResVReg,
   }
   case Intrinsic::spv_step:
 return selectExtInst(ResVReg, ResType, I, CL::step, GL::Step);
-  case Intrinsic::spv_radians:
-return selectExtInst(ResVReg, ResType, I, CL::radians, GL::Radians);
   // Discard intrinsics which we do not expect to actually represent code after
   // lowering or intrinsics which are not implemented but should not crash when
   // found in a customer's LLVM IR input.

diff  --git a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/radians.ll 
b/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/radians.ll
deleted file mode 100644
index 1fe8ab30ed9538..00
--- a/llvm/test/CodeGen/SPIRV/hlsl-intrinsics/radians.ll
+++ /dev/null
@@ -1,48 +0,0 @@
-; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv-unknown-unknown %s -o - | 
FileCheck %s
-; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv-unknown-unknown %s -o - 
-filetype=obj | spirv-val %}
-
-; CHECK-DAG: %[[#op_ext_glsl:]] = OpExtInstImport "GLSL.std.450"
-
-; CHECK-DAG: %[[#float_32:]] = OpTypeFloat 32
-; CHECK-DAG: %[[#float_16:]] = OpTypeFloat 16
-
-; CHECK-DAG: %[[#vec4_float_32:]] = OpTypeVector %[[#float_32]] 4
-; CHECK-DAG: %[[#vec4_float_16:]] = OpTypeVector %[[#float_16]] 4
-
-declare half @llvm.spv.radians.f16(half)
-declare float @llvm.spv.radians.f32(float)
-
-declare <4 x float> @llvm.spv.radians.v4f32(<4 x float>)
-declare <4 x half> @llvm.spv.radians.v4f16(<4 x half>)
-
-define noundef float @radians_float(float noundef %a) {
-entry:
-; CHECK: %[[#float_32_arg:]] = OpFunctionParameter %[[#float_32]]
-; CHECK: %[[#]] = OpExtInst %[[#float_32]] %[[#op_ext_glsl]] Radians 
%[[#float_32_arg]]
-  %elt.radians = call float @llvm.spv.radians.f32(float %a)
-  ret float %elt.radians
-}
-
-define noundef half @radians_half(half noundef %a) {
-entry:
-; CHECK: %[[#float_16_arg:]] = OpFunctionParameter %[[#float_16]]
-; CHECK: %[[#]] = OpExtInst %[[#float_16]] %[[#op_ext_glsl]] Radians 
%[[#float_16_arg]]
-  %elt.radians = call half @llvm.spv.radians.f16(half %a)
-  ret half %elt.radians
-}
-
-define noundef <4 x float> @radians_float_vector(<4 x float> noundef %a) {
-entry:
-; CHECK: %[[#vec4_float_32_arg:]] = OpFunctionParameter %[[#vec4_float_32]]
-; CHECK: %[[#]] = OpExtInst %[[#vec4_float_32]] %[[#op_ext_glsl]] Radians 
%[[#vec4_float_32_arg]]
-  %elt.radians = call <4 x float> @llvm.spv.radians.v4f32(<4 x float> %a)
-  ret <4 x float> %elt.radians
-}
-
-define noundef <4 x half> @radians_half_vector(<4 x half> noundef %a) {
-entry:
-; CHECK: %[[#vec4_float_16_arg:]] = OpFunctionParameter %[[#vec4_float_16]]
-; CHECK: %[[#]] = OpExtInst %[[#vec4_float_16]] %[[#op_ext_glsl]] Radians 
%[[#vec4_float_16_arg]]
-  %elt.radians = call <4 x half> @llvm.spv.radians.v4f16(<4 x half> %a)
-  ret <4 x half> %elt.radians
-}

diff  --git a/llvm/test/CodeGen/SPIRV/opencl/radians.ll 
b/llvm/test/CodeGen/SPIRV/opencl/radians.ll
deleted file mode 100644
index f7bb8d5226cd19..00
--- a/llvm/test/CodeGen/SPIRV/opencl/radians.ll
+++ /dev/null
@@ -1,51 +0,0 @@
-; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o - 
| FileCheck %s
-; RUN: llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o - 
| FileCheck %s
-; RUN: %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o - 
-fil

[llvm-branch-commits] [llvm] 26ed065 - Revert "[DXIL] Add radians intrinsic (#110616)"

2024-10-07 Thread via llvm-branch-commits

Author: Justin Bogner
Date: 2024-10-07T09:26:23-07:00
New Revision: 26ed065f2de8dfd92b06f9dd559c00c8c77c44da

URL: 
https://github.com/llvm/llvm-project/commit/26ed065f2de8dfd92b06f9dd559c00c8c77c44da
DIFF: 
https://github.com/llvm/llvm-project/commit/26ed065f2de8dfd92b06f9dd559c00c8c77c44da.diff

LOG: Revert "[DXIL] Add radians intrinsic (#110616)"

This reverts commit 1e75d08659aeb1aabf92b59f33649c414d4ff8b7.

Added: 


Modified: 
llvm/include/llvm/IR/IntrinsicsDirectX.td
llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp

Removed: 
llvm/test/CodeGen/DirectX/radians.ll



diff  --git a/llvm/include/llvm/IR/IntrinsicsDirectX.td 
b/llvm/include/llvm/IR/IntrinsicsDirectX.td
index f2b9e286ebb476..1adfcdc9a1ed2f 100644
--- a/llvm/include/llvm/IR/IntrinsicsDirectX.td
+++ b/llvm/include/llvm/IR/IntrinsicsDirectX.td
@@ -86,5 +86,4 @@ def int_dx_rsqrt  : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], 
[LLVMMatchType<0>]
 def int_dx_wave_is_first_lane : DefaultAttrsIntrinsic<[llvm_i1_ty], [], 
[IntrConvergent]>;
 def int_dx_sign : DefaultAttrsIntrinsic<[LLVMScalarOrSameVectorWidth<0, 
llvm_i32_ty>], [llvm_any_ty], [IntrNoMem]>;
 def int_dx_step : DefaultAttrsIntrinsic<[LLVMMatchType<0>], [llvm_anyfloat_ty, 
LLVMMatchType<0>], [IntrNoMem]>;
-def int_dx_radians : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], 
[LLVMMatchType<0>], [IntrNoMem]>;
 }

diff  --git a/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp 
b/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
index 0bcd03c7fad38d..39e73a11d9d7ee 100644
--- a/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
+++ b/llvm/lib/Target/DirectX/DXILIntrinsicExpansion.cpp
@@ -64,7 +64,6 @@ static bool isIntrinsicExpansion(Function &F) {
   case Intrinsic::dx_udot:
   case Intrinsic::dx_sign:
   case Intrinsic::dx_step:
-  case Intrinsic::dx_radians:
 return true;
   }
   return false;
@@ -443,14 +442,6 @@ static Value *expandStepIntrinsic(CallInst *Orig) {
   return Builder.CreateSelect(Cond, Zero, One);
 }
 
-static Value *expandRadiansIntrinsic(CallInst *Orig) {
-  Value *X = Orig->getOperand(0);
-  Type *Ty = X->getType();
-  IRBuilder<> Builder(Orig);
-  Value *PiOver180 = ConstantFP::get(Ty, llvm::numbers::pi / 180.0);
-  return Builder.CreateFMul(X, PiOver180);
-}
-
 static Intrinsic::ID getMaxForClamp(Type *ElemTy,
 Intrinsic::ID ClampIntrinsic) {
   if (ClampIntrinsic == Intrinsic::dx_uclamp)
@@ -570,9 +561,6 @@ static bool expandIntrinsic(Function &F, CallInst *Orig) {
 break;
   case Intrinsic::dx_step:
 Result = expandStepIntrinsic(Orig);
-  case Intrinsic::dx_radians:
-Result = expandRadiansIntrinsic(Orig);
-break;
   }
   if (Result) {
 Orig->replaceAllUsesWith(Result);

diff  --git a/llvm/test/CodeGen/DirectX/radians.ll 
b/llvm/test/CodeGen/DirectX/radians.ll
deleted file mode 100644
index 73ec013775c3e9..00
--- a/llvm/test/CodeGen/DirectX/radians.ll
+++ /dev/null
@@ -1,79 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py 
UTC_ARGS: --version 5
-; RUN: opt -S -dxil-intrinsic-expansion -scalarizer 
-mtriple=dxil-pc-shadermodel6.3-library %s | FileCheck %s
-
-declare half @llvm.dx.radians.f16(half)
-declare float @llvm.dx.radians.f32(float)
-
-declare <4 x half> @llvm.dx.radians.v4f16(<4 x half>)
-declare <4 x float> @llvm.dx.radians.v4f32(<4 x float>)
-
-define noundef half @radians_half(half noundef %a) {
-; CHECK-LABEL: define noundef half @radians_half(
-; CHECK-SAME: half noundef [[A:%.*]]) {
-; CHECK-NEXT:  [[ENTRY:.*:]]
-; CHECK-NEXT:[[TMP0:%.*]] = fmul half [[A]], 0xH2478
-; CHECK-NEXT:ret half [[TMP0]]
-;
-entry:
-  %elt.radians = call half @llvm.dx.radians.f16(half %a)
-  ret half %elt.radians
-}
-
-define noundef float @radians_float(float noundef %a) {
-; CHECK-LABEL: define noundef float @radians_float(
-; CHECK-SAME: float noundef [[A:%.*]]) {
-; CHECK-NEXT:  [[ENTRY:.*:]]
-; CHECK-NEXT:[[TMP0:%.*]] = fmul float [[A]], 0x3F91DF46A000
-; CHECK-NEXT:ret float [[TMP0]]
-;
-entry:
-  %elt.radians = call float @llvm.dx.radians.f32(float %a)
-  ret float %elt.radians
-}
-
-define noundef <4 x half> @radians_half_vector(<4 x half> noundef %a) {
-; CHECK-LABEL: define noundef <4 x half> @radians_half_vector(
-; CHECK-SAME: <4 x half> noundef [[A:%.*]]) {
-; CHECK-NEXT:  [[ENTRY:.*:]]
-; CHECK: [[ee0:%.*]] = extractelement <4 x half> [[A]], i64 0
-; CHECK: [[ie0:%.*]] = fmul half [[ee0]], 0xH2478
-; CHECK: [[ee1:%.*]] = extractelement <4 x half> [[A]], i64 1
-; CHECK: [[ie1:%.*]] = fmul half [[ee1]], 0xH2478
-; CHECK: [[ee2:%.*]] = extractelement <4 x half> [[A]], i64 2
-; CHECK: [[ie2:%.*]] = fmul half [[ee2]], 0xH2478
-; CHECK: [[ee3:%.*]] = extractelement <4 x half> [[A]], i64 3
-; CHECK: [[ie3:%.*]] = fmul half [[ee3]], 0xH2478
-; CHECK: [[TMP0:%.*]] = insertelement <4 x half> poison, half [[ie0]], i64 0
-; CHECK: [[TMP1:%.*]] = inserteleme
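
For reference, the reverted expandRadiansIntrinsic in the diff above rounds pi / 180 to the operand type and emits a single fmul. A minimal C sketch of the same arithmetic for the f32 case follows; the names radians_f32 and PI_OVER_180 are hypothetical and do not appear in the patch.

```c
/* Illustrative sketch only: mirrors the f32 arithmetic of the reverted
   expansion (constant pi/180 rounded to float, then one multiply).
   radians_f32 and PI_OVER_180 are made-up names, not LLVM APIs. */
#define PI_OVER_180 (3.14159265358979323846 / 180.0)

static float radians_f32(float degrees) {
    /* Round the double constant to float first, as ConstantFP::get(Ty, ...)
       does for a float-typed operand, then multiply. */
    const float pi_over_180 = (float)PI_OVER_180;
    return degrees * pi_over_180;
}
```

The half and vector cases in the removed test follow the same pattern per element after the scalarizer pass runs.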

[llvm-branch-commits] [llvm] [AArch64][PAC] Move emission of LR checks in tail calls to AsmPrinter (PR #110705)

2024-10-07 Thread Anatoly Trosinenko via llvm-branch-commits

https://github.com/atrosinenko updated 
https://github.com/llvm/llvm-project/pull/110705

>From 089cc13bbd2cac76a2d3fc0b2f72b0bccda5b188 Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko 
Date: Mon, 23 Sep 2024 19:51:55 +0300
Subject: [PATCH 1/2] [AArch64][PAC] Move emission of LR checks in tail calls
 to AsmPrinter

Move the emission of the checks performed on the authenticated LR value
during tail calls to the AArch64AsmPrinter class, so that different checker
sequences can be reused by pseudo instructions expanded there.
This adds one more option to the AuthCheckMethod enumeration: the generic
XPAC variant, which is not restricted to checking the LR register.
---
 llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp | 143 +++---
 llvm/lib/Target/AArch64/AArch64InstrInfo.cpp  |  13 ++
 llvm/lib/Target/AArch64/AArch64InstrInfo.td   |   2 +
 .../lib/Target/AArch64/AArch64PointerAuth.cpp | 182 +-
 llvm/lib/Target/AArch64/AArch64PointerAuth.h  |  40 ++--
 llvm/lib/Target/AArch64/AArch64Subtarget.cpp  |   2 -
 llvm/lib/Target/AArch64/AArch64Subtarget.h|  23 ---
 llvm/test/CodeGen/AArch64/ptrauth-ret-trap.ll |  36 ++--
 .../AArch64/sign-return-address-tailcall.ll   |  54 +++---
 9 files changed, 192 insertions(+), 303 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp 
b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
index 6d2dd0ecbccf31..50502477706ccf 100644
--- a/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
+++ b/llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
@@ -153,6 +153,7 @@ class AArch64AsmPrinter : public AsmPrinter {
   void emitPtrauthCheckAuthenticatedValue(Register TestedReg,
   Register ScratchReg,
   AArch64PACKey::ID Key,
+  AArch64PAuth::AuthCheckMethod Method,
   bool ShouldTrap,
   const MCSymbol *OnFailure);
 
@@ -1731,7 +1732,8 @@ unsigned 
AArch64AsmPrinter::emitPtrauthDiscriminator(uint16_t Disc,
 /// of proceeding to the next instruction (only if ShouldTrap is false).
 void AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue(
 Register TestedReg, Register ScratchReg, AArch64PACKey::ID Key,
-bool ShouldTrap, const MCSymbol *OnFailure) {
+AArch64PAuth::AuthCheckMethod Method, bool ShouldTrap,
+const MCSymbol *OnFailure) {
   // Insert a sequence to check if authentication of TestedReg succeeded,
   // such as:
   //
@@ -1757,38 +1759,70 @@ void 
AArch64AsmPrinter::emitPtrauthCheckAuthenticatedValue(
   //Lsuccess:
   //  ...
   //
-  // This sequence is expensive, but we need more information to be able to
-  // do better.
-  //
-  // We can't TBZ the poison bit because EnhancedPAC2 XORs the PAC bits
-  // on failure.
-  // We can't TST the PAC bits because we don't always know how the address
-  // space is setup for the target environment (and the bottom PAC bit is
-  // based on that).
-  // Either way, we also don't always know whether TBI is enabled or not for
-  // the specific target environment.
+  // See the documentation on AuthCheckMethod enumeration constants for
+  // the specific code sequences that can be used to perform the check.
+  using AArch64PAuth::AuthCheckMethod;
 
-  unsigned XPACOpc = getXPACOpcodeForKey(Key);
+  if (Method == AuthCheckMethod::None)
+return;
+  if (Method == AuthCheckMethod::DummyLoad) {
+EmitToStreamer(MCInstBuilder(AArch64::LDRWui)
+   .addReg(getWRegFromXReg(ScratchReg))
+   .addReg(TestedReg)
+   .addImm(0));
+assert(ShouldTrap && !OnFailure && "DummyLoad always traps on error");
+return;
+  }
 
   MCSymbol *SuccessSym = createTempSymbol("auth_success_");
+  if (Method == AuthCheckMethod::XPAC || Method == AuthCheckMethod::XPACHint) {
+//  mov Xscratch, Xtested
+emitMovXReg(ScratchReg, TestedReg);
 
-  //  mov Xscratch, Xtested
-  emitMovXReg(ScratchReg, TestedReg);
-
-  //  xpac(i|d) Xscratch
-  EmitToStreamer(MCInstBuilder(XPACOpc).addReg(ScratchReg).addReg(ScratchReg));
+if (Method == AuthCheckMethod::XPAC) {
+  //  xpac(i|d) Xscratch
+  unsigned XPACOpc = getXPACOpcodeForKey(Key);
+  EmitToStreamer(
+  MCInstBuilder(XPACOpc).addReg(ScratchReg).addReg(ScratchReg));
+} else {
+  //  xpaclri
+
+  // Note that this method applies XPAC to TestedReg instead of ScratchReg.
+  assert(TestedReg == AArch64::LR &&
+ "XPACHint mode is only compatible with checking the LR register");
+  assert((Key == AArch64PACKey::IA || Key == AArch64PACKey::IB) &&
+ "XPACHint mode is only compatible with I-keys");
+  EmitToStreamer(MCInstBuilder(AArch64::XPACLRI));
+}
 
-  //  cmp Xtested, Xscratch
-  EmitToStreamer(MCInstBuilder(AArch64::SUBSXrs)
- .addReg(AArch64::XZR)
- .addReg(TestedReg)
- .
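
As a rough illustration of the generic XPAC check named in the patch above (mov Xscratch, Xtested; xpac(i|d) Xscratch; cmp Xtested, Xscratch), here is a small C model. It is a sketch under loud assumptions: a 48-bit virtual address space, TBI disabled, and XPAC modelled as replicating bit 55 into bits 63..48. The names strip_pac_model and auth_check_passes_model are hypothetical; the real sequence is emitted as machine instructions by AArch64AsmPrinter, not computed like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this model only: 48-bit VA, no top-byte-ignore. */
#define VA_BITS 48u

/* Simplified model of XPAC: restore the canonical upper bits by
   replicating bit 55 into bits 63..48. Not an exact architectural
   description. */
static uint64_t strip_pac_model(uint64_t ptr) {
    const uint64_t upper_mask = ~((UINT64_C(1) << VA_BITS) - 1); /* bits 63..48 */
    bool high_half = ((ptr >> 55) & 1u) != 0;
    return high_half ? (ptr | upper_mask) : (ptr & ~upper_mask);
}

/* Model of the check: the tested value passes only if stripping
   leaves it unchanged, i.e. it was already canonical. */
static bool auth_check_passes_model(uint64_t tested) {
    uint64_t scratch = tested;          /* mov Xscratch, Xtested */
    scratch = strip_pac_model(scratch); /* xpac(i|d) Xscratch    */
    return tested == scratch;           /* cmp Xtested, Xscratch */
}
```

In this model a successfully authenticated pointer is already canonical and compares equal, while a pointer that still carries non-canonical upper bits after a failed authentication does not.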

[llvm-branch-commits] [clang] [llvm] Thin11 (PR #111464)

2024-10-07 Thread Kyungwoo Lee via llvm-branch-commits

https://github.com/kyulee-com created 
https://github.com/llvm/llvm-project/pull/111464

None

>From 1249f0411388fb0832b49e80e7b6a0985822b026 Mon Sep 17 00:00:00 2001
From: Kyungwoo Lee 
Date: Fri, 13 Sep 2024 08:51:00 -0700
Subject: [PATCH 1/4] [CGData][ThinLTO] Global Outlining with Two-CodeGen
 Rounds

---
 llvm/include/llvm/CGData/CodeGenData.h|  16 +++
 llvm/lib/CGData/CodeGenData.cpp   |  81 +-
 llvm/lib/LTO/CMakeLists.txt   |   1 +
 llvm/lib/LTO/LTO.cpp  | 103 +-
 llvm/lib/LTO/LTOBackend.cpp   |  11 ++
 .../test/ThinLTO/AArch64/cgdata-two-rounds.ll |  94 
 llvm/test/ThinLTO/AArch64/lit.local.cfg   |   2 +
 7 files changed, 302 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/ThinLTO/AArch64/cgdata-two-rounds.ll
 create mode 100644 llvm/test/ThinLTO/AArch64/lit.local.cfg

diff --git a/llvm/include/llvm/CGData/CodeGenData.h 
b/llvm/include/llvm/CGData/CodeGenData.h
index 84133a433170fe..1e1afe99327650 100644
--- a/llvm/include/llvm/CGData/CodeGenData.h
+++ b/llvm/include/llvm/CGData/CodeGenData.h
@@ -164,6 +164,22 @@ publishOutlinedHashTree(std::unique_ptr 
HashTree) {
   CodeGenData::getInstance().publishOutlinedHashTree(std::move(HashTree));
 }
 
+/// Initialize the two-codegen rounds.
+void initializeTwoCodegenRounds();
+
+/// Save the current module before the first codegen round.
+void saveModuleForTwoRounds(const Module &TheModule, unsigned Task);
+
+/// Load the current module before the second codegen round.
+std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule,
+   unsigned Task,
+   LLVMContext &Context);
+
+/// Merge the codegen data from the input files in scratch vector in ThinLTO
+/// two-codegen rounds.
+Error mergeCodeGenData(
+const std::unique_ptr>> InputFiles);
+
 void warn(Error E, StringRef Whence = "");
 void warn(Twine Message, std::string Whence = "", std::string Hint = "");
 
diff --git a/llvm/lib/CGData/CodeGenData.cpp b/llvm/lib/CGData/CodeGenData.cpp
index 55d2504231c744..ff8e5dd7c75790 100644
--- a/llvm/lib/CGData/CodeGenData.cpp
+++ b/llvm/lib/CGData/CodeGenData.cpp
@@ -17,6 +17,7 @@
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FileSystem.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/WithColor.h"
 
 #define DEBUG_TYPE "cg-data"
@@ -30,6 +31,14 @@ cl::opt
 cl::opt
 CodeGenDataUsePath("codegen-data-use-path", cl::init(""), cl::Hidden,
cl::desc("File path to where .cgdata file is read"));
+cl::opt CodeGenDataThinLTOTwoRounds(
+"codegen-data-thinlto-two-rounds", cl::init(false), cl::Hidden,
+cl::desc("Enable two-round ThinLTO code generation. The first round "
+ "emits codegen data, while the second round uses the emitted "
+ "codegen data for further optimizations."));
+
+// Path to where the optimized bitcodes are saved and restored for ThinLTO.
+static SmallString<128> CodeGenDataThinLTOTwoRoundsPath;
 
 static std::string getCGDataErrString(cgdata_error Err,
   const std::string &ErrMsg = "") {
@@ -139,7 +148,7 @@ CodeGenData &CodeGenData::getInstance() {
   std::call_once(CodeGenData::OnceFlag, []() {
 Instance = std::unique_ptr(new CodeGenData());
 
-if (CodeGenDataGenerate)
+if (CodeGenDataGenerate || CodeGenDataThinLTOTwoRounds)
   Instance->EmitCGData = true;
 else if (!CodeGenDataUsePath.empty()) {
   // Initialize the global CGData if the input file name is given.
@@ -215,6 +224,76 @@ void warn(Error E, StringRef Whence) {
   }
 }
 
+static std::string getPath(StringRef Dir, unsigned Task) {
+  return (Dir + "/" + llvm::Twine(Task) + ".saved_copy.bc").str();
+}
+
+void initializeTwoCodegenRounds() {
+  assert(CodeGenDataThinLTOTwoRounds);
+  if (auto EC = llvm::sys::fs::createUniqueDirectory(
+  "cgdata", CodeGenDataThinLTOTwoRoundsPath))
+report_fatal_error(Twine("Failed to create directory: ") + EC.message());
+}
+
+void saveModuleForTwoRounds(const Module &TheModule, unsigned Task) {
+  assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath));
+  std::string Path = getPath(CodeGenDataThinLTOTwoRoundsPath, Task);
+  std::error_code EC;
+  raw_fd_ostream OS(Path, EC, sys::fs::OpenFlags::OF_None);
+  if (EC)
+report_fatal_error(Twine("Failed to open ") + Path +
+   " to save optimized bitcode: " + EC.message());
+  WriteBitcodeToFile(TheModule, OS, /* ShouldPreserveUseListOrder */ true);
+}
+
+std::unique_ptr loadModuleForTwoRounds(BitcodeModule &OrigModule,
+   unsigned Task,
+   LLVMContext &Context) {
+  assert(sys::fs::is_directory(CodeGenDataThinLTOTwoRoundsPath));
+  std::string Path = getPath(CodeGenDataThinLTO

[llvm-branch-commits] [clang] [Serialization] Code cleanups and polish 83233 (PR #83237)

2024-10-07 Thread Chuanqi Xu via llvm-branch-commits

ChuanqiXu9 wrote:

@ilya-biryukov gentle ping~

https://github.com/llvm/llvm-project/pull/83237


[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread Bill Wendling via llvm-branch-commits

https://github.com/bwendling updated 
https://github.com/llvm/llvm-project/pull/111445

>From f414a24fd7ad173793f920157bfa95985a09a946 Mon Sep 17 00:00:00 2001
From: Jan Hendrik Farr 
Date: Thu, 3 Oct 2024 07:16:21 +0200
Subject: [PATCH 1/2] [Clang] Fix 'counted_by' for nested struct pointers
 (#110497)

Fix the counted_by attribute for cases where the flexible array member is
accessed through a struct pointer inside another struct:

```
struct variable {
int a;
int b;
int length;
short array[] __attribute__((counted_by(length)));
};

struct bucket {
int a;
struct variable *growable;
int b;
};
```

__builtin_dynamic_object_size(p->growable->array, 0);

This commit makes sure that if the StructBase is both a MemberExpr and a
pointer, it is treated as a pointer. Otherwise clang will generate code
to access the address of p->growable instead of loading the value of
p->growable->length.

Fixes #110385
---
 clang/lib/CodeGen/CGExpr.cpp  | 16 ++--
 clang/test/CodeGen/attr-counted-by-pr110385.c | 70 
 clang/test/CodeGen/attr-counted-by.c  | 80 +--
 3 files changed, 117 insertions(+), 49 deletions(-)
 create mode 100644 clang/test/CodeGen/attr-counted-by-pr110385.c

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 5f58a64d8386c3..8d6f535bba896e 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1052,6 +1052,8 @@ class StructAccessBase
 return Visit(E->getBase());
   }
   const Expr *VisitCastExpr(const CastExpr *E) {
+if (E->getCastKind() == CK_LValueToRValue)
+  return E;
 return Visit(E->getSubExpr());
   }
   const Expr *VisitParenExpr(const ParenExpr *E) {
@@ -1119,19 +1121,15 @@ llvm::Value *CodeGenFunction::EmitCountedByFieldExpr(
 return nullptr;
 
   llvm::Value *Res = nullptr;
-  if (const auto *DRE = dyn_cast<DeclRefExpr>(StructBase)) {
-Res = EmitDeclRefLValue(DRE).getPointer(*this);
-Res = Builder.CreateAlignedLoad(ConvertType(DRE->getType()), Res,
-getPointerAlign(), "dre.load");
-  } else if (const MemberExpr *ME = dyn_cast<MemberExpr>(StructBase)) {
-LValue LV = EmitMemberExpr(ME);
-Address Addr = LV.getAddress();
-Res = Addr.emitRawPointer(*this);
-  } else if (StructBase->getType()->isPointerType()) {
+  if (StructBase->getType()->isPointerType()) {
 LValueBaseInfo BaseInfo;
 TBAAAccessInfo TBAAInfo;
 Address Addr = EmitPointerWithAlignment(StructBase, &BaseInfo, &TBAAInfo);
 Res = Addr.emitRawPointer(*this);
+  } else if (StructBase->isLValue()) {
+LValue LV = EmitLValue(StructBase);
+Address Addr = LV.getAddress();
+Res = Addr.emitRawPointer(*this);
   } else {
 return nullptr;
   }
diff --git a/clang/test/CodeGen/attr-counted-by-pr110385.c 
b/clang/test/CodeGen/attr-counted-by-pr110385.c
new file mode 100644
index 00..6891d5abe7d5c2
--- /dev/null
+++ b/clang/test/CodeGen/attr-counted-by-pr110385.c
@@ -0,0 +1,70 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 4
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O2 
-Wno-missing-declarations -emit-llvm -o - %s | FileCheck %s
+
+// See #110385
+// Based on reproducer from Kees Cook:
+// https://lore.kernel.org/all/202409170436.C3C6E7F7A@keescook/
+
+struct variable {
+int a;
+int b;
+int length;
+short array[] __attribute__((counted_by(length)));
+};
+
+struct bucket {
+int a;
+struct variable *growable;
+int b;
+};
+
+struct bucket2 {
+int a;
+struct variable growable;
+};
+
+void init(void * __attribute__((pass_dynamic_object_size(0))));
+
+// CHECK-LABEL: define dso_local void @test1(
+// CHECK-SAME: ptr nocapture noundef readonly [[FOO:%.*]]) local_unnamed_addr 
#[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[GROWABLE:%.*]] = getelementptr inbounds i8, ptr [[FOO]], 
i64 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GROWABLE]], align 8, !tbaa 
[[TBAA2:![0-9]+]]
+// CHECK-NEXT:[[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 
12
+// CHECK-NEXT:[[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr 
[[TMP0]], i64 8
+// CHECK-NEXT:[[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr 
[[DOT_COUNTED_BY_GEP]], align 4
+// CHECK-NEXT:[[TMP1:%.*]] = sext i32 [[DOT_COUNTED_BY_LOAD]] to i64
+// CHECK-NEXT:[[TMP2:%.*]] = shl nsw i64 [[TMP1]], 1
+// CHECK-NEXT:[[TMP3:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD]], -1
+// CHECK-NEXT:[[TMP4:%.*]] = select i1 [[TMP3]], i64 [[TMP2]], i64 0
+// CHECK-NEXT:tail call void @init(ptr noundef nonnull [[ARRAY]], i64 
noundef [[TMP4]]) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT:ret void
+//
+void test1(struct bucket *foo) {
+init(foo->growable->array);
+}
+
+// CHECK-LABEL: define dso_local void @test2(
+// CHECK-SAME: ptr noundef [[FOO:%.*]]) local_unnamed_addr #[[ATTR
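
The CHECK lines for test1 in the patch above load the growable pointer first, read length through it, and clamp negative counts to zero before scaling by the element size. A minimal C sketch of that expected computation follows. The COUNTED_BY macro and expected_array_bytes helper are hypothetical and only illustrate the arithmetic; they are not what clang emits or uses internally.

```c
#include <stddef.h>

/* Hypothetical helper macro: apply counted_by only where the compiler
   reports support for it, so the sketch builds elsewhere too. */
#if defined(__has_attribute)
#  if __has_attribute(counted_by)
#    define COUNTED_BY(member) __attribute__((counted_by(member)))
#  endif
#endif
#ifndef COUNTED_BY
#  define COUNTED_BY(member)
#endif

struct variable {
    int a;
    int b;
    int length;
    short array[] COUNTED_BY(length);
};

struct bucket {
    int a;
    struct variable *growable;
    int b;
};

/* Byte size expected for p->growable->array: load the pointer stored in
   p->growable, then read length through that pointer. Using the address
   of the p->growable field itself would be the bug described above. */
static size_t expected_array_bytes(const struct bucket *p) {
    const struct variable *v = p->growable; /* load ptr from the bucket   */
    int count = v->length;                  /* load i32 through that ptr  */
    /* Negative counts clamp to zero, matching the icmp/select pair. */
    return count > -1 ? (size_t)count * sizeof(short) : 0;
}
```

With length set to 5 this yields 5 * sizeof(short) bytes, and any negative length yields 0.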

[llvm-branch-commits] [llvm] Local: Handle noalias.addrspace in copyMetadataForLoad (PR #103939)

2024-10-07 Thread Matt Arsenault via llvm-branch-commits

https://github.com/arsenm updated 
https://github.com/llvm/llvm-project/pull/103939

>From 7dcad206090561d7a46f853df8e67ad02bda8a72 Mon Sep 17 00:00:00 2001
From: Matt Arsenault 
Date: Wed, 14 Aug 2024 16:51:08 +0400
Subject: [PATCH] Local: Handle noalias.addrspace in copyMetadataForLoad

---
 llvm/lib/Transforms/Utils/Local.cpp| 1 +
 llvm/test/Transforms/InstCombine/loadstore-metadata.ll | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Utils/Local.cpp 
b/llvm/lib/Transforms/Utils/Local.cpp
index c5915e5fdfbec0..93c0bbb94eaf1c 100644
--- a/llvm/lib/Transforms/Utils/Local.cpp
+++ b/llvm/lib/Transforms/Utils/Local.cpp
@@ -3473,6 +3473,7 @@ void llvm::copyMetadataForLoad(LoadInst &Dest, const 
LoadInst &Source) {
 case LLVMContext::MD_mem_parallel_loop_access:
 case LLVMContext::MD_access_group:
 case LLVMContext::MD_noundef:
+case LLVMContext::MD_noalias_addrspace:
   // All of these directly apply.
   Dest.setMetadata(ID, N);
   break;
diff --git a/llvm/test/Transforms/InstCombine/loadstore-metadata.ll 
b/llvm/test/Transforms/InstCombine/loadstore-metadata.ll
index 247a02f0bcc14a..dccbfbd13f73d0 100644
--- a/llvm/test/Transforms/InstCombine/loadstore-metadata.ll
+++ b/llvm/test/Transforms/InstCombine/loadstore-metadata.ll
@@ -177,7 +177,7 @@ define i32 @test_load_cast_combine_noalias_addrspace(ptr 
%ptr) {
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves TBAA.
 ; CHECK-LABEL: @test_load_cast_combine_noalias_addrspace(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:[[L1:%.*]] = load i32, ptr [[PTR:%.*]], align 4
+; CHECK-NEXT:[[L1:%.*]] = load i32, ptr [[PTR:%.*]], align 4, 
!noalias.addrspace [[META10:![0-9]+]]
 ; CHECK-NEXT:ret i32 [[L1]]
 ;
 entry:



[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread Bill Wendling via llvm-branch-commits

https://github.com/bwendling created 
https://github.com/llvm/llvm-project/pull/111445

/cherry-pick

>From f414a24fd7ad173793f920157bfa95985a09a946 Mon Sep 17 00:00:00 2001
From: Jan Hendrik Farr 
Date: Thu, 3 Oct 2024 07:16:21 +0200
Subject: [PATCH] [Clang] Fix 'counted_by' for nested struct pointers (#110497)

Fix the counted_by attribute for cases where the flexible array member is
accessed through a struct pointer inside another struct:

```
struct variable {
int a;
int b;
int length;
short array[] __attribute__((counted_by(length)));
};

struct bucket {
int a;
struct variable *growable;
int b;
};
```

__builtin_dynamic_object_size(p->growable->array, 0);

This commit makes sure that if the StructBase is both a MemberExpr and a
pointer, it is treated as a pointer. Otherwise clang will generate code
to access the address of p->growable instead of loading the value of
p->growable->length.

Fixes #110385
---
 clang/lib/CodeGen/CGExpr.cpp  | 16 ++--
 clang/test/CodeGen/attr-counted-by-pr110385.c | 70 
 clang/test/CodeGen/attr-counted-by.c  | 80 +--
 3 files changed, 117 insertions(+), 49 deletions(-)
 create mode 100644 clang/test/CodeGen/attr-counted-by-pr110385.c

diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 5f58a64d8386c3..8d6f535bba896e 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1052,6 +1052,8 @@ class StructAccessBase
 return Visit(E->getBase());
   }
   const Expr *VisitCastExpr(const CastExpr *E) {
+if (E->getCastKind() == CK_LValueToRValue)
+  return E;
 return Visit(E->getSubExpr());
   }
   const Expr *VisitParenExpr(const ParenExpr *E) {
@@ -1119,19 +1121,15 @@ llvm::Value *CodeGenFunction::EmitCountedByFieldExpr(
 return nullptr;
 
   llvm::Value *Res = nullptr;
-  if (const auto *DRE = dyn_cast<DeclRefExpr>(StructBase)) {
-Res = EmitDeclRefLValue(DRE).getPointer(*this);
-Res = Builder.CreateAlignedLoad(ConvertType(DRE->getType()), Res,
-getPointerAlign(), "dre.load");
-  } else if (const MemberExpr *ME = dyn_cast<MemberExpr>(StructBase)) {
-LValue LV = EmitMemberExpr(ME);
-Address Addr = LV.getAddress();
-Res = Addr.emitRawPointer(*this);
-  } else if (StructBase->getType()->isPointerType()) {
+  if (StructBase->getType()->isPointerType()) {
 LValueBaseInfo BaseInfo;
 TBAAAccessInfo TBAAInfo;
 Address Addr = EmitPointerWithAlignment(StructBase, &BaseInfo, &TBAAInfo);
 Res = Addr.emitRawPointer(*this);
+  } else if (StructBase->isLValue()) {
+LValue LV = EmitLValue(StructBase);
+Address Addr = LV.getAddress();
+Res = Addr.emitRawPointer(*this);
   } else {
 return nullptr;
   }
diff --git a/clang/test/CodeGen/attr-counted-by-pr110385.c 
b/clang/test/CodeGen/attr-counted-by-pr110385.c
new file mode 100644
index 00..6891d5abe7d5c2
--- /dev/null
+++ b/clang/test/CodeGen/attr-counted-by-pr110385.c
@@ -0,0 +1,70 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 4
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O2 
-Wno-missing-declarations -emit-llvm -o - %s | FileCheck %s
+
+// See #110385
+// Based on reproducer from Kees Cook:
+// https://lore.kernel.org/all/202409170436.C3C6E7F7A@keescook/
+
+struct variable {
+int a;
+int b;
+int length;
+short array[] __attribute__((counted_by(length)));
+};
+
+struct bucket {
+int a;
+struct variable *growable;
+int b;
+};
+
+struct bucket2 {
+int a;
+struct variable growable;
+};
+
+void init(void * __attribute__((pass_dynamic_object_size(0))));
+
+// CHECK-LABEL: define dso_local void @test1(
+// CHECK-SAME: ptr nocapture noundef readonly [[FOO:%.*]]) local_unnamed_addr 
#[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[GROWABLE:%.*]] = getelementptr inbounds i8, ptr [[FOO]], 
i64 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GROWABLE]], align 8, !tbaa 
[[TBAA2:![0-9]+]]
+// CHECK-NEXT:[[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 
12
+// CHECK-NEXT:[[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr 
[[TMP0]], i64 8
+// CHECK-NEXT:[[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr 
[[DOT_COUNTED_BY_GEP]], align 4
+// CHECK-NEXT:[[TMP1:%.*]] = sext i32 [[DOT_COUNTED_BY_LOAD]] to i64
+// CHECK-NEXT:[[TMP2:%.*]] = shl nsw i64 [[TMP1]], 1
+// CHECK-NEXT:[[TMP3:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD]], -1
+// CHECK-NEXT:[[TMP4:%.*]] = select i1 [[TMP3]], i64 [[TMP2]], i64 0
+// CHECK-NEXT:tail call void @init(ptr noundef nonnull [[ARRAY]], i64 
noundef [[TMP4]]) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT:ret void
+//
+void test1(struct bucket *foo) {
+init(foo->growable->array);
+}
+
+// CHECK-LABEL: define dso_local void @test2(
+// CHECK-SAME: ptr noundef [[FOO:%.*]]) local_unnamed_add

[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:



@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Bill Wendling (bwendling)


Changes

/cherry-pick

---
Full diff: https://github.com/llvm/llvm-project/pull/111445.diff


3 Files Affected:

- (modified) clang/lib/CodeGen/CGExpr.cpp (+7-9) 
- (added) clang/test/CodeGen/attr-counted-by-pr110385.c (+70) 
- (modified) clang/test/CodeGen/attr-counted-by.c (+40-40) 


``diff
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 5f58a64d8386c3..8d6f535bba896e 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1052,6 +1052,8 @@ class StructAccessBase
 return Visit(E->getBase());
   }
   const Expr *VisitCastExpr(const CastExpr *E) {
+if (E->getCastKind() == CK_LValueToRValue)
+  return E;
 return Visit(E->getSubExpr());
   }
   const Expr *VisitParenExpr(const ParenExpr *E) {
@@ -1119,19 +1121,15 @@ llvm::Value *CodeGenFunction::EmitCountedByFieldExpr(
 return nullptr;
 
   llvm::Value *Res = nullptr;
-  if (const auto *DRE = dyn_cast<DeclRefExpr>(StructBase)) {
-Res = EmitDeclRefLValue(DRE).getPointer(*this);
-Res = Builder.CreateAlignedLoad(ConvertType(DRE->getType()), Res,
-getPointerAlign(), "dre.load");
-  } else if (const MemberExpr *ME = dyn_cast<MemberExpr>(StructBase)) {
-LValue LV = EmitMemberExpr(ME);
-Address Addr = LV.getAddress();
-Res = Addr.emitRawPointer(*this);
-  } else if (StructBase->getType()->isPointerType()) {
+  if (StructBase->getType()->isPointerType()) {
 LValueBaseInfo BaseInfo;
 TBAAAccessInfo TBAAInfo;
 Address Addr = EmitPointerWithAlignment(StructBase, &BaseInfo, &TBAAInfo);
 Res = Addr.emitRawPointer(*this);
+  } else if (StructBase->isLValue()) {
+LValue LV = EmitLValue(StructBase);
+Address Addr = LV.getAddress();
+Res = Addr.emitRawPointer(*this);
   } else {
 return nullptr;
   }
diff --git a/clang/test/CodeGen/attr-counted-by-pr110385.c 
b/clang/test/CodeGen/attr-counted-by-pr110385.c
new file mode 100644
index 00..6891d5abe7d5c2
--- /dev/null
+++ b/clang/test/CodeGen/attr-counted-by-pr110385.c
@@ -0,0 +1,70 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py 
UTC_ARGS: --version 4
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -O2 
-Wno-missing-declarations -emit-llvm -o - %s | FileCheck %s
+
+// See #110385
+// Based on reproducer from Kees Cook:
+// https://lore.kernel.org/all/202409170436.C3C6E7F7A@keescook/
+
+struct variable {
+int a;
+int b;
+int length;
+short array[] __attribute__((counted_by(length)));
+};
+
+struct bucket {
+int a;
+struct variable *growable;
+int b;
+};
+
+struct bucket2 {
+int a;
+struct variable growable;
+};
+
+void init(void * __attribute__((pass_dynamic_object_size(0))));
+
+// CHECK-LABEL: define dso_local void @test1(
+// CHECK-SAME: ptr nocapture noundef readonly [[FOO:%.*]]) local_unnamed_addr 
#[[ATTR0:[0-9]+]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[GROWABLE:%.*]] = getelementptr inbounds i8, ptr [[FOO]], 
i64 8
+// CHECK-NEXT:[[TMP0:%.*]] = load ptr, ptr [[GROWABLE]], align 8, !tbaa 
[[TBAA2:![0-9]+]]
+// CHECK-NEXT:[[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[TMP0]], i64 
12
+// CHECK-NEXT:[[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr 
[[TMP0]], i64 8
+// CHECK-NEXT:[[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr 
[[DOT_COUNTED_BY_GEP]], align 4
+// CHECK-NEXT:[[TMP1:%.*]] = sext i32 [[DOT_COUNTED_BY_LOAD]] to i64
+// CHECK-NEXT:[[TMP2:%.*]] = shl nsw i64 [[TMP1]], 1
+// CHECK-NEXT:[[TMP3:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD]], -1
+// CHECK-NEXT:[[TMP4:%.*]] = select i1 [[TMP3]], i64 [[TMP2]], i64 0
+// CHECK-NEXT:tail call void @init(ptr noundef nonnull [[ARRAY]], i64 
noundef [[TMP4]]) #[[ATTR2:[0-9]+]]
+// CHECK-NEXT:ret void
+//
+void test1(struct bucket *foo) {
+init(foo->growable->array);
+}
+
+// CHECK-LABEL: define dso_local void @test2(
+// CHECK-SAME: ptr noundef [[FOO:%.*]]) local_unnamed_addr #[[ATTR0]] {
+// CHECK-NEXT:  entry:
+// CHECK-NEXT:[[ARRAY:%.*]] = getelementptr inbounds i8, ptr [[FOO]], i64 
16
+// CHECK-NEXT:[[DOT_COUNTED_BY_GEP:%.*]] = getelementptr inbounds i8, ptr 
[[FOO]], i64 12
+// CHECK-NEXT:[[DOT_COUNTED_BY_LOAD:%.*]] = load i32, ptr 
[[DOT_COUNTED_BY_GEP]], align 4
+// CHECK-NEXT:[[TMP0:%.*]] = sext i32 [[DOT_COUNTED_BY_LOAD]] to i64
+// CHECK-NEXT:[[TMP1:%.*]] = shl nsw i64 [[TMP0]], 1
+// CHECK-NEXT:[[TMP2:%.*]] = icmp sgt i32 [[DOT_COUNTED_BY_LOAD]], -1
+// CHECK-NEXT:[[TMP3:%.*]] = select i1 [[TMP2]], i64 [[TMP1]], i64 0
+// CHECK-NEXT:tail call void @init(ptr noundef nonnull [[ARRAY]], i64 
noundef [[TMP3]]) #[[ATTR2]]
+// CHECK-NEXT:ret void
+//
+void test2(struct bucket2 *foo) {
+init(foo->growable.array);
+}
+//.
+// CHECK: [[TBAA2]] = !{[[META3:![0-9]+]], [[META7:![0-9]+]],
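
The size argument passed to `init` in the CHECK lines above follows a fixed recipe: sign-extend the `length` field, multiply by the element size (the `shl nsw ..., 1` for a `short`), and select 0 when the count is negative. A minimal sketch of that arithmetic, assuming the x86_64 layout used by the test (4-byte `int`, 2-byte `short`); `countedByBytes` and the `k...` constants are hypothetical names for illustration, not Clang code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Clang): mirrors the arithmetic the CHECK
// lines expect for `short array[] __attribute__((counted_by(length)))`.
std::int64_t countedByBytes(std::int32_t Length) {
  // sext i32 -> i64, then scale by sizeof(short); the IR uses `shl nsw ..., 1`.
  std::int64_t Bytes = static_cast<std::int64_t>(Length) *
                       static_cast<std::int64_t>(sizeof(short));
  // `icmp sgt Length, -1` followed by `select`: a negative count yields 0.
  return Length > -1 ? Bytes : 0;
}

// Byte offsets inside `struct variable`, assuming 4-byte int as on x86_64.
constexpr std::int64_t kLengthOffset = 2 * sizeof(std::int32_t); // a, b
constexpr std::int64_t kArrayOffset = 3 * sizeof(std::int32_t);  // a, b, length
// Offset of the embedded `growable` member inside `struct bucket2`.
constexpr std::int64_t kGrowableInBucket2 = sizeof(std::int32_t); // after `int a`

// Small self-check tying the constants back to the GEP offsets in test2.
void checkCountedByExamples() {
  assert(countedByBytes(3) == 6);
  assert(countedByBytes(-5) == 0);
  assert(kGrowableInBucket2 + kArrayOffset == 16);
  assert(kGrowableInBucket2 + kLengthOffset == 12);
}
```

In `test1` the same offsets (8 and 12) are applied to the loaded `growable` pointer, and in `test2` they are added to the offset of the embedded member, which gives the 12 and 16 seen in the GEPs.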

[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread Bill Wendling via llvm-branch-commits

https://github.com/bwendling milestoned 
https://github.com/llvm/llvm-project/pull/111445
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread via llvm-branch-commits

github-actions[bot] wrote:

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off [Keep my email addresses private](https://github.com/settings/emails) setting in your account.
See [LLVM Discourse](https://discourse.llvm.org/t/hidden-emails-on-github-should-we-do-something-about-it) for more information.

https://github.com/llvm/llvm-project/pull/111445


[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:

Failed to create pull request for issue111445 
https://github.com/llvm/llvm-project/actions/runs/11224536586

https://github.com/llvm/llvm-project/pull/111445


[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread via llvm-branch-commits

llvmbot wrote:

Failed to create pull request for issue111445 
https://github.com/llvm/llvm-project/actions/runs/11224536620

https://github.com/llvm/llvm-project/pull/111445


[llvm-branch-commits] [clang] [Clang] Fix 'counted_by' for nested struct pointers (#110497) (PR #111445)

2024-10-07 Thread Bill Wendling via llvm-branch-commits

https://github.com/bwendling edited 
https://github.com/llvm/llvm-project/pull/111445


[llvm-branch-commits] [clang] [clang-format] Handle template closer followed by empty paretheses (PR #111245)

2024-10-07 Thread via llvm-branch-commits

https://github.com/mydeveloperday approved this pull request.


https://github.com/llvm/llvm-project/pull/111245


[llvm-branch-commits] [llvm] [NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM (PR #109939)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109939

>From 839e0c12abe69f277810ff04be823c4fa07e4af3 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 11:41:18 +
Subject: [PATCH 1/2] [NewPM][AMDGPU] Port SIPreAllocateWWMRegs to NPM

---
 llvm/lib/Target/AMDGPU/AMDGPU.h   |  6 +-
 llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def |  1 +
 .../lib/Target/AMDGPU/AMDGPUTargetMachine.cpp |  7 ++-
 .../Target/AMDGPU/SIPreAllocateWWMRegs.cpp| 60 ---
 llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h | 25 
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 20 +++
 6 files changed, 92 insertions(+), 27 deletions(-)
 create mode 100644 llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.h

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 342d55e828bca5..95d0ad0f9dc96a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -49,7 +49,7 @@ FunctionPass *createSIFixSGPRCopiesLegacyPass();
 FunctionPass *createLowerWWMCopiesPass();
 FunctionPass *createSIMemoryLegalizerPass();
 FunctionPass *createSIInsertWaitcntsPass();
-FunctionPass *createSIPreAllocateWWMRegsPass();
+FunctionPass *createSIPreAllocateWWMRegsLegacyPass();
 FunctionPass *createSIFormMemoryClausesPass();
 
 FunctionPass *createSIPostRABundlerPass();
@@ -212,8 +212,8 @@ extern char &SILateBranchLoweringPassID;
 void initializeSIOptimizeExecMaskingPass(PassRegistry &);
 extern char &SIOptimizeExecMaskingID;
 
-void initializeSIPreAllocateWWMRegsPass(PassRegistry &);
-extern char &SIPreAllocateWWMRegsID;
+void initializeSIPreAllocateWWMRegsLegacyPass(PassRegistry &);
+extern char &SIPreAllocateWWMRegsLegacyID;
 
 void initializeAMDGPUImageIntrinsicOptimizerPass(PassRegistry &);
 extern char &AMDGPUImageIntrinsicOptimizerID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def 
b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
index 0ebf34c901c142..174a90f0aa419d 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPassRegistry.def
@@ -102,5 +102,6 @@ MACHINE_FUNCTION_PASS("gcn-dpp-combine", 
GCNDPPCombinePass())
 MACHINE_FUNCTION_PASS("si-load-store-opt", SILoadStoreOptimizerPass())
 MACHINE_FUNCTION_PASS("si-lower-sgpr-spills", SILowerSGPRSpillsPass())
 MACHINE_FUNCTION_PASS("si-peephole-sdwa", SIPeepholeSDWAPass())
+MACHINE_FUNCTION_PASS("si-pre-allocate-wwm-regs", SIPreAllocateWWMRegsPass())
 MACHINE_FUNCTION_PASS("si-shrink-instructions", SIShrinkInstructionsPass())
 #undef MACHINE_FUNCTION_PASS
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 1f2148c2922de9..dc5330740f4a6b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -41,6 +41,7 @@
 #include "SIMachineFunctionInfo.h"
 #include "SIMachineScheduler.h"
 #include "SIPeepholeSDWA.h"
+#include "SIPreAllocateWWMRegs.h"
 #include "SIShrinkInstructions.h"
 #include "TargetInfo/AMDGPUTargetInfo.h"
 #include "Utils/AMDGPUBaseInfo.h"
@@ -506,7 +507,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void 
LLVMInitializeAMDGPUTarget() {
   initializeSILateBranchLoweringPass(*PR);
   initializeSIMemoryLegalizerPass(*PR);
   initializeSIOptimizeExecMaskingPass(*PR);
-  initializeSIPreAllocateWWMRegsPass(*PR);
+  initializeSIPreAllocateWWMRegsLegacyPass(*PR);
   initializeSIFormMemoryClausesPass(*PR);
   initializeSIPostRABundlerPass(*PR);
   initializeGCNCreateVOPDPass(*PR);
@@ -1505,7 +1506,7 @@ bool GCNPassConfig::addRegAssignAndRewriteFast() {
   addPass(&SILowerSGPRSpillsLegacyID);
 
   // To Allocate wwm registers used in whole quad mode operations (for 
shaders).
-  addPass(&SIPreAllocateWWMRegsID);
+  addPass(&SIPreAllocateWWMRegsLegacyID);
 
   // For allocating other wwm register operands.
   addPass(createWWMRegAllocPass(false));
@@ -1537,7 +1538,7 @@ bool GCNPassConfig::addRegAssignAndRewriteOptimized() {
   addPass(&SILowerSGPRSpillsLegacyID);
 
   // To Allocate wwm registers used in whole quad mode operations (for 
shaders).
-  addPass(&SIPreAllocateWWMRegsID);
+  addPass(&SIPreAllocateWWMRegsLegacyID);
 
   // For allocating other whole wave mode registers.
   addPass(createWWMRegAllocPass(true));
diff --git a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp 
b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
index 07303e2aa726c5..f9109c01c8085b 100644
--- a/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
+++ b/llvm/lib/Target/AMDGPU/SIPreAllocateWWMRegs.cpp
@@ -11,6 +11,7 @@
 //
 
//===--===//
 
+#include "SIPreAllocateWWMRegs.h"
 #include "AMDGPU.h"
 #include "GCNSubtarget.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
@@ -34,7 +35,7 @@ static cl::opt
 
 namespace {
 
-class SIPreAllocateWWMRegs : public MachineFunctionPass {
+class SIPreAllocateWWMRegs {
 private:
   const SIInstrInfo *TII;
   const SIRegisterInfo *TRI;

[llvm-branch-commits] [llvm] Update correct dependency (PR #109937)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109937

>From a1925ae960ca3c8637ebb9a7bf7085dc787ee438 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 06:35:43 +
Subject: [PATCH] Update correct dependency

Replace unused analysis dependency with the used one (SlotIndexes)
---
 llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp 
b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
index 4afefa3d9b245c..d8697aa2ffe1cd 100644
--- a/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerSGPRSpills.cpp
@@ -95,8 +95,8 @@ char SILowerSGPRSpillsLegacy::ID = 0;
 INITIALIZE_PASS_BEGIN(SILowerSGPRSpillsLegacy, DEBUG_TYPE,
   "SI lower SGPR spill instructions", false, false)
 INITIALIZE_PASS_DEPENDENCY(LiveIntervalsWrapperPass)
-INITIALIZE_PASS_DEPENDENCY(VirtRegMapWrapperLegacy)
 INITIALIZE_PASS_DEPENDENCY(MachineDominatorTreeWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(SlotIndexesWrapperPass)
 INITIALIZE_PASS_END(SILowerSGPRSpillsLegacy, DEBUG_TYPE,
 "SI lower SGPR spill instructions", false, false)
 



[llvm-branch-commits] [llvm] Update correct dependency (PR #109937)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke edited 
https://github.com/llvm/llvm-project/pull/109937


[llvm-branch-commits] [llvm] [NewPM][CodeGen] Port LiveRegMatrix to NPM (PR #109938)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109938

>From 60e7f83fe680b04e3cb7c8e2e7bb2383fa0fdded Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Tue, 24 Sep 2024 09:07:04 +
Subject: [PATCH 1/4] [NewPM][CodeGen] Port LiveRegMatrix to NPM

---
 llvm/include/llvm/CodeGen/LiveRegMatrix.h | 50 ---
 llvm/include/llvm/InitializePasses.h  |  2 +-
 .../llvm/Passes/MachinePassRegistry.def   |  4 +-
 llvm/lib/CodeGen/LiveRegMatrix.cpp| 38 ++
 llvm/lib/CodeGen/RegAllocBasic.cpp|  8 +--
 llvm/lib/CodeGen/RegAllocGreedy.cpp   |  8 +--
 llvm/lib/Passes/PassBuilder.cpp   |  1 +
 llvm/lib/Target/AMDGPU/GCNNSAReassign.cpp |  6 +--
 .../Target/AMDGPU/SIPreAllocateWWMRegs.cpp|  6 +--
 9 files changed, 88 insertions(+), 35 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/LiveRegMatrix.h 
b/llvm/include/llvm/CodeGen/LiveRegMatrix.h
index 2b32308c7c075e..c024ca9c1dc38d 100644
--- a/llvm/include/llvm/CodeGen/LiveRegMatrix.h
+++ b/llvm/include/llvm/CodeGen/LiveRegMatrix.h
@@ -37,7 +37,9 @@ class MachineFunction;
 class TargetRegisterInfo;
 class VirtRegMap;
 
-class LiveRegMatrix : public MachineFunctionPass {
+class LiveRegMatrix {
+  friend class LiveRegMatrixWrapperPass;
+  friend class LiveRegMatrixAnalysis;
   const TargetRegisterInfo *TRI = nullptr;
   LiveIntervals *LIS = nullptr;
   VirtRegMap *VRM = nullptr;
@@ -57,15 +59,21 @@ class LiveRegMatrix : public MachineFunctionPass {
   unsigned RegMaskVirtReg = 0;
   BitVector RegMaskUsable;
 
-  // MachineFunctionPass boilerplate.
-  void getAnalysisUsage(AnalysisUsage &) const override;
-  bool runOnMachineFunction(MachineFunction &) override;
-  void releaseMemory() override;
+  LiveRegMatrix() = default;
+  void releaseMemory();
 
 public:
-  static char ID;
-
-  LiveRegMatrix();
+  LiveRegMatrix(LiveRegMatrix &&Other)
+  : TRI(Other.TRI), LIS(Other.LIS), VRM(Other.VRM), UserTag(Other.UserTag),
+Matrix(std::move(Other.Matrix)), Queries(std::move(Other.Queries)),
+RegMaskTag(Other.RegMaskTag), RegMaskVirtReg(Other.RegMaskVirtReg),
+RegMaskUsable(std::move(Other.RegMaskUsable)) {
+Other.TRI = nullptr;
+Other.LIS = nullptr;
+Other.VRM = nullptr;
+  }
+
+  void init(MachineFunction &MF, LiveIntervals *LIS, VirtRegMap *VRM);
 
   
//======//
   // High-level interface.
@@ -159,6 +167,32 @@ class LiveRegMatrix : public MachineFunctionPass {
   Register getOneVReg(unsigned PhysReg) const;
 };
 
+class LiveRegMatrixWrapperPass : public MachineFunctionPass {
+  LiveRegMatrix LRM;
+
+public:
+  static char ID;
+
+  LiveRegMatrixWrapperPass() : MachineFunctionPass(ID) {}
+
+  LiveRegMatrix &getLRM() { return LRM; }
+  const LiveRegMatrix &getLRM() const { return LRM; }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override;
+  bool runOnMachineFunction(MachineFunction &MF) override;
+  void releaseMemory() override;
+};
+
+class LiveRegMatrixAnalysis : public AnalysisInfoMixin<LiveRegMatrixAnalysis> {
+  friend AnalysisInfoMixin<LiveRegMatrixAnalysis>;
+  static AnalysisKey Key;
+
+public:
+  using Result = LiveRegMatrix;
+
+  LiveRegMatrix run(MachineFunction &MF, MachineFunctionAnalysisManager &MFAM);
+};
+
 } // end namespace llvm
 
 #endif // LLVM_CODEGEN_LIVEREGMATRIX_H
diff --git a/llvm/include/llvm/InitializePasses.h 
b/llvm/include/llvm/InitializePasses.h
index d89a5538b46975..3fee8c40a6607e 100644
--- a/llvm/include/llvm/InitializePasses.h
+++ b/llvm/include/llvm/InitializePasses.h
@@ -156,7 +156,7 @@ void initializeLiveDebugValuesPass(PassRegistry &);
 void initializeLiveDebugVariablesPass(PassRegistry &);
 void initializeLiveIntervalsWrapperPassPass(PassRegistry &);
 void initializeLiveRangeShrinkPass(PassRegistry &);
-void initializeLiveRegMatrixPass(PassRegistry &);
+void initializeLiveRegMatrixWrapperPassPass(PassRegistry &);
 void initializeLiveStacksPass(PassRegistry &);
 void initializeLiveVariablesWrapperPassPass(PassRegistry &);
 void initializeLoadStoreOptPass(PassRegistry &);
diff --git a/llvm/include/llvm/Passes/MachinePassRegistry.def 
b/llvm/include/llvm/Passes/MachinePassRegistry.def
index bdc56ca03f392a..4497c1fce0db69 100644
--- a/llvm/include/llvm/Passes/MachinePassRegistry.def
+++ b/llvm/include/llvm/Passes/MachinePassRegistry.def
@@ -97,6 +97,7 @@ LOOP_PASS("loop-term-fold", LoopTermFoldPass())
 // preferably fix the scavenger to not depend on them).
 MACHINE_FUNCTION_ANALYSIS("live-intervals", LiveIntervalsAnalysis())
 MACHINE_FUNCTION_ANALYSIS("live-vars", LiveVariablesAnalysis())
+MACHINE_FUNCTION_ANALYSIS("live-reg-matrix", LiveRegMatrixAnalysis())
 MACHINE_FUNCTION_ANALYSIS("machine-block-freq", 
MachineBlockFrequencyAnalysis())
 MACHINE_FUNCTION_ANALYSIS("machine-branch-prob",
   MachineBranchProbabilityAnalysis())
@@ -122,8 +123,7 @@ MACHINE_FUNCTION_ANALYSIS("virtregmap", 
VirtRegMapAnalysis())
 // MachineRegionIn
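
The header change above follows a common porting shape: the pass-manager-independent state moves into a plain class, a legacy wrapper pass owns one instance and exposes it through an accessor, and a new-pass-manager analysis builds the same object and returns it by value as its `Result`. A rough, LLVM-free sketch of that shape follows; `Function`, `Matrix`, `MatrixWrapperPass` and `MatrixAnalysis` are hypothetical stand-ins, not the real LLVM classes or signatures:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the unit the pass runs on.
struct Function {
  std::vector<int> Regs;
};

// Pass-manager-independent result object (plays the role of LiveRegMatrix).
class Matrix {
  std::vector<int> Data;

public:
  Matrix() = default;
  Matrix(Matrix &&) = default; // needed so an analysis can return it by value
  Matrix &operator=(Matrix &&) = default;

  void init(const Function &F) { Data = F.Regs; }
  void releaseMemory() { Data.clear(); }
  std::size_t size() const { return Data.size(); }
};

// Legacy-style wrapper: owns the result and hands out a reference.
class MatrixWrapperPass {
  Matrix M;

public:
  bool runOnFunction(const Function &F) {
    M.init(F);
    return false; // an analysis does not modify the function
  }
  Matrix &getMatrix() { return M; }
  void releaseMemory() { M.releaseMemory(); }
};

// New-style analysis: constructs the result and returns it to the caller.
struct MatrixAnalysis {
  using Result = Matrix;
  Result run(const Function &F) {
    Matrix M;
    M.init(F);
    return M;
  }
};

// Both entry points should produce an equivalent result object.
void checkBothEntryPoints() {
  Function F{{1, 2, 3}};
  MatrixWrapperPass Legacy;
  Legacy.runOnFunction(F);
  MatrixAnalysis NewPM;
  Matrix R = NewPM.run(F);
  assert(Legacy.getMatrix().size() == R.size());
}
```

Clients of the legacy pass then go through the wrapper's accessor (`getLRM()` in the patch) rather than using the pass object directly, which is presumably why the register allocators and AMDGPU passes in the diffstat each need a small update.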

[llvm-branch-commits] [llvm] [AMDGPU] Add tests for SIPreAllocateWWMRegs (PR #109963)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/109963

>From 241cefb63e69298c0122b3aa7dcf2bcde7426c06 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Wed, 25 Sep 2024 11:21:04 +
Subject: [PATCH 1/2] [AMDGPU] Add tests for SIPreAllocateWWMRegs

---
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 26 +++
 .../si-pre-allocate-wwm-sgpr-spills.mir   | 21 +++
 2 files changed, 47 insertions(+)
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
 create mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
new file mode 100644
index 00..f2db299f575f5e
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
@@ -0,0 +1,26 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s  | FileCheck %s
+
+---
+
+name: pre_allocate_wwm_regs_strict
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr1
+; CHECK-LABEL: name: pre_allocate_wwm_regs_strict
+; CHECK: liveins: $sgpr1
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+; CHECK-NEXT: renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def 
$exec, implicit-def $scc, implicit $exec
+; CHECK-NEXT: $vgpr0 = V_MOV_B32_e32 0, implicit $exec
+; CHECK-NEXT: dead $vgpr0 = V_MOV_B32_dpp $vgpr0, [[DEF]], 323, 12, 15, 0, 
implicit $exec
+; CHECK-NEXT: $exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+%0:vgpr_32 = IMPLICIT_DEF
+renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, 
implicit-def $scc, implicit $exec
+%24:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+%25:vgpr_32 = V_MOV_B32_dpp %24:vgpr_32(tied-def 0), %0:vgpr_32, 323, 12, 
15, 0, implicit $exec
+$exec = EXIT_STRICT_WWM killed renamable $sgpr4_sgpr5
+%2:vgpr_32 = COPY %0:vgpr_32
+...
diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir
new file mode 100644
index 00..f0efe74878d831
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir
@@ -0,0 +1,21 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-amdgpu-prealloc-sgpr-spill-vgprs -run-pass=si-pre-allocate-wwm-regs -o - 
-mcpu=tahiti %s | FileCheck %s
+
+---
+
+name: pre_allocate_wwm_spill_to_vgpr
+tracksRegLiveness: true
+body: |
+  bb.0:
+liveins: $sgpr1
+; CHECK-LABEL: name: pre_allocate_wwm_spill_to_vgpr
+; CHECK: liveins: $sgpr1
+; CHECK-NEXT: {{  $}}
+; CHECK-NEXT: [[DEF:%[0-9]+]]:vgpr_32 = IMPLICIT_DEF
+; CHECK-NEXT: dead $vgpr0 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, [[DEF]]
+; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
+%0:vgpr_32 = IMPLICIT_DEF
+%23:vgpr_32 = SI_SPILL_S32_TO_VGPR $sgpr1, 0, %0:vgpr_32
+%2:vgpr_32 = COPY %0:vgpr_32
+...
+

>From 131cedbef983bf5142286d648009170478560f8e Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 7 Oct 2024 09:13:04 +
Subject: [PATCH 2/2] Keep tests in one file

---
 .../AMDGPU/si-pre-allocate-wwm-regs.mir   | 23 ---
 .../si-pre-allocate-wwm-sgpr-spills.mir   | 21 -
 2 files changed, 20 insertions(+), 24 deletions(-)
 delete mode 100644 llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-sgpr-spills.mir

diff --git a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir 
b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
index f2db299f575f5e..4dcad87a985c0b 100644
--- a/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-pre-allocate-wwm-regs.mir
@@ -1,5 +1,6 @@
 # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py 
UTC_ARGS: --version 5
 # RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-run-pass=si-pre-allocate-wwm-regs -o - -mcpu=tahiti %s  | FileCheck %s
+# RUN: llc -mtriple=amdgcn -verify-machineinstrs 
-amdgpu-prealloc-sgpr-spill-vgprs -run-pass=si-pre-allocate-wwm-regs -o - 
-mcpu=tahiti %s | FileCheck %s --check-prefix=CHECK2
 
 ---
 
@@ -19,8 +20,24 @@ body: |
 ; CHECK-NEXT: dead [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
 %0:vgpr_32 = IMPLICIT_DEF
 renamable $sgpr4_sgpr5 = ENTER_STRICT_WWM -1, implicit-def $exec, 
implicit-def $scc, implicit $exec
-%24:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
-%25:vgpr_32 = V_MOV_B32_dpp %24:vgpr_32(tied-def 0), %0:vgpr_32, 323, 12, 
15, 0, implicit $exec
+%1:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
+%2:vgpr_32 = V_MOV_B32_dpp %1, %0, 323, 12, 15, 0, implicit $exec
 $exec = EXIT_STRIC

[llvm-branch-commits] [llvm] [CodeGen] LiveIntervalUnions::Array Implement move constructor (PR #111357)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/111357

>From 47bb19208ed2de82109eb160ba6177b7f888be26 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Mon, 7 Oct 2024 08:42:24 +
Subject: [PATCH] [CodeGen] LiveIntervalUnions::Array  Implement move
 constructor

---
 llvm/include/llvm/CodeGen/LiveIntervalUnion.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h 
b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
index 81003455da4241..cc0f2a45bb182c 100644
--- a/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
+++ b/llvm/include/llvm/CodeGen/LiveIntervalUnion.h
@@ -176,6 +176,13 @@ class LiveIntervalUnion {
 Array() = default;
 ~Array() { clear(); }
 
+Array(Array &&Other) : Size(Other.Size), LIUs(Other.LIUs) {
+  Other.Size = 0;
+  Other.LIUs = nullptr;
+}
+
+Array(const Array &) = delete;
+
 // Initialize the array to have Size entries.
 // Reuse an existing allocation if the size matches.
 void init(LiveIntervalUnion::Allocator&, unsigned Size);
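
The move constructor in this hunk copies the size and the raw pointer, then resets the source so that its destructor's `clear()` has nothing left to free. A self-contained sketch of the same idea, using a hypothetical `OwnedArray` of `int` backed by `calloc`/`free` instead of the real `LiveIntervalUnion::Array` and its allocator:

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Hypothetical owning array, loosely modelled on the shape of the patch.
class OwnedArray {
  unsigned Size = 0;
  int *Elems = nullptr;

public:
  OwnedArray() = default;
  ~OwnedArray() { clear(); }

  // Steal the buffer and leave the source empty, so only one object frees it.
  OwnedArray(OwnedArray &&Other) : Size(Other.Size), Elems(Other.Elems) {
    Other.Size = 0;
    Other.Elems = nullptr;
  }

  // Copying would produce two owners of the same buffer, so forbid it.
  OwnedArray(const OwnedArray &) = delete;

  void init(unsigned N) {
    clear();
    Elems = static_cast<int *>(std::calloc(N, sizeof(int)));
    Size = Elems ? N : 0;
  }

  void clear() {
    std::free(Elems); // free(nullptr) is a no-op
    Elems = nullptr;
    Size = 0;
  }

  unsigned size() const { return Size; }
};

// After a move, the destination owns the elements and the source is empty.
void checkMoveLeavesSourceEmpty() {
  OwnedArray A;
  A.init(8);
  OwnedArray B(std::move(A));
  assert(B.size() == 8);
  assert(A.size() == 0);
}
```

Without the reset of `Other`, both objects would run `clear()` on the same pointer when destroyed, which is the double free the explicit move constructor avoids.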



[llvm-branch-commits] [llvm] [AMDGPU] Serialize WWM_REG vreg flag (PR #110229)

2024-10-07 Thread Akshat Oke via llvm-branch-commits


@@ -578,3 +578,18 @@ body: |
 SI_RETURN
 
 ...
+---

Akshat-Oke wrote:

I've put it in the generic test 
llvm/test/CodeGen/MIR/Generic/register-flag-error.mir

https://github.com/llvm/llvm-project/pull/110229


[llvm-branch-commits] [llvm] [AMDGPU] Serialize WWM_REG vreg flag (PR #110229)

2024-10-07 Thread Akshat Oke via llvm-branch-commits

https://github.com/Akshat-Oke updated 
https://github.com/llvm/llvm-project/pull/110229

>From 1cbc26fe2de38ae4e174aec128b39c899dab9136 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Fri, 27 Sep 2024 08:58:39 +
Subject: [PATCH 1/4] [AMDGPU] Serialize WWM_REG vreg flag

---
 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | 15 +++
 llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h |  4 ++--
 llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp  | 11 +++
 llvm/lib/Target/AMDGPU/SIRegisterInfo.h| 10 ++
 llvm/test/CodeGen/AMDGPU/virtual-registers.mir | 16 
 5 files changed, 54 insertions(+), 2 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/virtual-registers.mir

diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp 
b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index 1f2148c2922de9..28578a875c164c 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -1712,6 +1712,21 @@ bool GCNTargetMachine::parseMachineFunctionInfo(
 MFI->reserveWWMRegister(ParsedReg);
   }
 
+  auto setRegisterFlags = [&](const VRegInfo &Info) {
+for (const auto &Flag : Info.Flags) {
+  MFI->setFlag(Info.VReg, Flag);
+}
+  };
+
+  for (const auto &P : PFS.VRegInfosNamed) {
+const VRegInfo &Info = *P.second;
+setRegisterFlags(Info);
+  }
+  for (const auto &P : PFS.VRegInfos) {
+const VRegInfo &Info = *P.second;
+setRegisterFlags(Info);
+  }
+
   auto parseAndCheckArgument = [&](const std::optional &A,
const TargetRegisterClass &RC,
ArgDescriptor &Arg, unsigned UserSGPRs,
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h 
b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
index 669f98dd865d61..e28c24bf8f8500 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h
@@ -693,8 +693,8 @@ class SIMachineFunctionInfo final : public 
AMDGPUMachineFunction,
 
   void setFlag(Register Reg, uint8_t Flag) {
 assert(Reg.isVirtual());
-if (VRegFlags.inBounds(Reg))
-  VRegFlags[Reg] |= Flag;
+VRegFlags.grow(Reg);
+VRegFlags[Reg] |= Flag;
   }
 
   bool checkFlag(Register Reg, uint8_t Flag) const {
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 9e1c4941dba283..84569b3f11df67 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -3839,3 +3839,14 @@ SIRegisterInfo::getSubRegAlignmentNumBits(const 
TargetRegisterClass *RC,
   }
   return 0;
 }
+
+SmallVector
+SIRegisterInfo::getVRegFlagsOfReg(Register Reg,
+  const MachineFunction &MF) const {
+  SmallVector RegFlags;
+  const SIMachineFunctionInfo *FuncInfo = MF.getInfo<SIMachineFunctionInfo>();
+  if (FuncInfo->checkFlag(Reg, AMDGPU::VirtRegFlag::WWM_REG)) {
+RegFlags.push_back("WWM_REG");
+  }
+  return RegFlags;
+}
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h 
b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 409e5418abc8ec..2c3707e119178a 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -454,6 +454,16 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
   // No check if the subreg is supported by the current RC is made.
   unsigned getSubRegAlignmentNumBits(const TargetRegisterClass *RC,
  unsigned SubReg) const;
+
+  std::pair getVRegFlagValue(StringRef Name) const override {
+if (Name == "WWM_REG") {
+  return {true, AMDGPU::VirtRegFlag::WWM_REG};
+}
+return {false, 0};
+  }
+
+  SmallVector
+  getVRegFlagsOfReg(Register Reg, const MachineFunction &MF) const override;
 };
 
 namespace AMDGPU {
diff --git a/llvm/test/CodeGen/AMDGPU/virtual-registers.mir 
b/llvm/test/CodeGen/AMDGPU/virtual-registers.mir
new file mode 100644
index 00..3ea8f6eafcf10c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/virtual-registers.mir
@@ -0,0 +1,16 @@
+# RUN: llc -mtriple=amdgcn -run-pass=none -o - %s | FileCheck %s
+# This test ensures that the MIR parser parses virtual register flags correctly
+
+---
+name: vregs
+# CHECK: registers:
+# CHECK-NEXT:   - { id: 0, class: vgpr_32, preferred-register: '$vgpr1', 
flags: [ WWM_REG ] }
+# CHECK-NEXT:   - { id: 1, class: sgpr_64, preferred-register: '$sgpr0_sgpr1', 
flags: [  ] }
+# CHECK-NEXT:   - { id: 2, class: sgpr_64, preferred-register: '', flags: [  ] 
}
+registers:
+  - { id: 0, class: vgpr_32, preferred-register: $vgpr1, flags: [ WWM_REG ]}
+  - { id: 1, class: sgpr_64, preferred-register: $sgpr0_sgpr1 }
+body: |
+  bb.0:
+%2:sgpr_64 = COPY %1
+%1:sgpr_64 = COPY %0
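
The `setFlag` change in PATCH 1/4 replaces "set only if the register is already in bounds" with "grow the table, then set". A small sketch of why that matters for a flag set on a register whose slot does not exist yet; `FlagTable` is a hypothetical vector-backed stand-in for the indexed map, and the assumption that the parser sets flags before the table has been sized is an inference from the diff, not something the patch states:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag table indexed by virtual register number.
class FlagTable {
  std::vector<std::uint8_t> Flags;

public:
  bool inBounds(std::size_t Reg) const { return Reg < Flags.size(); }

  // Old behaviour: a flag for a register beyond the table is silently dropped.
  void setFlagIfInBounds(std::size_t Reg, std::uint8_t Flag) {
    if (inBounds(Reg))
      Flags[Reg] |= Flag;
  }

  // New behaviour: make room for the register first, then record the flag.
  void setFlag(std::size_t Reg, std::uint8_t Flag) {
    if (!inBounds(Reg))
      Flags.resize(Reg + 1, 0);
    Flags[Reg] |= Flag;
  }

  bool checkFlag(std::size_t Reg, std::uint8_t Flag) const {
    return inBounds(Reg) && (Flags[Reg] & Flag) != 0;
  }
};

// The dropped flag versus the recorded flag, side by side.
void checkGrowThenSet() {
  const std::uint8_t WWM = 1; // hypothetical bit value for illustration
  FlagTable Old, New;
  Old.setFlagIfInBounds(3, WWM);
  New.setFlag(3, WWM);
  assert(!Old.checkFlag(3, WWM));
  assert(New.checkFlag(3, WWM));
}
```

With the old form, a `flags: [ WWM_REG ]` entry parsed from MIR could be lost without any diagnostic, and the round-trip test above would then print an empty flag list.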

>From 4c26b4f3b1ad8952767625ba949eaa750aec0652 Mon Sep 17 00:00:00 2001
From: Akshat Oke 
Date: Fri, 4 Oct 2024 06:31:06 +
Subject: [PATCH 2/4] Correct TRI methods to optional<> and SmallString

---
 llvm/lib/Target/AMDGPU/SIRegisterI