[clang] [flang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)
rj-jesus wrote: Chipping into the discussion, since this patch I can also no longer build OpenBLAS or PETSc. OpenBLAS for example fails with ``` $ clang -v -O3 -mcpu=native -DHAVE_C11 -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=72 -DMAX_PARALLEL_NUMBER=1 -DMAX_STACK_ALLOC=2048 -DNO_AFFINITY -DVERSION="\"0.3.25\"" -DBUILD_SINGLE -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 utest/CMakeFiles/openblas_utest.dir/utest_main.c.o utest/CMakeFiles/openblas_utest.dir/test_min.c.o utest/CMakeFiles/openblas_utest.dir/test_amax.c.o utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o utest/CMakeFiles/openblas_utest.dir/test_rot.c.o utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o utest/CMakeFiles/openblas_utest.dir/test_swap.c.o utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -o utest/openblas_utest -Wl,-rpath,/.../openblas/build/lib lib/libopenblas.so.0.3 -lm clang version 18.0.0 (g...@github.com:llvm/llvm-project.git 17feb330aab39c6c0c21ee9b02efb484dfb2261e) Target: aarch64-unknown-linux-gnu Thread model: posix InstalledDir: /.../llvm/trunk/bin Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/11 Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12 Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12 Candidate multilib: .;@m64 Selected multilib: .;@m64 Found CUDA installation: /usr/local/cuda, version "/usr/bin/ld" -EL -z relro --hash-style=gnu --eh-frame-hdr -m aarch64linux -pie -dynamic-linker /lib/ld-linux-aarch64.so.1 -o utest/openblas_utest /lib/aarch64-linux-gnu/Scrt1.o /lib/aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/12/crtbeginS.o -L/.../llvm/trunk/lib/clang/18/lib/aarch64-unknown-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/12 -L/lib/aarch64-linux-gnu -L/usr/lib/aarch64-linux-gnu -L/lib -L/usr/lib -L/.../llvm/trunk/lib utest/CMakeFiles/openblas_utest.dir/utest_main.c.o utest/CMakeFiles/openblas_utest.dir/test_min.c.o utest/CMakeFiles/openblas_utest.dir/test_amax.c.o utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o utest/CMakeFiles/openblas_utest.dir/test_rot.c.o utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o utest/CMakeFiles/openblas_utest.dir/test_swap.c.o utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -rpath /.../openblas/build/lib lib/libopenblas.so.0.3 -lm -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-linux-gnu/12/crtendS.o /lib/aarch64-linux-gnu/crtn.o /usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to `_QQEnvironmentDefaults' /usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to `_QQmain' ``` https://github.com/llvm/llvm-project/pull/73124 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)
rj-jesus wrote: > > Chipping into the discussion, since this patch I can also no longer build > > OpenBLAS or PETSc. OpenBLAS for example fails with > > ``` > > $ clang -v -O3 -mcpu=native -DHAVE_C11 -Wall -DF_INTERFACE_GFORT -fPIC > > -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=72 -DMAX_PARALLEL_NUMBER=1 > > -DMAX_STACK_ALLOC=2048 -DNO_AFFINITY -DVERSION="\"0.3.25\"" -DBUILD_SINGLE > > -DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16 > > utest/CMakeFiles/openblas_utest.dir/utest_main.c.o > > utest/CMakeFiles/openblas_utest.dir/test_min.c.o > > utest/CMakeFiles/openblas_utest.dir/test_amax.c.o > > utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o > > utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o > > utest/CMakeFiles/openblas_utest.dir/test_rot.c.o > > utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o > > utest/CMakeFiles/openblas_utest.dir/test_swap.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o > > utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o > > utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -o > > utest/openblas_utest -Wl,-rpath,/.../openblas/build/lib > > lib/libopenblas.so.0.3 -lm > > clang version 18.0.0 (g...@github.com:llvm/llvm-project.git > > 17feb330aab39c6c0c21ee9b02efb484dfb2261e) > > Target: aarch64-unknown-linux-gnu > > Thread model: posix > > InstalledDir: /.../llvm/trunk/bin > > Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/11 > > Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12 > > Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12 > > Candidate multilib: .;@m64 > > Selected multilib: .;@m64 > > Found CUDA installation: /usr/local/cuda, version > > "/usr/bin/ld" -EL -z relro --hash-style=gnu --eh-frame-hdr -m aarch64linux > > -pie -dynamic-linker /lib/ld-linux-aarch64.so.1 -o utest/openblas_utest > > /lib/aarch64-linux-gnu/Scrt1.o /lib/aarch64-linux-gnu/crti.o > > /usr/lib/gcc/aarch64-linux-gnu/12/crtbeginS.o > > -L/.../llvm/trunk/lib/clang/18/lib/aarch64-unknown-linux-gnu > > -L/usr/lib/gcc/aarch64-linux-gnu/12 -L/lib/aarch64-linux-gnu > > -L/usr/lib/aarch64-linux-gnu -L/lib -L/usr/lib -L/.../llvm/trunk/lib > > utest/CMakeFiles/openblas_utest.dir/utest_main.c.o > > utest/CMakeFiles/openblas_utest.dir/test_min.c.o > > utest/CMakeFiles/openblas_utest.dir/test_amax.c.o > > utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o > > utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o > > utest/CMakeFiles/openblas_utest.dir/test_rot.c.o > > utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o > > utest/CMakeFiles/openblas_utest.dir/test_swap.c.o > > utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o > > utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o > > utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -rpath > > /.../openblas/build/lib lib/libopenblas.so.0.3 -lm -lgcc --as-needed > > -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed > > /usr/lib/gcc/aarch64-linux-gnu/12/crtendS.o /lib/aarch64-linux-gnu/crtn.o > > /usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to > > `_QQEnvironmentDefaults' > > /usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to `_QQmain' > > ``` > > Thanks for the report! Can you please tell me how OpenBLAS was built? I'm > trying to replicate this, but I do not see a reference to `_QQmain` or the > likes in the OpenBLAS library that I build on x86. Hi @mjklemm! This was on an AArch64 box (not that that should make a difference) doing something like: ``` git clone -b v0.3.25 https://github.com/OpenMathLib/OpenBLAS.git cd OpenBLAS mkdir build && cd build cmake -G Ninja \ -DCMAKE_C_COMPILER=clang \ -DCMAKE_CXX_COMPILER=clang++ \ -DCMAKE_Fortran_COMPILER=flang-new \ -DCMAKE_C_FLAGS="-O3 -mcpu=native" \ -DCMAKE_CXX_FLAGS="-O3 -mcpu=native" \ -DCMAKE_Fortran_FLAGS="-O3 -mcpu=native" \ -DBUILD_SHARED_LIBS=ON \ -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR .. sed -i 's/-m64//g' build.ninja sed -i 's/-rdynamic//' build.ninja cmake --build . -j32 cmake --install . ``` The error shows up when linking a C program with a Fortran shared library, so maybe you weren't enabling building shared libraries? https://github.com/llvm/llvm-project/pull/73124 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[flang] [clang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)
rj-jesus wrote: > The solution is to add `-fno-fortran-main` to the linker options via > `CMAKE_SHARED_LINKER_FLAGS`. This will need PR #74139 land first. But this > option will be a good way to control if the flang compiler should attempt > linking in the `main` stub from its library. > > It seems like `flang-new` when being used as a linker with `-shared` included > Fortran_main in the shared library. This seems wrong to me. The option > `-fno-fortran-main` avoids this. I'm pondering if `-shared` is buggy here. It > will require a bit more digging on my end to figure that out. Thanks, sounds like a good workaround to me, though as you say I find strange the need to explicitly specify "don't include main" when building a library! https://github.com/llvm/llvm-project/pull/73124 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)
rj-jesus wrote: Sounds good, thanks! :) https://github.com/llvm/llvm-project/pull/126945 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)
rj-jesus wrote: Should this be given a more general name, now that it also includes Neon types? There are also a few comments right at the start that could be extended for Neon. https://github.com/llvm/llvm-project/pull/126945 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)
rj-jesus wrote: I believe this fixes #113297, right? https://github.com/llvm/llvm-project/pull/126945 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
rj-jesus wrote: Thank you very much for checking! If you have any other comments please let me know. https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
rj-jesus wrote: Thanks for the pointer, @davemgreen. You're right, with `+strict-align` this has to be 16B aligned. This is also only valid for LE, but this should already be enforced. https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
rj-jesus wrote: Hi @paulwalker-arm, I think the alignment requirements of LD1 and LDR are indeed different, but this only matters if `AlignmentEnforced()` is enabled, right? I thought `AlignmentEnforced` wasn't generally a concern, otherwise even the current lowering we have for `vld1q_u8(uint8_t const *ptr)`, for example, seems too permissive (https://godbolt.org/z/coYefno3j): ```cpp #include uint8x16_t foo(uint8_t *ptr) { return vld1q_u8(ptr); } ``` Currently gets lowered to: ```llvm define <16 x i8> @foo(ptr %0) { %2 = load <16 x i8>, ptr %0, align 1 ret <16 x i8> %2 } ``` Which finally lowers to: ```gas foo: ldr q0, [x0] ret ``` `ptr` isn't necessarily aligned to 16 (in the IR, it's only guaranteed to be aligned to 1), but, unless I'm missing something in the docs, LDR.Q also seems to expect an alignment of 16 if `AlignmentEnforced` is enabled, and will fault if not. Am I missing anything? Also, even if we can't indeed lower LD1/ST1 to LDR/STR generally, do you think it would be worth trying to do it in some other more restricted way (for example only for SP, which I believe should be aligned to 16), or should we drop the idea entirely? https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
https://github.com/rj-jesus closed https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in { defm : unpred_loadstore_bitcast; defm : unpred_loadstore_bitcast; + // Allow using LDR/STR to avoid the predicate dependence. + let Predicates = [IsLE, AllowMisalignedMemAccesses] in +foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in { + let AddedComplexity = 2 in { +def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))), + (LDR_ZXI GPR64sp:$base, simm9:$offset)>; +def : Pat<(store Ty:$val, (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset)), + (STR_ZXI ZPR:$val, GPR64sp:$base, simm9:$offset)>; + } rj-jesus wrote: Thanks, that's where I had them initially, but it seems the predicates weren't being applied when the patterns were in `unpred_loadstore_bitcast`. For example, I was already using the `IsLE` predicate when I opened this PR, but it only became effective when I moved the patterns out into the separate loop as you can [see in the latest commit](https://github.com/llvm/llvm-project/pull/127837/commits/6b1ad6758cd854bc8d15e07b4dae4f2936c416bb#diff-e59a1dfcf45e1736fe516d49e304f65e921ac474ef443b78b2e6ca27abbded68). Although, now that I'm looking at it, the original patterns in `unpred_loadstore_bitcast` also [don't seem to be correct](https://godbolt.org/z/1bvndses7)? Am I missing anything? https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in { defm : unpred_loadstore_bitcast; defm : unpred_loadstore_bitcast; + // Allow using LDR/STR to avoid the predicate dependence. + let Predicates = [IsLE, AllowMisalignedMemAccesses] in +foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in { + let AddedComplexity = 2 in { +def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))), + (LDR_ZXI GPR64sp:$base, simm9:$offset)>; +def : Pat<(store Ty:$val, (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset)), + (STR_ZXI ZPR:$val, GPR64sp:$base, simm9:$offset)>; + } rj-jesus wrote: Ah, I see! Thanks very much, that makes sense! What if I absorb the current patterns into the loop so that we still have unconventional loads/stores grouped together, and add `HasSVE_or_SME` (from the parent definition) to the predicates? Or do you have a better suggestion? https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)
@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in { defm : unpred_loadstore_bitcast; defm : unpred_loadstore_bitcast; + // Allow using LDR/STR to avoid the predicate dependence. + let Predicates = [IsLE, AllowMisalignedMemAccesses] in rj-jesus wrote: Thank you very much for the feedback. I'll rebase the patch to resolve the conflict with `llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll` and commit it afterwards. I'll keep an ear out for reports of performance regressions. Also, I think the other two patterns we were discussing above probably also need `AllowMisalignedMemAccesses` as they change the width of the vector elements accessed. https://github.com/llvm/llvm-project/pull/127837 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)
rj-jesus wrote: I'll commit this to get the bot back to green while I look into it offline. https://github.com/llvm/llvm-project/pull/130263 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)
https://github.com/rj-jesus closed https://github.com/llvm/llvm-project/pull/130263 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)
https://github.com/rj-jesus created https://github.com/llvm/llvm-project/pull/130263 Reverts llvm/llvm-project#129732. I'll look into what's causing the buildbot reported in https://github.com/llvm/llvm-project/pull/129732#issuecomment-2705062636 to fail offline. >From 5a71fab0067bae0f532a6268749df71dbe66b4ac Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Fri, 7 Mar 2025 09:16:20 + Subject: [PATCH] Revert "[AArch64][SVE] Improve fixed-length addressing modes. (#129732)" This reverts commit f01e760c08365426de95f02dc2c2dc670eb47352. --- .../CodeGen/AArch64/sve-vector-bits-codegen.c | 9 +- .../Target/AArch64/AArch64ISelDAGToDAG.cpp| 15 +- llvm/lib/Target/AArch64/AArch64Subtarget.h| 12 +- .../AArch64/sve-fixed-length-offsets.ll | 362 -- .../AArch64/sve-fixed-length-shuffles.ll | 90 ++--- 5 files changed, 54 insertions(+), 434 deletions(-) delete mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c index 1391a1b09fbd1..0ed14b4b3b793 100644 --- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c +++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c @@ -13,9 +13,12 @@ void func(int *restrict a, int *restrict b) { // CHECK-LABEL: func -// CHECK256-COUNT-8: str -// CHECK512-COUNT-4: str -// CHECK1024-COUNT-2: str +// CHECK256-COUNT-1: str +// CHECK256-COUNT-7: st1w +// CHECK512-COUNT-1: str +// CHECK512-COUNT-3: st1w +// CHECK1024-COUNT-1: str +// CHECK1024-COUNT-1: st1w // CHECK2048-COUNT-1: st1w #pragma clang loop vectorize(enable) for (int i = 0; i < 64; ++i) diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp index 07bcd802962fa..3ca9107cb2ce5 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp @@ -7380,23 +7380,12 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - int64_t MulImm = std::numeric_limits::max(); - if (VScale.getOpcode() == ISD::VSCALE) { -MulImm = cast(VScale.getOperand(0))->getSExtValue(); - } else if (auto C = dyn_cast(VScale)) { -int64_t ByteOffset = C->getSExtValue(); -const auto KnownVScale = -Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock; - -if (!KnownVScale || ByteOffset % KnownVScale != 0) - return false; - -MulImm = ByteOffset / KnownVScale; - } else + if (VScale.getOpcode() != ISD::VSCALE) return false; TypeSize TS = MemVT.getSizeInBits(); int64_t MemWidthBytes = static_cast(TS.getKnownMinValue()) / 8; + int64_t MulImm = cast(VScale.getOperand(0))->getSExtValue(); if ((MulImm % MemWidthBytes) != 0) return false; diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h index f5ffc72cae537..c6eb77e3bc3ba 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.h +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h @@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { void mirFileLoaded(MachineFunction &MF) const override; // Return the known range for the bit length of SVE data registers. A value - // of 0 means nothing is known about that particular limit beyond what's + // of 0 means nothing is known about that particular limit beyong what's // implied by the architecture. unsigned getMaxSVEVectorSizeInBits() const { assert(isSVEorStreamingSVEAvailable() && @@ -405,16 +405,6 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { return MinSVEVectorSizeInBits; } - // Return the known bit length of SVE data registers. A value of 0 means the - // length is unkown beyond what's implied by the architecture. - unsigned getSVEVectorSizeInBits() const { -assert(isSVEorStreamingSVEAvailable() && - "Tried to get SVE vector length without SVE support!"); -if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits) - return MaxSVEVectorSizeInBits; -return 0; - } - bool useSVEForFixedLengthVectors() const { if (!isSVEorStreamingSVEAvailable()) return false; diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll deleted file mode 100644 index 700bbe4f060ca..0 --- a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll +++ /dev/null @@ -1,362 +0,0 @@ -; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 -; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus closed https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/130625 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
https://github.com/rj-jesus created https://github.com/llvm/llvm-project/pull/130625 This restores commit f01e760c08365426de95f02dc2c2dc670eb47352. The original patch from #129732 exposed what seems to be a bug in `SelectAddrModeIndexedSVE`. Currently, the offset returned by `SelectAddrModeIndexedSVE` is computed by dividing a VL-based offset (`MulImm`) by the known minimum width of `MemVT`. This works when `MemVT` is a scalable vector type because scalable types are intrinsically VL-based. However, for fixed vector types, `MemVT` is not scaled to the SVE vector length, which may seemingly lead to inaccurate results. For example, for `vscale * 32`, I expect the offset returned to be `2*VL`, irrespective of the width of `MemVT` (unless the latter is an unpacked SVE type). VLA codegen seems to agree with this. However, for `<8 x i32>` vectors, VLS codegen (which uses `SelectAddrModeIndexedSVE`) returns `1*VL`: https://godbolt.org/z/7149fejGo. Is this intentional? Although this seems to affect both VSCALE-based and Constant-based offsets, I believe we didn't come across it earlier because we don't generate combinations of VSCALE offsets + fixed vectors often. Enabling the Constant-based path made the problem (assuming _it is_ a problem) obvious because combinations of Constant offsets + fixed vectors are common. To work around the issue temporarily, I added an early exit to the Constant-based path for fixed vector types. This doesn't affect the VSCALE path because I wanted to confirm whether the current behaviour is intentional or not. I think the long-term solution is to set `MemWidthBytes = 16` for fixed vectors, which should fix the address calculation for both paths. I'm happy to do this here or open a separate PR, but first I wanted to confirm whether this is a viable solution (hence why I added a more conservative solution for the time being). What do you think? >From 03471cbf9270d1707191057de46dd38409c8a046 Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Mon, 10 Mar 2025 01:57:20 -0700 Subject: [PATCH 1/3] Reapply "[AArch64][SVE] Improve fixed-length addressing modes." (#130263) This reverts commit 21610e3ecc8bc727f99047e544186b35b1291bcd. --- .../CodeGen/AArch64/sve-vector-bits-codegen.c | 9 +- .../Target/AArch64/AArch64ISelDAGToDAG.cpp| 15 +- llvm/lib/Target/AArch64/AArch64Subtarget.h| 12 +- .../AArch64/sve-fixed-length-offsets.ll | 362 ++ .../AArch64/sve-fixed-length-shuffles.ll | 90 ++--- 5 files changed, 434 insertions(+), 54 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c index 0ed14b4b3b793..1391a1b09fbd1 100644 --- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c +++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c @@ -13,12 +13,9 @@ void func(int *restrict a, int *restrict b) { // CHECK-LABEL: func -// CHECK256-COUNT-1: str -// CHECK256-COUNT-7: st1w -// CHECK512-COUNT-1: str -// CHECK512-COUNT-3: st1w -// CHECK1024-COUNT-1: str -// CHECK1024-COUNT-1: st1w +// CHECK256-COUNT-8: str +// CHECK512-COUNT-4: str +// CHECK1024-COUNT-2: str // CHECK2048-COUNT-1: st1w #pragma clang loop vectorize(enable) for (int i = 0; i < 64; ++i) diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp index 3ca9107cb2ce5..07bcd802962fa 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp @@ -7380,12 +7380,23 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + int64_t MulImm = std::numeric_limits::max(); + if (VScale.getOpcode() == ISD::VSCALE) { +MulImm = cast(VScale.getOperand(0))->getSExtValue(); + } else if (auto C = dyn_cast(VScale)) { +int64_t ByteOffset = C->getSExtValue(); +const auto KnownVScale = +Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock; + +if (!KnownVScale || ByteOffset % KnownVScale != 0) + return false; + +MulImm = ByteOffset / KnownVScale; + } else return false; TypeSize TS = MemVT.getSizeInBits(); int64_t MemWidthBytes = static_cast(TS.getKnownMinValue()) / 8; - int64_t MulImm = cast(VScale.getOperand(0))->getSExtValue(); if ((MulImm % MemWidthBytes) != 0) return false; diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h index c6eb77e3bc3ba..f5ffc72cae537 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.h +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h @@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { void mirFileLoaded(MachineFunction &MF) const override; // Return the known range for the bit length of SVE data regist
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/130625 >From 03471cbf9270d1707191057de46dd38409c8a046 Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Mon, 10 Mar 2025 01:57:20 -0700 Subject: [PATCH 1/4] Reapply "[AArch64][SVE] Improve fixed-length addressing modes." (#130263) This reverts commit 21610e3ecc8bc727f99047e544186b35b1291bcd. --- .../CodeGen/AArch64/sve-vector-bits-codegen.c | 9 +- .../Target/AArch64/AArch64ISelDAGToDAG.cpp| 15 +- llvm/lib/Target/AArch64/AArch64Subtarget.h| 12 +- .../AArch64/sve-fixed-length-offsets.ll | 362 ++ .../AArch64/sve-fixed-length-shuffles.ll | 90 ++--- 5 files changed, 434 insertions(+), 54 deletions(-) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c index 0ed14b4b3b793..1391a1b09fbd1 100644 --- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c +++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c @@ -13,12 +13,9 @@ void func(int *restrict a, int *restrict b) { // CHECK-LABEL: func -// CHECK256-COUNT-1: str -// CHECK256-COUNT-7: st1w -// CHECK512-COUNT-1: str -// CHECK512-COUNT-3: st1w -// CHECK1024-COUNT-1: str -// CHECK1024-COUNT-1: st1w +// CHECK256-COUNT-8: str +// CHECK512-COUNT-4: str +// CHECK1024-COUNT-2: str // CHECK2048-COUNT-1: st1w #pragma clang loop vectorize(enable) for (int i = 0; i < 64; ++i) diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp index 3ca9107cb2ce5..07bcd802962fa 100644 --- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp +++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp @@ -7380,12 +7380,23 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + int64_t MulImm = std::numeric_limits::max(); + if (VScale.getOpcode() == ISD::VSCALE) { +MulImm = cast(VScale.getOperand(0))->getSExtValue(); + } else if (auto C = dyn_cast(VScale)) { +int64_t ByteOffset = C->getSExtValue(); +const auto KnownVScale = +Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock; + +if (!KnownVScale || ByteOffset % KnownVScale != 0) + return false; + +MulImm = ByteOffset / KnownVScale; + } else return false; TypeSize TS = MemVT.getSizeInBits(); int64_t MemWidthBytes = static_cast(TS.getKnownMinValue()) / 8; - int64_t MulImm = cast(VScale.getOperand(0))->getSExtValue(); if ((MulImm % MemWidthBytes) != 0) return false; diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h index c6eb77e3bc3ba..f5ffc72cae537 100644 --- a/llvm/lib/Target/AArch64/AArch64Subtarget.h +++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h @@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { void mirFileLoaded(MachineFunction &MF) const override; // Return the known range for the bit length of SVE data registers. A value - // of 0 means nothing is known about that particular limit beyong what's + // of 0 means nothing is known about that particular limit beyond what's // implied by the architecture. unsigned getMaxSVEVectorSizeInBits() const { assert(isSVEorStreamingSVEAvailable() && @@ -405,6 +405,16 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { return MinSVEVectorSizeInBits; } + // Return the known bit length of SVE data registers. A value of 0 means the + // length is unkown beyond what's implied by the architecture. + unsigned getSVEVectorSizeInBits() const { +assert(isSVEorStreamingSVEAvailable() && + "Tried to get SVE vector length without SVE support!"); +if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits) + return MaxSVEVectorSizeInBits; +return 0; + } + bool useSVEForFixedLengthVectors() const { if (!isSVEorStreamingSVEAvailable()) return false; diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll new file mode 100644 index 0..700bbe4f060ca --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll @@ -0,0 +1,362 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-m
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
rj-jesus wrote: Hi @paulwalker-arm, thanks again for your suggestion. I think the only node missing was `MemIntrinsicSDNode`, which seemingly was considered after `isa(Root)` in the original code (although I'm not sure it was reachable). I've moved it before the main `MemSDNode` path to avoid hitting the unreachable. As far as I could tell, the only `MemIntrinsicSDNode` nodes that the function handles are for `Intrinsic::aarch64_sve_st2`, `st3` and `st4`, so `getMemoryVT()` should be okay to use, I believe. We could also move these intrinsics to the last switch statement and avoid having that dedicated path. I'm not sure what approach is preferable, so I've kept the original code, but please let me know if you'd like me to make that change. Please let me know if you have any other comments! https://github.com/llvm/llvm-project/pull/130625 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/129732 >From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Tue, 4 Mar 2025 02:36:06 -0800 Subject: [PATCH 1/5] Precommit tests --- .../AArch64/sve-fixed-length-offsets.ll | 227 ++ 1 file changed, 227 insertions(+) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll new file mode 100644 index 0..04ace95de3348 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll @@ -0,0 +1,227 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.b +; CHECK-128-NEXT:mov w8, #256 // =0x100 +; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.b +; CHECK-256-NEXT:mov w8, #256 // =0x100 +; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.b +; CHECK-512-NEXT:mov w8, #256 // =0x100 +; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.b +; CHECK-1024-NEXT:mov w8, #256 // =0x100 +; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ptrue p0.b +; CHECK-2048-NEXT:mov w8, #256 // =0x100 +; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.h +; CHECK-128-NEXT:mov x8, #128 // =0x80 +; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.h +; CHECK-256-NEXT:mov x8, #128 // =0x80 +; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.h +; CHECK-512-NEXT:mov x8, #128 // =0x80 +; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.h +; CHECK-1024-NEXT:mov x8, #128 // =0x80 +; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-1024-NEXT:
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/129732 >From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Tue, 4 Mar 2025 02:36:06 -0800 Subject: [PATCH 1/7] Precommit tests --- .../AArch64/sve-fixed-length-offsets.ll | 227 ++ 1 file changed, 227 insertions(+) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll new file mode 100644 index 0..04ace95de3348 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll @@ -0,0 +1,227 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.b +; CHECK-128-NEXT:mov w8, #256 // =0x100 +; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.b +; CHECK-256-NEXT:mov w8, #256 // =0x100 +; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.b +; CHECK-512-NEXT:mov w8, #256 // =0x100 +; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.b +; CHECK-1024-NEXT:mov w8, #256 // =0x100 +; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ptrue p0.b +; CHECK-2048-NEXT:mov w8, #256 // =0x100 +; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.h +; CHECK-128-NEXT:mov x8, #128 // =0x80 +; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.h +; CHECK-256-NEXT:mov x8, #128 // =0x80 +; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.h +; CHECK-512-NEXT:mov x8, #128 // =0x80 +; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.h +; CHECK-1024-NEXT:mov x8, #128 // =0x80 +; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-1024-NEXT:
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/129732 >From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Tue, 4 Mar 2025 02:36:06 -0800 Subject: [PATCH 1/4] Precommit tests --- .../AArch64/sve-fixed-length-offsets.ll | 227 ++ 1 file changed, 227 insertions(+) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll new file mode 100644 index 0..04ace95de3348 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll @@ -0,0 +1,227 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.b +; CHECK-128-NEXT:mov w8, #256 // =0x100 +; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.b +; CHECK-256-NEXT:mov w8, #256 // =0x100 +; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.b +; CHECK-512-NEXT:mov w8, #256 // =0x100 +; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.b +; CHECK-1024-NEXT:mov w8, #256 // =0x100 +; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ptrue p0.b +; CHECK-2048-NEXT:mov w8, #256 // =0x100 +; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.h +; CHECK-128-NEXT:mov x8, #128 // =0x80 +; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.h +; CHECK-256-NEXT:mov x8, #128 // =0x80 +; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.h +; CHECK-512-NEXT:mov x8, #128 // =0x80 +; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.h +; CHECK-1024-NEXT:mov x8, #128 // =0x80 +; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-1024-NEXT:
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -7380,17 +7380,31 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + std::optional MulImm; + if (VScale.getOpcode() == ISD::VSCALE) { +MulImm = cast(VScale.getOperand(0))->getSExtValue(); + } else if (auto C = dyn_cast(VScale)) { +int64_t ByteOffset = C->getSExtValue(); +constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock; +auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock; +auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock; + +if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0) rj-jesus wrote: Thanks - I've added this. Please let me know if that's what you had in mind. :) https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -0,0 +1,362 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl] +; CHECK-2048-NEXT:str z0, [x1, #1, mul vl] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv8i16: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl] +; CHECK-2048-NEXT:str z0, [x1, #1, mul vl] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128 + %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128 + %x = load , ptr %ldoff, align 2 + store %x, ptr %stoff, align 2 + ret void +} + +define void @nxv4i32(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv4i32: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.s +; CHECK-NEXT:mov x8, #64 // =0x40 +; CHECK-NEXT:ld1w { z0.s }, p0/z, [x0, x8, lsl #2] +; CHECK-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv4i32: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv4i32: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv4i32: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv4i32: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv4i32: +; CHECK-2048: // %bb.0: +; CHECK-
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -0,0 +1,362 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl] +; CHECK-2048-NEXT:str z0, [x1, #1, mul vl] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv8i16: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl] +; CHECK-2048-NEXT:str z0, [x1, #1, mul vl] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128 + %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128 + %x = load , ptr %ldoff, align 2 + store %x, ptr %stoff, align 2 + ret void +} + +define void @nxv4i32(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv4i32: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.s +; CHECK-NEXT:mov x8, #64 // =0x40 +; CHECK-NEXT:ld1w { z0.s }, p0/z, [x0, x8, lsl #2] +; CHECK-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv4i32: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl] +; CHECK-128-NEXT:str z0, [x1, #16, mul vl] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv4i32: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl] +; CHECK-256-NEXT:str z0, [x1, #8, mul vl] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv4i32: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl] +; CHECK-512-NEXT:str z0, [x1, #4, mul vl] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv4i32: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl] +; CHECK-1024-NEXT:str z0, [x1, #2, mul vl] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv4i32: +; CHECK-2048: // %bb.0: +; CHECK-
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -7380,17 +7380,31 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + std::optional MulImm; + if (VScale.getOpcode() == ISD::VSCALE) { +MulImm = cast(VScale.getOperand(0))->getSExtValue(); + } else if (auto C = dyn_cast(VScale)) { +int64_t ByteOffset = C->getSExtValue(); +constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock; +auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock; +auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock; + +if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0) rj-jesus wrote: Thanks, that's a good idea. Would you prefer `optional`, or should we follow the logic that `getMinSVEVectorSizeInBits` and `getMaxSVEVectorSizeInBits` already use and return 0 when nothing is known about the limit? https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -405,6 +405,17 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { return MinSVEVectorSizeInBits; } + // Return the known bit length of SVE data registers. A value of 0 means the + // length is unkown beyond what's implied by the architecture. + unsigned getSVEVectorSizeInBits() const { +assert(isSVEorStreamingSVEAvailable() && + "Tried to get SVE vector length without SVE support!"); +if (MaxSVEVectorSizeInBits && +MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits) rj-jesus wrote: Thanks very much for the suggestion; that looks much better. Should we let through the case `!MinSVEVectorSizeInBits && MaxSVEVectorSizeInBits == 128` too? https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -7380,12 +7380,26 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + int64_t MulImm = std::numeric_limits::max(); + if (VScale.getOpcode() == ISD::VSCALE) { +MulImm = cast(VScale.getOperand(0))->getSExtValue(); + } else if (auto C = dyn_cast(VScale)) { +int64_t ByteOffset = C->getSExtValue(); +const auto KnownVScale = +Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock; + +if (!KnownVScale || ByteOffset % KnownVScale != 0) + return false; + +MulImm = ByteOffset / KnownVScale; + } else return false; + assert(MulImm != std::numeric_limits::max() && + "Uninitialized MulImm."); + rj-jesus wrote: Sounds good to me, removed. https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -405,6 +405,17 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo { return MinSVEVectorSizeInBits; } + // Return the known bit length of SVE data registers. A value of 0 means the + // length is unkown beyond what's implied by the architecture. + unsigned getSVEVectorSizeInBits() const { +assert(isSVEorStreamingSVEAvailable() && + "Tried to get SVE vector length without SVE support!"); +if (MaxSVEVectorSizeInBits && +MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits) rj-jesus wrote: I've left it as is for now due to the lack of a motivating example and to keep it consistent with `getMinSVEVectorSizeInBits`/`getMinSVEVectorSizeInBits`, which I suppose could return 128/2048 as the architecture bounds and avoid this problem altogether. Please let me know if you'd rather I add it. I'll let the tests run and assuming they are OK merge it. https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
@@ -7380,12 +7380,27 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N, return false; SDValue VScale = N.getOperand(1); - if (VScale.getOpcode() != ISD::VSCALE) + int64_t MulImm = std::numeric_limits::max(); + if (VScale.getOpcode() == ISD::VSCALE) rj-jesus wrote: Thanks, done. https://github.com/llvm/llvm-project/pull/129732 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/129732 >From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Tue, 4 Mar 2025 02:36:06 -0800 Subject: [PATCH 1/8] Precommit tests --- .../AArch64/sve-fixed-length-offsets.ll | 227 ++ 1 file changed, 227 insertions(+) create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll new file mode 100644 index 0..04ace95de3348 --- /dev/null +++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll @@ -0,0 +1,227 @@ +; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024 +; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048 + +define void @nxv16i8(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv16i8: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.b +; CHECK-NEXT:mov w8, #256 // =0x100 +; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv16i8: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.b +; CHECK-128-NEXT:mov w8, #256 // =0x100 +; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv16i8: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.b +; CHECK-256-NEXT:mov w8, #256 // =0x100 +; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv16i8: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.b +; CHECK-512-NEXT:mov w8, #256 // =0x100 +; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv16i8: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.b +; CHECK-1024-NEXT:mov w8, #256 // =0x100 +; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-1024-NEXT:ret +; +; CHECK-2048-LABEL: nxv16i8: +; CHECK-2048: // %bb.0: +; CHECK-2048-NEXT:ptrue p0.b +; CHECK-2048-NEXT:mov w8, #256 // =0x100 +; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8] +; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8] +; CHECK-2048-NEXT:ret + %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256 + %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256 + %x = load , ptr %ldoff, align 1 + store %x, ptr %stoff, align 1 + ret void +} + +define void @nxv8i16(ptr %ldptr, ptr %stptr) { +; CHECK-LABEL: nxv8i16: +; CHECK: // %bb.0: +; CHECK-NEXT:ptrue p0.h +; CHECK-NEXT:mov x8, #128 // =0x80 +; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-NEXT:ret +; +; CHECK-128-LABEL: nxv8i16: +; CHECK-128: // %bb.0: +; CHECK-128-NEXT:ptrue p0.h +; CHECK-128-NEXT:mov x8, #128 // =0x80 +; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-128-NEXT:ret +; +; CHECK-256-LABEL: nxv8i16: +; CHECK-256: // %bb.0: +; CHECK-256-NEXT:ptrue p0.h +; CHECK-256-NEXT:mov x8, #128 // =0x80 +; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-256-NEXT:ret +; +; CHECK-512-LABEL: nxv8i16: +; CHECK-512: // %bb.0: +; CHECK-512-NEXT:ptrue p0.h +; CHECK-512-NEXT:mov x8, #128 // =0x80 +; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-512-NEXT:ret +; +; CHECK-1024-LABEL: nxv8i16: +; CHECK-1024: // %bb.0: +; CHECK-1024-NEXT:ptrue p0.h +; CHECK-1024-NEXT:mov x8, #128 // =0x80 +; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1] +; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1] +; CHECK-1024-NEXT:
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
rj-jesus wrote: Thank you very much for the explanation, @paulwalker-arm - that makes a lot of sense! I'll try your suggestion tomorrow. I'll let you know how it goes. :) https://github.com/llvm/llvm-project/pull/130625 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)
https://github.com/rj-jesus closed https://github.com/llvm/llvm-project/pull/130625 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
rj-jesus wrote: Hi, Olympus is the core in the NVIDIA Vera CPU announced at GTC. https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
@@ -872,6 +883,16 @@ def ProcessorFeatures { list Carmel = [HasV8_2aOps, FeatureNEON, FeatureSHA2, FeatureAES, FeatureFullFP16, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM, FeatureFPARMv8]; + list Olympus = [HasV9_2aOps, FeatureBRBE, FeatureCCIDX, +FeatureCHK, FeatureCrypto, FeatureETE, rj-jesus wrote: Thanks, done. https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
rj-jesus wrote: Thanks very much :) Do you want me to wait for a review from @jthackray too since you added him as a reviewer? https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
rj-jesus wrote: Thank you very much, and sorry, I didn't want to sound like I was rushing. I just wasn't sure if I should wait or not, so I thought I'd check. I hope everything goes well! https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
https://github.com/rj-jesus created https://github.com/llvm/llvm-project/pull/132368 This patch adds support for the NVIDIA Olympus core. This does not add any special tuning decisions, and those may come later. >From b9725e115876f26311edd408b9d4521ae8a03ebd Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Wed, 4 Dec 2024 05:42:38 -0800 Subject: [PATCH] [AArch64] Add initial support for -mcpu=olympus. This patch adds support for the NVIDIA Olympus core. This does not add any special tuning decisions, and those may come later. --- clang/test/Driver/aarch64-nvidia-olympus.c| 13 +++ .../aarch64-olympus.c | 82 +++ .../Misc/target-invalid-cpu-note/aarch64.c| 1 + llvm/lib/Target/AArch64/AArch64Processors.td | 25 ++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp | 9 ++ llvm/lib/TargetParser/Host.cpp| 1 + llvm/test/CodeGen/AArch64/cpus.ll | 1 + llvm/unittests/TargetParser/Host.cpp | 4 + .../TargetParser/TargetParserTest.cpp | 3 +- 9 files changed, 138 insertions(+), 1 deletion(-) create mode 100644 clang/test/Driver/aarch64-nvidia-olympus.c create mode 100644 clang/test/Driver/print-enabled-extensions/aarch64-olympus.c diff --git a/clang/test/Driver/aarch64-nvidia-olympus.c b/clang/test/Driver/aarch64-nvidia-olympus.c new file mode 100644 index 0..e832d06917a25 --- /dev/null +++ b/clang/test/Driver/aarch64-nvidia-olympus.c @@ -0,0 +1,13 @@ +// RUN: %clang --target=aarch64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s +// RUN: %clang --target=aarch64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s +// RUN: %clang --target=aarch64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s +// RUN: %clang --target=aarch64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s +// olympus: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "olympus" +// olympus-TUNE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic" + +// RUN: %clang --target=arm64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s +// RUN: %clang --target=arm64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s +// RUN: %clang --target=arm64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s +// RUN: %clang --target=arm64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s +// ARM64-olympus: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "olympus" +// ARM64-olympus-TUNE: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "generic" diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c new file mode 100644 index 0..a37ec4ac6aa7d --- /dev/null +++ b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c @@ -0,0 +1,82 @@ +// REQUIRES: aarch64-registered-target +// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=olympus | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s + +// CHECK: Extensions enabled for the given AArch64 target +// CHECK-EMPTY: +// CHECK-NEXT: Architecture Feature(s) Description +// CHECK-NEXT: FEAT_AES, FEAT_PMULL Enable AES support +// CHECK-NEXT: FEAT_AMUv1 Enable Armv8.4-A Activity Monitors extension +// CHECK-NEXT: FEAT_AMUv1p1 Enable Armv8.6-A Activity Monitors Virtualization support +// CHECK-NEXT: FEAT_AdvSIMD Enable Advanced SIMD instructions +// CHECK-NEXT: FEAT_BF16 Enable BFloat16 Extension +// CHECK-NEXT: FEAT_BRBE Enable Branch Record Buffer Extension +// CHECK-NEXT: FEAT_BTI Enable Branch Target Identification +// CHECK-NEXT: FEAT_CCIDX Enable Armv8.3-A Extend of the CCSIDR number of sets +// CHECK-NEXT: FEAT_CHK Enable Armv8.0-A Check Feature Status Extension +// CHECK-NEXT: FEAT_CRC32 Enable Armv8.0-A CRC-32 checksum instructions +// CHECK-NEXT: FEAT_CSV2_2 Enable architectural speculation restriction +// CHECK-NEXT: FEAT_Crypto Enable cryptographic instructions +// CHECK-NEXT: FEAT_DIT Enable Armv8.4-A Data Independent Timing instructions +// CHECK-NEXT: FEAT_DPB Enable Armv8.2-A data Cache Clean
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
https://github.com/rj-jesus closed https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
@@ -288,6 +288,7 @@ StringRef sys::detail::getHostCPUNameForARM(StringRef ProcCpuinfoContent) { if (Implementer == "0x4e") { // NVIDIA Corporation return StringSwitch(Part) .Case("0x004", "carmel") +.Case("0x10", "olympus") rj-jesus wrote: Thanks for pointing this out, I've added "0x010" too. https://github.com/llvm/llvm-project/pull/132368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)
https://github.com/rj-jesus updated https://github.com/llvm/llvm-project/pull/132368 >From b9725e115876f26311edd408b9d4521ae8a03ebd Mon Sep 17 00:00:00 2001 From: Ricardo Jesus Date: Wed, 4 Dec 2024 05:42:38 -0800 Subject: [PATCH 1/2] [AArch64] Add initial support for -mcpu=olympus. This patch adds support for the NVIDIA Olympus core. This does not add any special tuning decisions, and those may come later. --- clang/test/Driver/aarch64-nvidia-olympus.c| 13 +++ .../aarch64-olympus.c | 82 +++ .../Misc/target-invalid-cpu-note/aarch64.c| 1 + llvm/lib/Target/AArch64/AArch64Processors.td | 25 ++ llvm/lib/Target/AArch64/AArch64Subtarget.cpp | 9 ++ llvm/lib/TargetParser/Host.cpp| 1 + llvm/test/CodeGen/AArch64/cpus.ll | 1 + llvm/unittests/TargetParser/Host.cpp | 4 + .../TargetParser/TargetParserTest.cpp | 3 +- 9 files changed, 138 insertions(+), 1 deletion(-) create mode 100644 clang/test/Driver/aarch64-nvidia-olympus.c create mode 100644 clang/test/Driver/print-enabled-extensions/aarch64-olympus.c diff --git a/clang/test/Driver/aarch64-nvidia-olympus.c b/clang/test/Driver/aarch64-nvidia-olympus.c new file mode 100644 index 0..e832d06917a25 --- /dev/null +++ b/clang/test/Driver/aarch64-nvidia-olympus.c @@ -0,0 +1,13 @@ +// RUN: %clang --target=aarch64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s +// RUN: %clang --target=aarch64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s +// RUN: %clang --target=aarch64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s +// RUN: %clang --target=aarch64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s +// olympus: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "olympus" +// olympus-TUNE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic" + +// RUN: %clang --target=arm64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s +// RUN: %clang --target=arm64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s +// RUN: %clang --target=arm64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s +// RUN: %clang --target=arm64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s +// ARM64-olympus: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "olympus" +// ARM64-olympus-TUNE: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "generic" diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c new file mode 100644 index 0..a37ec4ac6aa7d --- /dev/null +++ b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c @@ -0,0 +1,82 @@ +// REQUIRES: aarch64-registered-target +// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=olympus | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s + +// CHECK: Extensions enabled for the given AArch64 target +// CHECK-EMPTY: +// CHECK-NEXT: Architecture Feature(s) Description +// CHECK-NEXT: FEAT_AES, FEAT_PMULL Enable AES support +// CHECK-NEXT: FEAT_AMUv1 Enable Armv8.4-A Activity Monitors extension +// CHECK-NEXT: FEAT_AMUv1p1 Enable Armv8.6-A Activity Monitors Virtualization support +// CHECK-NEXT: FEAT_AdvSIMD Enable Advanced SIMD instructions +// CHECK-NEXT: FEAT_BF16 Enable BFloat16 Extension +// CHECK-NEXT: FEAT_BRBE Enable Branch Record Buffer Extension +// CHECK-NEXT: FEAT_BTI Enable Branch Target Identification +// CHECK-NEXT: FEAT_CCIDX Enable Armv8.3-A Extend of the CCSIDR number of sets +// CHECK-NEXT: FEAT_CHK Enable Armv8.0-A Check Feature Status Extension +// CHECK-NEXT: FEAT_CRC32 Enable Armv8.0-A CRC-32 checksum instructions +// CHECK-NEXT: FEAT_CSV2_2 Enable architectural speculation restriction +// CHECK-NEXT: FEAT_Crypto Enable cryptographic instructions +// CHECK-NEXT: FEAT_DIT Enable Armv8.4-A Data Independent Timing instructions +// CHECK-NEXT: FEAT_DPB Enable Armv8.2-A data Cache Clean to Point of Persistence +// CHECK-NEXT: FEAT_DPB2 Enable Armv8.5-A Cache Cl
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Grace (PR #133054)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
https://github.com/rj-jesus edited https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
@@ -555,7 +555,8 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2 FeatureEnableSelectOptimize, FeatureUseFixedOverScalableIfEqualCost, FeatureAvoidLDAPUR, - FeaturePredictableSelectIsExpensive]>; + FeaturePredictableSelectIsExpensive, + FeatureFPAC]>; rj-jesus wrote: Is there any reason you placed this in tuning instead of the main processor features of the Neoverse V2? https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
@@ -19,6 +19,7 @@ // CHECK-NEXT: FEAT_ETE Enable Embedded Trace Extension // CHECK-NEXT: FEAT_FCMA Enable Armv8.3-A Floating-point complex number support // CHECK-NEXT: FEAT_FHM Enable FP16 FML instructions +// CHECK-NEXT: FEAT_FPAC Enable Armv8.3-A Pointer Authentication Faulting enhancement rj-jesus wrote: I think this should be below `FEAT_FP16` (otherwise the test will probably fail to match). https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
https://github.com/rj-jesus approved this pull request. Except for a seemingly out-of-order test, LGTM! https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Grace (PR #133054)
@@ -1067,7 +1067,8 @@ def ProcessorFeatures { FeatureDotProd, FeatureFPARMv8, FeatureMatMulInt8, FeatureSSBS, FeatureCCIDX, FeatureJS, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM]; - list Grace = !listconcat(NeoverseV2, [FeatureSVE2SM4, FeatureSVEAES, FeatureSVE2SHA3]); + list Grace = !listconcat(NeoverseV2, [FeatureSVE2SM4, FeatureSVEAES, FeatureSVE2SHA3, + FeatureFPAC]); rj-jesus wrote: Hi, does it make sense to move this to the Neoverse V2 definition? https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)
@@ -19,6 +19,7 @@ // CHECK-NEXT: FEAT_ETE Enable Embedded Trace Extension // CHECK-NEXT: FEAT_FCMA Enable Armv8.3-A Floating-point complex number support // CHECK-NEXT: FEAT_FHM Enable FP16 FML instructions +// CHECK-NEXT: FEAT_FPAC Enable Armv8.3-A Pointer Authentication Faulting enhancement rj-jesus wrote: I think the file was missing a new line character at the end of the file (which presumably your editor added automatically when you added the `FEAT_FPAC` line). I think this is a harmless change, but if you want to undo it, I believe `truncate -s -1` should work. https://github.com/llvm/llvm-project/pull/133054 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits