
[clang] [flang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)

2023-12-06 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Chipping in to the discussion: since this patch, I can also no longer build 
OpenBLAS or PETSc. OpenBLAS, for example, fails with:
```
$ clang -v -O3 -mcpu=native  -DHAVE_C11 -Wall -DF_INTERFACE_GFORT -fPIC 
-DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=72 -DMAX_PARALLEL_NUMBER=1 
-DMAX_STACK_ALLOC=2048 -DNO_AFFINITY -DVERSION="\"0.3.25\"" -DBUILD_SINGLE 
-DBUILD_DOUBLE -DBUILD_COMPLEX -DBUILD_COMPLEX16  
utest/CMakeFiles/openblas_utest.dir/utest_main.c.o 
utest/CMakeFiles/openblas_utest.dir/test_min.c.o 
utest/CMakeFiles/openblas_utest.dir/test_amax.c.o 
utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o 
utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o 
utest/CMakeFiles/openblas_utest.dir/test_rot.c.o 
utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o 
utest/CMakeFiles/openblas_utest.dir/test_swap.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o 
utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o 
utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -o 
utest/openblas_utest  -Wl,-rpath,/.../openblas/build/lib  
lib/libopenblas.so.0.3  -lm
clang version 18.0.0 (g...@github.com:llvm/llvm-project.git 
17feb330aab39c6c0c21ee9b02efb484dfb2261e)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /.../llvm/trunk/bin
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/11
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12
Selected GCC installation: /usr/lib/gcc/aarch64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda, version 
 "/usr/bin/ld" -EL -z relro --hash-style=gnu --eh-frame-hdr -m aarch64linux 
-pie -dynamic-linker /lib/ld-linux-aarch64.so.1 -o utest/openblas_utest 
/lib/aarch64-linux-gnu/Scrt1.o /lib/aarch64-linux-gnu/crti.o 
/usr/lib/gcc/aarch64-linux-gnu/12/crtbeginS.o 
-L/.../llvm/trunk/lib/clang/18/lib/aarch64-unknown-linux-gnu 
-L/usr/lib/gcc/aarch64-linux-gnu/12 -L/lib/aarch64-linux-gnu 
-L/usr/lib/aarch64-linux-gnu -L/lib -L/usr/lib -L/.../llvm/trunk/lib 
utest/CMakeFiles/openblas_utest.dir/utest_main.c.o 
utest/CMakeFiles/openblas_utest.dir/test_min.c.o 
utest/CMakeFiles/openblas_utest.dir/test_amax.c.o 
utest/CMakeFiles/openblas_utest.dir/test_ismin.c.o 
utest/CMakeFiles/openblas_utest.dir/test_rotmg.c.o 
utest/CMakeFiles/openblas_utest.dir/test_rot.c.o 
utest/CMakeFiles/openblas_utest.dir/test_axpy.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dsdot.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dnrm2.c.o 
utest/CMakeFiles/openblas_utest.dir/test_swap.c.o 
utest/CMakeFiles/openblas_utest.dir/test_dotu.c.o 
utest/CMakeFiles/openblas_utest.dir/test_potrs.c.o 
utest/CMakeFiles/openblas_utest.dir/test_kernel_regress.c.o -rpath 
/.../openblas/build/lib lib/libopenblas.so.0.3 -lm -lgcc --as-needed -lgcc_s 
--no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed 
/usr/lib/gcc/aarch64-linux-gnu/12/crtendS.o /lib/aarch64-linux-gnu/crtn.o
/usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to 
`_QQEnvironmentDefaults'
/usr/bin/ld: lib/libopenblas.so.0.3: undefined reference to `_QQmain'
```

https://github.com/llvm/llvm-project/pull/73124
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[flang] [clang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)

2023-12-07 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

> 
> Thanks for the report! Can you please tell me how OpenBLAS was built? I'm 
> trying to replicate this, but I do not see a reference to `_QQmain` or the 
> likes in the OpenBLAS library that I build on x86.

Hi @mjklemm! This was on an AArch64 box (not that that should make a 
difference) doing something like:
```
git clone -b v0.3.25 https://github.com/OpenMathLib/OpenBLAS.git

cd OpenBLAS
mkdir build && cd build
cmake -G Ninja \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_Fortran_COMPILER=flang-new \
  -DCMAKE_C_FLAGS="-O3 -mcpu=native" \
  -DCMAKE_CXX_FLAGS="-O3 -mcpu=native" \
  -DCMAKE_Fortran_FLAGS="-O3 -mcpu=native" \
  -DBUILD_SHARED_LIBS=ON \
  -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR ..
sed -i 's/-m64//g' build.ninja
sed -i 's/-rdynamic//' build.ninja
cmake --build . -j32
cmake --install .
```
The error shows up when linking a C program against a Fortran shared library, so 
maybe you weren't building shared libraries?

https://github.com/llvm/llvm-project/pull/73124

[flang] [clang] [flang][Driver] Let the linker fail on multiple definitions of main() (PR #73124)

2023-12-08 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

> The solution is to add `-fno-fortran-main` to the linker options via 
> `CMAKE_SHARED_LINKER_FLAGS`. This will need PR #74139 land first. But this 
> option will be a good way to control if the flang compiler should attempt 
> linking in the `main` stub from its library.
> 
> It seems like `flang-new` when being used as a linker with `-shared` included 
> Fortran_main in the shared library. This seems wrong to me. The option 
> `-fno-fortran-main` avoids this. I'm pondering if `-shared` is buggy here. It 
> will require a bit more digging on my end to figure that out.

Thanks, sounds like a good workaround to me, though, as you say, I find it 
strange to have to explicitly specify "don't include main" when building a library!

https://github.com/llvm/llvm-project/pull/73124

[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)

2025-02-13 Thread Ricardo Jesus via cfe-commits





rj-jesus wrote:

Sounds good, thanks! :)

https://github.com/llvm/llvm-project/pull/126945

[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)

2025-02-12 Thread Ricardo Jesus via cfe-commits





rj-jesus wrote:

Should this be given a more general name, now that it also includes Neon types?

There are also a few comments right at the start that could be extended for 
Neon.

https://github.com/llvm/llvm-project/pull/126945

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-21 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Hi, Olympus is the core in the NVIDIA Vera CPU announced at GTC.

https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits



@@ -872,6 +883,16 @@ def ProcessorFeatures {
   list<SubtargetFeature> Carmel   = [HasV8_2aOps, FeatureNEON, FeatureSHA2, FeatureAES,
                                      FeatureFullFP16, FeatureCRC, FeatureLSE, FeatureRAS, FeatureRDM,
                                      FeatureFPARMv8];
+  list<SubtargetFeature> Olympus = [HasV9_2aOps, FeatureBRBE, FeatureCCIDX,
+                                    FeatureCHK, FeatureCrypto, FeatureETE,

rj-jesus wrote:

Thanks, done.

https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Thanks very much :) Do you want me to wait for a review from @jthackray too 
since you added him as a reviewer?

https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Thank you very much, and sorry, I didn't want to sound like I was rushing. I 
just wasn't sure if I should wait or not, so I thought I'd check. I hope 
everything goes well!

https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus created 
https://github.com/llvm/llvm-project/pull/132368

This patch adds support for the NVIDIA Olympus core.

This does not add any special tuning decisions, and those may come later.

From b9725e115876f26311edd408b9d4521ae8a03ebd Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Wed, 4 Dec 2024 05:42:38 -0800
Subject: [PATCH] [AArch64] Add initial support for -mcpu=olympus.

This patch adds support for the NVIDIA Olympus core.

This does not add any special tuning decisions, and those may come
later.
---
 clang/test/Driver/aarch64-nvidia-olympus.c    | 13 +++
 .../aarch64-olympus.c                         | 82 +++
 .../Misc/target-invalid-cpu-note/aarch64.c    |  1 +
 llvm/lib/Target/AArch64/AArch64Processors.td  | 25 ++
 llvm/lib/Target/AArch64/AArch64Subtarget.cpp  |  9 ++
 llvm/lib/TargetParser/Host.cpp                |  1 +
 llvm/test/CodeGen/AArch64/cpus.ll             |  1 +
 llvm/unittests/TargetParser/Host.cpp          |  4 +
 .../TargetParser/TargetParserTest.cpp         |  3 +-
 9 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Driver/aarch64-nvidia-olympus.c
 create mode 100644 clang/test/Driver/print-enabled-extensions/aarch64-olympus.c

diff --git a/clang/test/Driver/aarch64-nvidia-olympus.c b/clang/test/Driver/aarch64-nvidia-olympus.c
new file mode 100644
index 0..e832d06917a25
--- /dev/null
+++ b/clang/test/Driver/aarch64-nvidia-olympus.c
@@ -0,0 +1,13 @@
+// RUN: %clang --target=aarch64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s
+// RUN: %clang --target=aarch64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s
+// RUN: %clang --target=aarch64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s
+// RUN: %clang --target=aarch64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s
+// olympus: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "olympus"
+// olympus-TUNE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic"
+
+// RUN: %clang --target=arm64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s
+// RUN: %clang --target=arm64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s
+// RUN: %clang --target=arm64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s
+// RUN: %clang --target=arm64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s
+// ARM64-olympus: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "olympus"
+// ARM64-olympus-TUNE: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "generic"
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c
new file mode 100644
index 0..a37ec4ac6aa7d
--- /dev/null
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c
@@ -0,0 +1,82 @@
+// REQUIRES: aarch64-registered-target
+// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=olympus | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s
+
+// CHECK: Extensions enabled for the given AArch64 target
+// CHECK-EMPTY:
+// CHECK-NEXT: Architecture Feature(s)
Description
+// CHECK-NEXT: FEAT_AES, FEAT_PMULL   
Enable AES support
+// CHECK-NEXT: FEAT_AMUv1 
Enable Armv8.4-A Activity Monitors extension
+// CHECK-NEXT: FEAT_AMUv1p1   
Enable Armv8.6-A Activity Monitors Virtualization support
+// CHECK-NEXT: FEAT_AdvSIMD   
Enable Advanced SIMD instructions
+// CHECK-NEXT: FEAT_BF16  
Enable BFloat16 Extension
+// CHECK-NEXT: FEAT_BRBE  
Enable Branch Record Buffer Extension
+// CHECK-NEXT: FEAT_BTI   
Enable Branch Target Identification
+// CHECK-NEXT: FEAT_CCIDX 
Enable Armv8.3-A Extend of the CCSIDR number of sets
+// CHECK-NEXT: FEAT_CHK   
Enable Armv8.0-A Check Feature Status Extension
+// CHECK-NEXT: FEAT_CRC32 
Enable Armv8.0-A CRC-32 checksum instructions
+// CHECK-NEXT: FEAT_CSV2_2
Enable architectural speculation restriction
+// CHECK-NEXT: FEAT_Crypto
Enable cryptographic instructions
+// CHECK-NEXT: FEAT_DIT   
Enable Armv8.4-A Data Independent Timing instructions
+// CHECK-NEXT: FEAT_DPB   
Enable Armv8.2-A data Cache Clean

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-25 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-19 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/130625

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-04-05 Thread Ricardo Jesus via cfe-commits



@@ -19,6 +19,7 @@
 // CHECK-NEXT: FEAT_ETE   
Enable Embedded Trace Extension
 // CHECK-NEXT: FEAT_FCMA  
Enable Armv8.3-A Floating-point complex number support
 // CHECK-NEXT: FEAT_FHM   
Enable FP16 FML instructions
+// CHECK-NEXT: FEAT_FPAC  
Enable Armv8.3-A Pointer Authentication Faulting enhancement

rj-jesus wrote:

I think the file was missing a newline character at the end (which 
presumably your editor added automatically when you added the `FEAT_FPAC` 
line). I think this is a harmless change, but if you want to undo it, I believe 
`truncate -s -1` should work.

https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Grace (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits



@@ -555,7 +555,8 @@ def TuneNeoverseV2 : SubtargetFeature<"neoversev2", "ARMProcFamily", "NeoverseV2
   FeatureEnableSelectOptimize,
   FeatureUseFixedOverScalableIfEqualCost,
   FeatureAvoidLDAPUR,
-  FeaturePredictableSelectIsExpensive]>;
+  FeaturePredictableSelectIsExpensive,
+  FeatureFPAC]>;

rj-jesus wrote:

Is there any reason you placed this in tuning instead of the main processor 
features of the Neoverse V2?

https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits



@@ -19,6 +19,7 @@
 // CHECK-NEXT: FEAT_ETE   
Enable Embedded Trace Extension
 // CHECK-NEXT: FEAT_FCMA  
Enable Armv8.3-A Floating-point complex number support
 // CHECK-NEXT: FEAT_FHM   
Enable FP16 FML instructions
+// CHECK-NEXT: FEAT_FPAC  
Enable Armv8.3-A Pointer Authentication Faulting enhancement

rj-jesus wrote:

I think this should be below `FEAT_FP16` (otherwise the test will probably fail 
to match).

https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Neoverse V2 (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus approved this pull request.

Except for a seemingly out-of-order test, LGTM!

https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [AArch64] Add FEAT_FPAC to Grace (PR #133054)

2025-03-26 Thread Ricardo Jesus via cfe-commits



@@ -1067,7 +1067,8 @@ def ProcessorFeatures {
                                      FeatureDotProd, FeatureFPARMv8, FeatureMatMulInt8,
                                      FeatureSSBS, FeatureCCIDX,
                                      FeatureJS, FeatureLSE, FeatureRAS, FeatureRCPC, FeatureRDM];
-  list<SubtargetFeature> Grace = !listconcat(NeoverseV2, [FeatureSVE2SM4, FeatureSVEAES, FeatureSVE2SHA3]);
+  list<SubtargetFeature> Grace = !listconcat(NeoverseV2, [FeatureSVE2SM4, FeatureSVEAES, FeatureSVE2SHA3,
+                                                          FeatureFPAC]);

rj-jesus wrote:

Hi, does it make sense to move this to the Neoverse V2 definition?

https://github.com/llvm/llvm-project/pull/133054

[clang] [llvm] [LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (PR #130973)

2025-04-28 Thread Ricardo Jesus via cfe-commits



@@ -554,6 +554,22 @@ class VectorType : public Type {
     return VectorType::get(VTy->getElementType(), EltCnt * 2);
   }
 
+  /// This static method returns a VectorType with the same size-in-bits as
+  /// SizeTy but with an element type that matches the scalar type of EltTy.
+  static VectorType *getWithSizeAndScalar(VectorType *SizeTy, Type *EltTy) {

rj-jesus wrote:

What do you think of `getWithPreservedSize(Type *ElementType, const VectorType 
*Other)`? Similar to `VectorType::get`, but preserving the bit-size of `Other` 
instead of its element count.

https://github.com/llvm/llvm-project/pull/130973

[clang] [llvm] [LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (PR #130973)

2025-04-28 Thread Ricardo Jesus via cfe-commits



@@ -1168,6 +1168,15 @@ bool Function::nullPointerIsDefined() const {
   return hasFnAttribute(Attribute::NullPointerIsValid);
 }
 
+unsigned Function::getVScaleValue() const {
+  Attribute Attr = getFnAttribute(Attribute::VScaleRange);
+  if (!Attr.isValid())
+    return 0;
+
+  unsigned VScale = Attr.getVScaleRangeMax().value_or(0);
+  return VScale == Attr.getVScaleRangeMin() ? VScale : 0;

rj-jesus wrote:

Nit: maybe you could do `getVScaleRangeMin()` first to avoid the `.value_or(0)` 
(since the former isn't `std::optional`)?

https://github.com/llvm/llvm-project/pull/130973

[clang] [llvm] [LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (PR #130973)

2025-04-28 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus commented:

This looks good as far as I can see, but I don't feel qualified to approve it.
FWIW it fixes a large regression on GROMACS with LLVM 20.

https://github.com/llvm/llvm-project/pull/130973

[clang] [llvm] [LLVM][SROA] Teach SROA how to "bitcast" between fixed and scalable vectors. (PR #130973)

2025-04-28 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/130973

[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)

2025-03-07 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

I'll commit this to get the bot back to green while I look into it offline.

https://github.com/llvm/llvm-project/pull/130263

[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)

2025-03-07 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/130263

[clang] [llvm] Revert "[AArch64][SVE] Improve fixed-length addressing modes." (PR #130263)

2025-03-07 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus created 
https://github.com/llvm/llvm-project/pull/130263

Reverts llvm/llvm-project#129732.

I'll look offline into what's causing the buildbot failure reported in 
https://github.com/llvm/llvm-project/pull/129732#issuecomment-2705062636.

From 5a71fab0067bae0f532a6268749df71dbe66b4ac Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Fri, 7 Mar 2025 09:16:20 +
Subject: [PATCH] Revert "[AArch64][SVE] Improve fixed-length addressing modes.
 (#129732)"

This reverts commit f01e760c08365426de95f02dc2c2dc670eb47352.
---
 .../CodeGen/AArch64/sve-vector-bits-codegen.c |   9 +-
 .../Target/AArch64/AArch64ISelDAGToDAG.cpp    |  15 +-
 llvm/lib/Target/AArch64/AArch64Subtarget.h    |  12 +-
 .../AArch64/sve-fixed-length-offsets.ll       | 362 --
 .../AArch64/sve-fixed-length-shuffles.ll      |  90 ++---
 5 files changed, 54 insertions(+), 434 deletions(-)
 delete mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll

diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c 
b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 1391a1b09fbd1..0ed14b4b3b793 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,9 +13,12 @@
 
 void func(int *restrict a, int *restrict b) {
 // CHECK-LABEL: func
-// CHECK256-COUNT-8: str
-// CHECK512-COUNT-4: str
-// CHECK1024-COUNT-2: str
+// CHECK256-COUNT-1: str
+// CHECK256-COUNT-7: st1w
+// CHECK512-COUNT-1: str
+// CHECK512-COUNT-3: st1w
+// CHECK1024-COUNT-1: str
+// CHECK1024-COUNT-1: st1w
 // CHECK2048-COUNT-1: st1w
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 07bcd802962fa..3ca9107cb2ce5 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7380,23 +7380,12 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
     return false;
 
   SDValue VScale = N.getOperand(1);
-  int64_t MulImm = std::numeric_limits<int64_t>::max();
-  if (VScale.getOpcode() == ISD::VSCALE) {
-    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
-  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
-    int64_t ByteOffset = C->getSExtValue();
-    const auto KnownVScale =
-        Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
-
-    if (!KnownVScale || ByteOffset % KnownVScale != 0)
-      return false;
-
-    MulImm = ByteOffset / KnownVScale;
-  } else
+  if (VScale.getOpcode() != ISD::VSCALE)
     return false;
 
   TypeSize TS = MemVT.getSizeInBits();
   int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
+  int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
 
   if ((MulImm % MemWidthBytes) != 0)
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h 
b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index f5ffc72cae537..c6eb77e3bc3ba 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   void mirFileLoaded(MachineFunction &MF) const override;
 
   // Return the known range for the bit length of SVE data registers. A value
-  // of 0 means nothing is known about that particular limit beyond what's
+  // of 0 means nothing is known about that particular limit beyong what's
   // implied by the architecture.
   unsigned getMaxSVEVectorSizeInBits() const {
     assert(isSVEorStreamingSVEAvailable() &&
@@ -405,16 +405,6 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
     return MinSVEVectorSizeInBits;
   }
 
-  // Return the known bit length of SVE data registers. A value of 0 means the
-  // length is unkown beyond what's implied by the architecture.
-  unsigned getSVEVectorSizeInBits() const {
-    assert(isSVEorStreamingSVEAvailable() &&
-           "Tried to get SVE vector length without SVE support!");
-    if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
-      return MaxSVEVectorSizeInBits;
-    return 0;
-  }
-
   bool useSVEForFixedLengthVectors() const {
     if (!isSVEorStreamingSVEAvailable())
       return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll 
b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
deleted file mode 100644
index 700bbe4f060ca..0
--- a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
+++ /dev/null
@@ -1,362 +0,0 @@
-; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py 
UTC_ARGS: --version 5
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve 
-aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | 
FileCheck %s --check-prefix=CHECK-128
-; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve 
-aarch64-sve-vector-bits-min=256 -aarch64-

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-08 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-11 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/130625

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-11 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus created 
https://github.com/llvm/llvm-project/pull/130625

This restores commit f01e760c08365426de95f02dc2c2dc670eb47352.

The original patch from #129732 exposed what seems to be a bug in 
`SelectAddrModeIndexedSVE`.

Currently, the offset returned by `SelectAddrModeIndexedSVE` is computed by 
dividing a VL-based offset (`MulImm`) by the known minimum width of `MemVT`. 
This works when `MemVT` is a scalable vector type because scalable types are 
intrinsically VL-based. However, for fixed vector types, `MemVT` is not scaled 
to the SVE vector length, which seems to lead to incorrect results. For 
example, for `vscale * 32`, I expect the offset returned to be `2*VL`, 
irrespective of the width of `MemVT` (unless the latter is an unpacked SVE 
type). VLA codegen seems to agree with this. However, for `<8 x i32>` vectors, 
VLS codegen (which uses `SelectAddrModeIndexedSVE`) returns `1*VL`: 
https://godbolt.org/z/7149fejGo.
Is this intentional?

Although this seems to affect both VSCALE-based and Constant-based offsets, I 
believe we didn't come across it earlier because we don't generate combinations 
of VSCALE offsets + fixed vectors often. Enabling the Constant-based path made 
the problem (assuming _it is_ a problem) obvious because combinations of 
Constant offsets + fixed vectors are common.

To work around the issue temporarily, I added an early exit to the
Constant-based path for fixed vector types.
I left the VSCALE path unchanged because I first wanted to confirm whether the
current behaviour is intentional.

I think the long-term solution is to set `MemWidthBytes = 16` for fixed
vectors, which should fix the address calculation for both paths. I'm happy to
do this here or open a separate PR, but first I wanted to confirm whether this
is a viable solution (which is why I added a more conservative workaround for
the time being).
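
To sketch what that proposal would compute on the Constant-based path, here is a small standalone model (again, not the LLVM code). The helper `constantOffsetToVL` and the constant `kSVEBitsPerBlock` are hypothetical names; the latter assumes the 128-bit value of `AArch64::SVEBitsPerBlock`, and the fixed width of 16 bytes is the proposed `MemWidthBytes`, not the current behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumed to mirror AArch64::SVEBitsPerBlock (128 bits per vscale unit).
constexpr int64_t kSVEBitsPerBlock = 128;

// Hypothetical helper: converts a constant byte offset into a multiple of VL
// for a known SVE register size, using a fixed width of 16 bytes per vscale
// unit instead of the width of the fixed-length memory type.
std::optional<int64_t> constantOffsetToVL(int64_t byteOffset, int64_t sveBits) {
  assert(sveBits % kSVEBitsPerBlock == 0 && "SVE size is a multiple of 128");
  const int64_t knownVScale = sveBits / kSVEBitsPerBlock;
  if (knownVScale == 0 || byteOffset % knownVScale != 0)
    return std::nullopt; // vector length unknown, or offset not divisible
  const int64_t mulImm = byteOffset / knownVScale;
  constexpr int64_t memWidthBytes = 16; // proposed value for fixed vectors
  if (mulImm % memWidthBytes != 0)
    return std::nullopt;
  return mulImm / memWidthBytes;
}
```

Under these assumptions, a 256-byte offset maps to 16, 8, 4, 2 and 1 multiples of VL for 128-, 256-, 512-, 1024- and 2048-bit vectors respectively, and an unknown vector length (`sveBits == 0`) is rejected.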

What do you think?

>From 03471cbf9270d1707191057de46dd38409c8a046 Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Mon, 10 Mar 2025 01:57:20 -0700
Subject: [PATCH 1/3] Reapply "[AArch64][SVE] Improve fixed-length addressing
 modes." (#130263)

This reverts commit 21610e3ecc8bc727f99047e544186b35b1291bcd.
---
 .../CodeGen/AArch64/sve-vector-bits-codegen.c |   9 +-
 .../Target/AArch64/AArch64ISelDAGToDAG.cpp|  15 +-
 llvm/lib/Target/AArch64/AArch64Subtarget.h|  12 +-
 .../AArch64/sve-fixed-length-offsets.ll   | 362 ++
 .../AArch64/sve-fixed-length-shuffles.ll  |  90 ++---
 5 files changed, 434 insertions(+), 54 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll

diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
 
 void func(int *restrict a, int *restrict b) {
 // CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
 // CHECK2048-COUNT-1: st1w
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..07bcd802962fa 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7380,12 +7380,23 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
     return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    const auto KnownVScale =
+        Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
+
+    if (!KnownVScale || ByteOffset % KnownVScale != 0)
+      return false;
+
+    MulImm = ByteOffset / KnownVScale;
+  } else
     return false;
 
   TypeSize TS = MemVT.getSizeInBits();
   int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
-  int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
 
   if ((MulImm % MemWidthBytes) != 0)
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index c6eb77e3bc3ba..f5ffc72cae537 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   void mirFileLoaded(MachineFunction &MF) const override;
 
   // Return the known range for the bit length of SVE data regist

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-13 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus updated 
https://github.com/llvm/llvm-project/pull/130625

>From 03471cbf9270d1707191057de46dd38409c8a046 Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Mon, 10 Mar 2025 01:57:20 -0700
Subject: [PATCH 1/4] Reapply "[AArch64][SVE] Improve fixed-length addressing
 modes." (#130263)

This reverts commit 21610e3ecc8bc727f99047e544186b35b1291bcd.
---
 .../CodeGen/AArch64/sve-vector-bits-codegen.c |   9 +-
 .../Target/AArch64/AArch64ISelDAGToDAG.cpp|  15 +-
 llvm/lib/Target/AArch64/AArch64Subtarget.h|  12 +-
 .../AArch64/sve-fixed-length-offsets.ll   | 362 ++
 .../AArch64/sve-fixed-length-shuffles.ll  |  90 ++---
 5 files changed, 434 insertions(+), 54 deletions(-)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll

diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 0ed14b4b3b793..1391a1b09fbd1 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -13,12 +13,9 @@
 
 void func(int *restrict a, int *restrict b) {
 // CHECK-LABEL: func
-// CHECK256-COUNT-1: str
-// CHECK256-COUNT-7: st1w
-// CHECK512-COUNT-1: str
-// CHECK512-COUNT-3: st1w
-// CHECK1024-COUNT-1: str
-// CHECK1024-COUNT-1: st1w
+// CHECK256-COUNT-8: str
+// CHECK512-COUNT-4: str
+// CHECK1024-COUNT-2: str
 // CHECK2048-COUNT-1: st1w
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
index 3ca9107cb2ce5..07bcd802962fa 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
@@ -7380,12 +7380,23 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
     return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    const auto KnownVScale =
+        Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
+
+    if (!KnownVScale || ByteOffset % KnownVScale != 0)
+      return false;
+
+    MulImm = ByteOffset / KnownVScale;
+  } else
     return false;
 
   TypeSize TS = MemVT.getSizeInBits();
   int64_t MemWidthBytes = static_cast<int64_t>(TS.getKnownMinValue()) / 8;
-  int64_t MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
 
   if ((MulImm % MemWidthBytes) != 0)
     return false;
diff --git a/llvm/lib/Target/AArch64/AArch64Subtarget.h b/llvm/lib/Target/AArch64/AArch64Subtarget.h
index c6eb77e3bc3ba..f5ffc72cae537 100644
--- a/llvm/lib/Target/AArch64/AArch64Subtarget.h
+++ b/llvm/lib/Target/AArch64/AArch64Subtarget.h
@@ -391,7 +391,7 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
   void mirFileLoaded(MachineFunction &MF) const override;
 
   // Return the known range for the bit length of SVE data registers. A value
-  // of 0 means nothing is known about that particular limit beyong what's
+  // of 0 means nothing is known about that particular limit beyond what's
   // implied by the architecture.
   unsigned getMaxSVEVectorSizeInBits() const {
     assert(isSVEorStreamingSVEAvailable() &&
@@ -405,6 +405,16 @@ class AArch64Subtarget final : public AArch64GenSubtargetInfo {
     return MinSVEVectorSizeInBits;
   }
 
+  // Return the known bit length of SVE data registers. A value of 0 means the
+  // length is unknown beyond what's implied by the architecture.
+  unsigned getSVEVectorSizeInBits() const {
+    assert(isSVEorStreamingSVEAvailable() &&
+           "Tried to get SVE vector length without SVE support!");
+    if (MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)
+      return MaxSVEVectorSizeInBits;
+    return 0;
+  }
+
   bool useSVEForFixedLengthVectors() const {
     if (!isSVEorStreamingSVEAvailable())
       return false;
diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..700bbe4f060ca
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,362 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-m

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-13 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Hi @paulwalker-arm, thanks again for your suggestion. I think the only node
missing was `MemIntrinsicSDNode`, which seemingly was considered after
`isa<MemSDNode>(Root)` in the original code (although I'm not sure it was
reachable). I've moved it before the main `MemSDNode` path to avoid hitting the
unreachable.

As far as I could tell, the only `MemIntrinsicSDNode` nodes that the function 
handles are for `Intrinsic::aarch64_sve_st2`, `st3` and `st4`, so 
`getMemoryVT()` should be okay to use, I believe. We could also move these 
intrinsics to the last switch statement and avoid having that dedicated path. 
I'm not sure what approach is preferable, so I've kept the original code, but 
please let me know if you'd like me to make that change.
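
To illustrate why the order of the checks matters, here is a minimal standalone sketch. It uses hypothetical stand-in classes and `dynamic_cast` rather than LLVM's `isa<>`; the only fact taken from LLVM is that `MemIntrinsicSDNode` derives from `MemSDNode`, and it assumes the earlier check in the original code was on the base class.

```cpp
#include <string>

// Hypothetical stand-ins for the node classes; in LLVM,
// MemIntrinsicSDNode derives from MemSDNode.
struct Node {
  virtual ~Node() = default;
};
struct MemNode : Node {};
struct MemIntrinsicNode : MemNode {};

// Base-class check first: the derived branch can never be taken.
std::string classifyBaseFirst(const Node *n) {
  if (dynamic_cast<const MemNode *>(n))
    return "mem";
  if (dynamic_cast<const MemIntrinsicNode *>(n))
    return "mem-intrinsic"; // unreachable: every MemIntrinsicNode is a MemNode
  return "other";
}

// Derived-class check first: both branches are reachable.
std::string classifyDerivedFirst(const Node *n) {
  if (dynamic_cast<const MemIntrinsicNode *>(n))
    return "mem-intrinsic";
  if (dynamic_cast<const MemNode *>(n))
    return "mem";
  return "other";
}
```

With the base-class check first, a `MemIntrinsicNode` is classified as `"mem"`; with the derived-class check first, it gets its dedicated path.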

Please let me know if you have any other comments!

https://github.com/llvm/llvm-project/pull/130625

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus updated 
https://github.com/llvm/llvm-project/pull/129732

>From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Tue, 4 Mar 2025 02:36:06 -0800
Subject: [PATCH 1/5] Precommit tests

---
 .../AArch64/sve-fixed-length-offsets.ll   | 227 ++
 1 file changed, 227 insertions(+)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll

diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0000000000000..04ace95de3348
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,227 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:mov w8, #256 // =0x100
+; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ptrue p0.b
+; CHECK-128-NEXT:mov w8, #256 // =0x100
+; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ptrue p0.b
+; CHECK-256-NEXT:mov w8, #256 // =0x100
+; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ptrue p0.b
+; CHECK-512-NEXT:mov w8, #256 // =0x100
+; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ptrue p0.b
+; CHECK-1024-NEXT:mov w8, #256 // =0x100
+; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ptrue p0.b
+; CHECK-2048-NEXT:mov w8, #256 // =0x100
+; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:mov x8, #128 // =0x80
+; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ptrue p0.h
+; CHECK-128-NEXT:mov x8, #128 // =0x80
+; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ptrue p0.h
+; CHECK-256-NEXT:mov x8, #128 // =0x80
+; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ptrue p0.h
+; CHECK-512-NEXT:mov x8, #128 // =0x80
+; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ptrue p0.h
+; CHECK-1024-NEXT:mov x8, #128 // =0x80
+; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-1024-NEXT:

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -7380,17 +7380,31 @@ bool AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
     return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  std::optional<int64_t> MulImm;
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock;
+    auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock;
+    auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock;
+
+    if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0)

rj-jesus wrote:

Thanks - I've added this. Please let me know if that's what you had in mind. :)


https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -0,0 +1,362 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:mov w8, #256 // =0x100
+; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:mov x8, #128 // =0x80
+; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+  %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+  %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+  store <vscale x 8 x i16> %x, ptr %stoff, align 2
+  ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:mov x8, #64 // =0x40
+; CHECK-NEXT:ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048:   // %bb.0:
+; CHECK-

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -0,0 +1,362 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:mov w8, #256 // =0x100
+; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:mov x8, #128 // =0x80
+; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv8i16:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ldr z0, [x0, #1, mul vl]
+; CHECK-2048-NEXT:str z0, [x1, #1, mul vl]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i16, ptr %ldptr, i64 128
+  %stoff = getelementptr inbounds nuw i16, ptr %stptr, i64 128
+  %x = load <vscale x 8 x i16>, ptr %ldoff, align 2
+  store <vscale x 8 x i16> %x, ptr %stoff, align 2
+  ret void
+}
+
+define void @nxv4i32(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv4i32:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.s
+; CHECK-NEXT:mov x8, #64 // =0x40
+; CHECK-NEXT:ld1w { z0.s }, p0/z, [x0, x8, lsl #2]
+; CHECK-NEXT:st1w { z0.s }, p0, [x1, x8, lsl #2]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv4i32:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ldr z0, [x0, #16, mul vl]
+; CHECK-128-NEXT:str z0, [x1, #16, mul vl]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv4i32:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ldr z0, [x0, #8, mul vl]
+; CHECK-256-NEXT:str z0, [x1, #8, mul vl]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv4i32:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ldr z0, [x0, #4, mul vl]
+; CHECK-512-NEXT:str z0, [x1, #4, mul vl]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv4i32:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ldr z0, [x0, #2, mul vl]
+; CHECK-1024-NEXT:str z0, [x1, #2, mul vl]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv4i32:
+; CHECK-2048:   // %bb.0:
+; CHECK-

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/129732
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -7380,17 +7380,31 @@ bool 
AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
 return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  std::optional<int64_t> MulImm;
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    constexpr auto SVEBitsPerBlock = AArch64::SVEBitsPerBlock;
+    auto MinVScale = Subtarget->getMinSVEVectorSizeInBits() / SVEBitsPerBlock;
+    auto MaxVScale = Subtarget->getMaxSVEVectorSizeInBits() / SVEBitsPerBlock;
+
+    if (!MaxVScale || MinVScale != MaxVScale || ByteOffset % MaxVScale != 0)

rj-jesus wrote:

Thanks, that's a good idea. Would you prefer `optional`, or should we follow 
the logic that `getMinSVEVectorSizeInBits` and `getMaxSVEVectorSizeInBits` 
already use and return 0 when nothing is known about the limit?
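
To make the two options concrete, here is a minimal standalone sketch (C++17, not LLVM code). `SVELimits`, `knownVectorBitsOrZero` and `knownVectorBits` are hypothetical names introduced only for illustration; the one behaviour taken from this thread is that the existing getters use 0 to mean "nothing is known about the limit".

```cpp
#include <optional>

// Hypothetical stand-in for the subtarget's min/max SVE settings.
// 0 means "no limit was given", mirroring the existing getters.
struct SVELimits {
  unsigned MinBits = 0;
  unsigned MaxBits = 0;
};

// Option 1: follow the existing getters and return 0 when the exact
// vector length is not known.
constexpr unsigned knownVectorBitsOrZero(SVELimits L) {
  return (L.MaxBits != 0 && L.MinBits == L.MaxBits) ? L.MaxBits : 0;
}

// Option 2: make the "unknown" state explicit in the return type.
constexpr std::optional<unsigned> knownVectorBits(SVELimits L) {
  if (L.MaxBits != 0 && L.MinBits == L.MaxBits)
    return L.MaxBits;
  return std::nullopt;
}

static_assert(knownVectorBitsOrZero({256, 256}) == 256, "fixed length");
static_assert(knownVectorBitsOrZero({128, 2048}) == 0, "range only");
static_assert(!knownVectorBits({0, 0}).has_value(), "nothing known");
```

Both variants carry the same information; the first keeps callers consistent with the min/max getters, while the second forces callers to handle the unknown case explicitly.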

https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -405,6 +405,17 @@ class AArch64Subtarget final : public 
AArch64GenSubtargetInfo {
 return MinSVEVectorSizeInBits;
   }
 
+  // Return the known bit length of SVE data registers. A value of 0 means the
+  // length is unknown beyond what's implied by the architecture.
+  unsigned getSVEVectorSizeInBits() const {
+    assert(isSVEorStreamingSVEAvailable() &&
+           "Tried to get SVE vector length without SVE support!");
+    if (MaxSVEVectorSizeInBits &&
+        MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)

rj-jesus wrote:

Thanks very much for the suggestion; that looks much better.
Should we let through the case `!MinSVEVectorSizeInBits && 
MaxSVEVectorSizeInBits == 128` too?

https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -7380,12 +7380,26 @@ bool 
AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
 return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (VScale.getOpcode() == ISD::VSCALE) {
+    MulImm = cast<ConstantSDNode>(VScale.getOperand(0))->getSExtValue();
+  } else if (auto C = dyn_cast<ConstantSDNode>(VScale)) {
+    int64_t ByteOffset = C->getSExtValue();
+    const auto KnownVScale =
+        Subtarget->getSVEVectorSizeInBits() / AArch64::SVEBitsPerBlock;
+
+    if (!KnownVScale || ByteOffset % KnownVScale != 0)
+      return false;
+
+    MulImm = ByteOffset / KnownVScale;
+  } else
     return false;
 
+  assert(MulImm != std::numeric_limits<int64_t>::max() &&
+         "Uninitialized MulImm.");
+
+

rj-jesus wrote:

Sounds good to me, removed.
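
For readers following the arithmetic, here is a rough standalone sketch (C++17, not the ISel code) of how a constant byte offset maps to the `#imm, mul vl` operand once the vector length is known exactly. `mulVLImmediate` is a hypothetical helper; it collapses the division by the known vscale shown in the hunk and the later division by the 16-byte granule (which is not part of this hunk, so treat it as an assumption) into a single division by the vector size in bytes. The simm9 range check assumes the LDR/STR (vector) form.

```cpp
#include <cstdint>
#include <optional>

// 128 bits per SVE granule, the quantity AArch64::SVEBitsPerBlock names
// in the patch.
constexpr unsigned SVEBitsPerBlock = 128;

// Hypothetical helper: turn a constant byte offset into the immediate of
// `ldr z0, [x0, #imm, mul vl]`, or nullopt if that form does not apply.
// VectorBits == 0 means the exact vector length is unknown.
constexpr std::optional<std::int64_t> mulVLImmediate(std::int64_t ByteOffset,
                                                     unsigned VectorBits) {
  if (VectorBits == 0 || VectorBits % SVEBitsPerBlock != 0)
    return std::nullopt;
  const std::int64_t VectorBytes = VectorBits / 8;
  if (ByteOffset % VectorBytes != 0)
    return std::nullopt; // not a whole number of vectors
  const std::int64_t Imm = ByteOffset / VectorBytes;
  if (Imm < -256 || Imm > 255) // signed 9-bit immediate (simm9)
    return std::nullopt;
  return Imm;
}

// The 256-byte offset used by the tests in this PR.
static_assert(*mulVLImmediate(256, 128) == 16, "CHECK-128: #16, mul vl");
static_assert(*mulVLImmediate(256, 256) == 8, "CHECK-256: #8, mul vl");
static_assert(*mulVLImmediate(256, 2048) == 1, "CHECK-2048: #1, mul vl");
static_assert(!mulVLImmediate(256, 0).has_value(), "unknown length");
```

The asserted immediates match the `CHECK-128`, `CHECK-256` and `CHECK-2048` lines of the updated test for a 256-byte offset.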

https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -405,6 +405,17 @@ class AArch64Subtarget final : public 
AArch64GenSubtargetInfo {
 return MinSVEVectorSizeInBits;
   }
 
+  // Return the known bit length of SVE data registers. A value of 0 means the
+  // length is unknown beyond what's implied by the architecture.
+  unsigned getSVEVectorSizeInBits() const {
+    assert(isSVEorStreamingSVEAvailable() &&
+           "Tried to get SVE vector length without SVE support!");
+    if (MaxSVEVectorSizeInBits &&
+        MinSVEVectorSizeInBits == MaxSVEVectorSizeInBits)

rj-jesus wrote:

I've left it as is for now due to the lack of a motivating example and to keep 
it consistent with `getMinSVEVectorSizeInBits`/`getMaxSVEVectorSizeInBits`, 
which I suppose could return 128/2048 as the architecture bounds and avoid this 
problem altogether. Please let me know if you'd rather I add it.

I'll let the tests run and assuming they are OK merge it.
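
For reference, the alternative mentioned above (getters that fall back to the architecture bounds instead of 0) might look roughly like the sketch below. This is only an illustration: `effectiveMin`, `effectiveMax` and `knownBits` are hypothetical names, and the patch does not change the existing getters in this way.

```cpp
// Architectural limits of the SVE vector length, in bits.
constexpr unsigned ArchMinBits = 128;
constexpr unsigned ArchMaxBits = 2048;

// Hypothetical variants of the min/max getters that never return 0.
constexpr unsigned effectiveMin(unsigned MinBits) {
  return MinBits ? MinBits : ArchMinBits;
}
constexpr unsigned effectiveMax(unsigned MaxBits) {
  return MaxBits ? MaxBits : ArchMaxBits;
}

// With clamped bounds, "the length is known" is simply "min equals max".
// Returns 0 when only a range is known.
constexpr unsigned knownBits(unsigned MinBits, unsigned MaxBits) {
  return effectiveMin(MinBits) == effectiveMax(MaxBits)
             ? effectiveMax(MaxBits)
             : 0;
}

// The case raised in review: no minimum given, maximum of 128.
static_assert(knownBits(0, 128) == 128, "min is implied by the architecture");
static_assert(knownBits(0, 0) == 0, "nothing known");
```

With this formulation the `!MinSVEVectorSizeInBits && MaxSVEVectorSizeInBits == 128` case needs no special handling.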

https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits



@@ -7380,12 +7380,27 @@ bool 
AArch64DAGToDAGISel::SelectAddrModeIndexedSVE(SDNode *Root, SDValue N,
 return false;
 
   SDValue VScale = N.getOperand(1);
-  if (VScale.getOpcode() != ISD::VSCALE)
+  int64_t MulImm = std::numeric_limits<int64_t>::max();
+  if (VScale.getOpcode() == ISD::VSCALE)

rj-jesus wrote:

Thanks, done.

https://github.com/llvm/llvm-project/pull/129732

[clang] [llvm] [AArch64][SVE] Improve fixed-length addressing modes. (PR #129732)

2025-03-05 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus updated 
https://github.com/llvm/llvm-project/pull/129732

>From 624d1e924aa130eea2a8ddaefaeb587aab642f2f Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Tue, 4 Mar 2025 02:36:06 -0800
Subject: [PATCH 1/8] Precommit tests

---
 .../AArch64/sve-fixed-length-offsets.ll   | 227 ++
 1 file changed, 227 insertions(+)
 create mode 100644 llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll

diff --git a/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll 
b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
new file mode 100644
index 0..04ace95de3348
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sve-fixed-length-offsets.ll
@@ -0,0 +1,227 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=128 -aarch64-sve-vector-bits-max=128 < %s | FileCheck %s --check-prefix=CHECK-128
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=256 -aarch64-sve-vector-bits-max=256 < %s | FileCheck %s --check-prefix=CHECK-256
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=512 -aarch64-sve-vector-bits-max=512 < %s | FileCheck %s --check-prefix=CHECK-512
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=1024 -aarch64-sve-vector-bits-max=1024 < %s | FileCheck %s --check-prefix=CHECK-1024
+; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve -aarch64-sve-vector-bits-min=2048 -aarch64-sve-vector-bits-max=2048 < %s | FileCheck %s --check-prefix=CHECK-2048
+
+define void @nxv16i8(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv16i8:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.b
+; CHECK-NEXT:mov w8, #256 // =0x100
+; CHECK-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv16i8:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ptrue p0.b
+; CHECK-128-NEXT:mov w8, #256 // =0x100
+; CHECK-128-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-128-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv16i8:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ptrue p0.b
+; CHECK-256-NEXT:mov w8, #256 // =0x100
+; CHECK-256-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-256-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv16i8:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ptrue p0.b
+; CHECK-512-NEXT:mov w8, #256 // =0x100
+; CHECK-512-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-512-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv16i8:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ptrue p0.b
+; CHECK-1024-NEXT:mov w8, #256 // =0x100
+; CHECK-1024-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-1024-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-1024-NEXT:ret
+;
+; CHECK-2048-LABEL: nxv16i8:
+; CHECK-2048:   // %bb.0:
+; CHECK-2048-NEXT:ptrue p0.b
+; CHECK-2048-NEXT:mov w8, #256 // =0x100
+; CHECK-2048-NEXT:ld1b { z0.b }, p0/z, [x0, x8]
+; CHECK-2048-NEXT:st1b { z0.b }, p0, [x1, x8]
+; CHECK-2048-NEXT:ret
+  %ldoff = getelementptr inbounds nuw i8, ptr %ldptr, i64 256
+  %stoff = getelementptr inbounds nuw i8, ptr %stptr, i64 256
+  %x = load <vscale x 16 x i8>, ptr %ldoff, align 1
+  store <vscale x 16 x i8> %x, ptr %stoff, align 1
+  ret void
+}
+
+define void @nxv8i16(ptr %ldptr, ptr %stptr) {
+; CHECK-LABEL: nxv8i16:
+; CHECK:   // %bb.0:
+; CHECK-NEXT:ptrue p0.h
+; CHECK-NEXT:mov x8, #128 // =0x80
+; CHECK-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-NEXT:ret
+;
+; CHECK-128-LABEL: nxv8i16:
+; CHECK-128:   // %bb.0:
+; CHECK-128-NEXT:ptrue p0.h
+; CHECK-128-NEXT:mov x8, #128 // =0x80
+; CHECK-128-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-128-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-128-NEXT:ret
+;
+; CHECK-256-LABEL: nxv8i16:
+; CHECK-256:   // %bb.0:
+; CHECK-256-NEXT:ptrue p0.h
+; CHECK-256-NEXT:mov x8, #128 // =0x80
+; CHECK-256-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-256-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-256-NEXT:ret
+;
+; CHECK-512-LABEL: nxv8i16:
+; CHECK-512:   // %bb.0:
+; CHECK-512-NEXT:ptrue p0.h
+; CHECK-512-NEXT:mov x8, #128 // =0x80
+; CHECK-512-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-512-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-512-NEXT:ret
+;
+; CHECK-1024-LABEL: nxv8i16:
+; CHECK-1024:   // %bb.0:
+; CHECK-1024-NEXT:ptrue p0.h
+; CHECK-1024-NEXT:mov x8, #128 // =0x80
+; CHECK-1024-NEXT:ld1h { z0.h }, p0/z, [x0, x8, lsl #1]
+; CHECK-1024-NEXT:st1h { z0.h }, p0, [x1, x8, lsl #1]
+; CHECK-1024-NEXT:

[clang] [llvm] Reapply "[AArch64][SVE] Improve fixed-length addressing modes. (#130263)" (PR #130625)

2025-03-12 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Thank you very much for the explanation, @paulwalker-arm - that makes a lot of 
sense! I'll try your suggestion tomorrow. I'll let you know how it goes. :) 

https://github.com/llvm/llvm-project/pull/130625

[clang] [ARM][AArch64] Add missing Neon Types (PR #126945)

2025-02-12 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

I believe this fixes #113297, right?

https://github.com/llvm/llvm-project/pull/126945

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-21 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Thank you very much for checking! If you have any other comments please let me 
know.

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-21 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Thanks for the pointer, @davemgreen. You're right, with `+strict-align` this 
has to be 16B aligned.
This is also only valid for LE, but this should already be enforced.

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-20 Thread Ricardo Jesus via cfe-commits


rj-jesus wrote:

Hi @paulwalker-arm, I think the alignment requirements of LD1 and LDR are 
indeed different, but this only matters if `AlignmentEnforced()` is enabled, 
right? I thought `AlignmentEnforced` wasn't generally a concern, otherwise even 
the current lowering we have for `vld1q_u8(uint8_t const *ptr)`, for example, 
seems too permissive (https://godbolt.org/z/coYefno3j):
```cpp
#include <arm_neon.h>

uint8x16_t foo(uint8_t *ptr) {
  return vld1q_u8(ptr);
}
```
Currently gets lowered to:
```llvm
define <16 x i8> @foo(ptr %0) {
  %2 = load <16 x i8>, ptr %0, align 1
  ret <16 x i8> %2
}
```
Which finally lowers to:
```gas
foo:
ldr q0, [x0]
ret
```
`ptr` isn't necessarily aligned to 16 (in the IR, it's only guaranteed to be 
aligned to 1), but, unless I'm missing something in the docs, LDR.Q also seems 
to expect an alignment of 16 if `AlignmentEnforced` is enabled, and will fault 
if not.

Am I missing anything?

Also, even if we can't indeed lower LD1/ST1 to LDR/STR generally, do you think 
it would be worth trying to do it in some other more restricted way (for 
example only for SP, which I believe should be aligned to 16), or should we 
drop the idea entirely?
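
To spell out the alignment point, a small standalone sketch (not taken from the PR): the pointee type of a `uint8_t *` only promises 1-byte alignment, so an address that is not a multiple of 16 is a perfectly valid argument to `foo` above. `isAligned` and `isPointerAligned` are hypothetical helpers, and the addresses are made-up values.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: is the address a multiple of Align bytes?
constexpr bool isAligned(std::uintptr_t Addr, std::size_t Align) {
  return Align != 0 && Addr % Align == 0;
}

// Pointer convenience wrapper (not constexpr because of the cast).
inline bool isPointerAligned(const void *Ptr, std::size_t Align) {
  return isAligned(reinterpret_cast<std::uintptr_t>(Ptr), Align);
}

// The type system only guarantees byte alignment for uint8_t data...
static_assert(alignof(std::uint8_t) == 1, "uint8_t is byte aligned");
// ...so an address one past a 16-byte boundary satisfies `align 1`
// but not the 16-byte alignment a strict LDR Q access would want.
static_assert(isAligned(std::uintptr_t{0x1001}, 1), "valid for align 1");
static_assert(!isAligned(std::uintptr_t{0x1001}, 16), "not 16B aligned");
```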

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-26 Thread Ricardo Jesus via cfe-commits



@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in {
   defm : unpred_loadstore_bitcast;
   defm : unpred_loadstore_bitcast;
 
+  // Allow using LDR/STR to avoid the predicate dependence.
+  let Predicates = [IsLE, AllowMisalignedMemAccesses] in
+    foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in {
+      let AddedComplexity = 2 in {
+        def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))),
+                  (LDR_ZXI GPR64sp:$base, simm9:$offset)>;
+        def : Pat<(store Ty:$val, (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset)),
+                  (STR_ZXI ZPR:$val, GPR64sp:$base, simm9:$offset)>;
+      }

rj-jesus wrote:

Thanks, that's where I had them initially, but it seems the predicates weren't 
being applied when the patterns were in `unpred_loadstore_bitcast`. For 
example, I was already using the `IsLE` predicate when I opened this PR, but it 
only became effective when I moved the patterns out into the separate loop as 
you can [see in the latest 
commit](https://github.com/llvm/llvm-project/pull/127837/commits/6b1ad6758cd854bc8d15e07b4dae4f2936c416bb#diff-e59a1dfcf45e1736fe516d49e304f65e921ac474ef443b78b2e6ca27abbded68).
 Although, now that I'm looking at it, the original patterns in 
`unpred_loadstore_bitcast` also [don't seem to be 
correct](https://godbolt.org/z/1bvndses7)? Am I missing anything?

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-26 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-26 Thread Ricardo Jesus via cfe-commits



@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in {
   defm : unpred_loadstore_bitcast;
   defm : unpred_loadstore_bitcast;
 
+  // Allow using LDR/STR to avoid the predicate dependence.
+  let Predicates = [IsLE, AllowMisalignedMemAccesses] in
+    foreach Ty = [ nxv16i8, nxv8i16, nxv4i32, nxv2i64, nxv8f16, nxv4f32, nxv2f64, nxv8bf16 ] in {
+      let AddedComplexity = 2 in {
+        def : Pat<(Ty (load (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset))),
+                  (LDR_ZXI GPR64sp:$base, simm9:$offset)>;
+        def : Pat<(store Ty:$val, (am_sve_indexed_s9 GPR64sp:$base, simm9:$offset)),
+                  (STR_ZXI ZPR:$val, GPR64sp:$base, simm9:$offset)>;
+      }

rj-jesus wrote:

Ah, I see! Thanks very much, that makes sense! What if I absorb the current 
patterns into the loop so that we still have unconventional loads/stores 
grouped together, and add `HasSVE_or_SME` (from the parent definition) to the 
predicates? Or do you have a better suggestion?

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64][SVE] Lower unpredicated loads/stores as LDR/STR. (PR #127837)

2025-02-26 Thread Ricardo Jesus via cfe-commits



@@ -2993,6 +2993,22 @@ let Predicates = [HasSVE_or_SME] in {
   defm : unpred_loadstore_bitcast;
   defm : unpred_loadstore_bitcast;
 
+  // Allow using LDR/STR to avoid the predicate dependence.
+  let Predicates = [IsLE, AllowMisalignedMemAccesses] in

rj-jesus wrote:

Thank you very much for the feedback. I'll rebase the patch to resolve the 
conflict with `llvm/test/CodeGen/AArch64/sme-framelower-use-bp.ll` and commit 
it afterwards. I'll keep an ear out for reports of performance regressions.

Also, I think the other two patterns we were discussing above probably also 
need `AllowMisalignedMemAccesses` as they change the width of the vector 
elements accessed.

https://github.com/llvm/llvm-project/pull/127837

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits



@@ -288,6 +288,7 @@ StringRef sys::detail::getHostCPUNameForARM(StringRef 
ProcCpuinfoContent) {
   if (Implementer == "0x4e") { // NVIDIA Corporation
     return StringSwitch<const char *>(Part)
 .Case("0x004", "carmel")
+.Case("0x10", "olympus")

rj-jesus wrote:

Thanks for pointing this out, I've added "0x010" too.
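
As a rough standalone illustration of the change (C++17, not the `Host.cpp` code), the lookup now accepts both spellings of the part number. `nvidiaCPUName` is a hypothetical function, and the `"generic"` fallback is an assumption about the default case rather than something shown in this hunk.

```cpp
#include <string_view>

// Hypothetical helper: map the "CPU part" field reported for an NVIDIA
// (implementer 0x4e) core to a CPU name.
constexpr std::string_view nvidiaCPUName(std::string_view Part) {
  if (Part == "0x004")
    return "carmel";
  // Accept the part number with and without the leading zero.
  if (Part == "0x10" || Part == "0x010")
    return "olympus";
  return "generic"; // assumed fallback for unknown parts
}

static_assert(nvidiaCPUName("0x10") == "olympus", "short spelling");
static_assert(nvidiaCPUName("0x010") == "olympus", "zero-padded spelling");
```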

https://github.com/llvm/llvm-project/pull/132368

[clang] [llvm] [AArch64] Add initial support for -mcpu=olympus. (PR #132368)

2025-03-24 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus updated 
https://github.com/llvm/llvm-project/pull/132368

>From b9725e115876f26311edd408b9d4521ae8a03ebd Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Wed, 4 Dec 2024 05:42:38 -0800
Subject: [PATCH 1/2] [AArch64] Add initial support for -mcpu=olympus.

This patch adds support for the NVIDIA Olympus core.

This does not add any special tuning decisions, and those may come
later.
---
 clang/test/Driver/aarch64-nvidia-olympus.c| 13 +++
 .../aarch64-olympus.c | 82 +++
 .../Misc/target-invalid-cpu-note/aarch64.c|  1 +
 llvm/lib/Target/AArch64/AArch64Processors.td  | 25 ++
 llvm/lib/Target/AArch64/AArch64Subtarget.cpp  |  9 ++
 llvm/lib/TargetParser/Host.cpp|  1 +
 llvm/test/CodeGen/AArch64/cpus.ll |  1 +
 llvm/unittests/TargetParser/Host.cpp  |  4 +
 .../TargetParser/TargetParserTest.cpp |  3 +-
 9 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 clang/test/Driver/aarch64-nvidia-olympus.c
 create mode 100644 clang/test/Driver/print-enabled-extensions/aarch64-olympus.c

diff --git a/clang/test/Driver/aarch64-nvidia-olympus.c 
b/clang/test/Driver/aarch64-nvidia-olympus.c
new file mode 100644
index 0..e832d06917a25
--- /dev/null
+++ b/clang/test/Driver/aarch64-nvidia-olympus.c
@@ -0,0 +1,13 @@
+// RUN: %clang --target=aarch64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s
+// RUN: %clang --target=aarch64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus %s
+// RUN: %clang --target=aarch64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s
+// RUN: %clang --target=aarch64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=olympus-TUNE %s
+// olympus: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "olympus"
+// olympus-TUNE: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "generic"
+
+// RUN: %clang --target=arm64 -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s
+// RUN: %clang --target=arm64 -mlittle-endian -mcpu=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus %s
+// RUN: %clang --target=arm64 -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s
+// RUN: %clang --target=arm64 -mlittle-endian -mtune=olympus -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-olympus-TUNE %s
+// ARM64-olympus: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "olympus"
+// ARM64-olympus-TUNE: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "generic"
diff --git a/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c 
b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c
new file mode 100644
index 0..a37ec4ac6aa7d
--- /dev/null
+++ b/clang/test/Driver/print-enabled-extensions/aarch64-olympus.c
@@ -0,0 +1,82 @@
+// REQUIRES: aarch64-registered-target
+// RUN: %clang --target=aarch64 --print-enabled-extensions -mcpu=olympus | FileCheck --strict-whitespace --implicit-check-not=FEAT_ %s
+
+// CHECK: Extensions enabled for the given AArch64 target
+// CHECK-EMPTY:
+// CHECK-NEXT: Architecture Feature(s)   Description
+// CHECK-NEXT: FEAT_AES, FEAT_PMULL      Enable AES support
+// CHECK-NEXT: FEAT_AMUv1                Enable Armv8.4-A Activity Monitors extension
+// CHECK-NEXT: FEAT_AMUv1p1              Enable Armv8.6-A Activity Monitors Virtualization support
+// CHECK-NEXT: FEAT_AdvSIMD              Enable Advanced SIMD instructions
+// CHECK-NEXT: FEAT_BF16                 Enable BFloat16 Extension
+// CHECK-NEXT: FEAT_BRBE                 Enable Branch Record Buffer Extension
+// CHECK-NEXT: FEAT_BTI                  Enable Branch Target Identification
+// CHECK-NEXT: FEAT_CCIDX                Enable Armv8.3-A Extend of the CCSIDR number of sets
+// CHECK-NEXT: FEAT_CHK                  Enable Armv8.0-A Check Feature Status Extension
+// CHECK-NEXT: FEAT_CRC32                Enable Armv8.0-A CRC-32 checksum instructions
+// CHECK-NEXT: FEAT_CSV2_2               Enable architectural speculation restriction
+// CHECK-NEXT: FEAT_Crypto               Enable cryptographic instructions
+// CHECK-NEXT: FEAT_DIT                  Enable Armv8.4-A Data Independent Timing instructions
+// CHECK-NEXT: FEAT_DPB                  Enable Armv8.2-A data Cache Clean to Point of Persistence
+// CHECK-NEXT: FEAT_DPB2                 Enable Armv8.5-A Cache Cl

[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-12 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus closed 
https://github.com/llvm/llvm-project/pull/139236

[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-09 Thread Ricardo Jesus via cfe-commits



@@ -25030,7 +25030,8 @@ static SDValue foldCSELofLASTB(SDNode *Op, SelectionDAG 
&DAG) {
   if (AnyPred.getOpcode() == AArch64ISD::REINTERPRET_CAST)
 AnyPred = AnyPred.getOperand(0);
 
-  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE)
+  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE &&
+  !ISD::isConstantSplatVectorAllOnes(TruePred.getNode()))

rj-jesus wrote:

Thanks very much for looking into this. :) In that case, I'll drop the `PTRUE` 
check in favour of `isAllActivePredicate`.

https://github.com/llvm/llvm-project/pull/139236

[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-09 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus updated 
https://github.com/llvm/llvm-project/pull/139236



  





[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-09 Thread Ricardo Jesus via cfe-commits



@@ -25030,7 +25030,8 @@ static SDValue foldCSELofLASTB(SDNode *Op, SelectionDAG 
&DAG) {
   if (AnyPred.getOpcode() == AArch64ISD::REINTERPRET_CAST)
 AnyPred = AnyPred.getOperand(0);
 
-  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE)
+  if (TruePred != AnyPred && TruePred.getOpcode() != AArch64ISD::PTRUE &&
+  !ISD::isConstantSplatVectorAllOnes(TruePred.getNode()))

rj-jesus wrote:

@huntergr-arm: I believe you added this fold in #112738 - does this change look 
reasonable to you?

https://github.com/llvm/llvm-project/pull/139236

[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-09 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus edited 
https://github.com/llvm/llvm-project/pull/139236

[clang] [llvm] [AArch64][SVE] Refactor getPTrue to return splat(1) when pattern=all. (PR #139236)

2025-05-09 Thread Ricardo Jesus via cfe-commits


https://github.com/rj-jesus created 
https://github.com/llvm/llvm-project/pull/139236

Similarly to #135016, refactor getPTrue to return splat(1) for all-active 
patterns. The main motivation for this patch is to improve code gen for 
fixed-length vector loads/stores that are converted to SVE masked memory ops 
when the vectors are wider than Neon. Emitting the mask as a splat helps 
DAGCombiner simplify all-active masked loads/stores into unmasked ones, for 
which it already has suitable combines.

There are four places in AArch64ISelLowering that match against 
AArch64ISD::PTRUE explicitly. Of these, only one (foldCSELofLASTB) led to test 
regressions, which I addressed by adding a check for 
ISD::isConstantSplatVectorAllOnes (I'm not sure if the original intent is to 
genuinely match any PTRUE node, or if isAllActivePredicate should be used 
instead). The other three combines (performUnpackCombine, performMSTORECombine 
and performSetCCPunpkCombine) check for patterns in the range [VL1, VL256], so 
I believe those should already skip all-active masks.
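
To make the two checks above concrete, here is a small toy sketch in C++. It 
does not use LLVM's types: `Kind`, `Pred`, `foldIsBlocked` and 
`inFixedVLRange` are hypothetical names, and the pattern values assume the 
usual SVE predicate-pattern encodings (VL1 = 1, VL256 = 13, ALL = 31):

```cpp
// Toy stand-ins; these are not LLVM's SDNode or AArch64ISD definitions.
enum class Kind { PTrue, AllOnesSplat, Other };

struct Pred {
  int Id; // identity of the node, used to compare two predicates
  Kind K; // what kind of node produced the predicate
};

// Mirrors the shape of the updated guard in foldCSELofLASTB: the fold is
// blocked only if TruePred differs from AnyPred and is neither a PTRUE nor
// an all-ones splat.
bool foldIsBlocked(const Pred &TruePred, const Pred &AnyPred) {
  return TruePred.Id != AnyPred.Id && TruePred.K != Kind::PTrue &&
         TruePred.K != Kind::AllOnesSplat;
}

// Assumed SVE predicate-pattern encodings (illustrative constants).
constexpr unsigned VL1 = 1, VL256 = 13, ALL = 31;

// Shape of the range check used by the other three combines: only the
// fixed-count patterns VL1..VL256 qualify, so pattern=all is never matched.
bool inFixedVLRange(unsigned Pattern) {
  return Pattern >= VL1 && Pattern <= VL256;
}
```

With this model, an all-ones splat passes the first guard just as a PTRUE 
does, while `inFixedVLRange(ALL)` is false, matching the expectation that the 
range-checked combines are unaffected.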

Given the recent changes, going this route seemed more sensible than 
replicating the combines from DAGCombiner or adding patterns for all-active 
masked loads/stores, but I'm happy pursuing either of those approaches (or any 
other) if they are seen as more appropriate.

From 6ac7c16b89605d705a50d614e58a83639600e02b Mon Sep 17 00:00:00 2001
From: Ricardo Jesus 
Date: Wed, 7 May 2025 09:20:58 -0700
Subject: [PATCH] [AArch64][SVE] Refactor getPTrue to return splat(1) when
 pattern=all.

Similarly to #135016, refactor getPTrue to return splat(1) for
all-active patterns. The main motivation for this is to improve code gen
for fixed-length vector loads/stores that are lowered to SVE masked
memory ops when they are wider than Neon. Emitting the mask as a splat
helps DAGCombiner simplify all-active masked loads/stores into unmasked
ones, for which we already have suitable patterns.

There are four places in AArch64ISelLowering that match against
AArch64ISD::PTRUE opcodes explicitly. Of these, only one
(foldCSELofLASTB) led to test regressions, which I addressed by adding a
check for ISD::isConstantSplatVectorAllOnes (I'm not sure if the intent
here is to genuinely match any PTRUE node, or if isAllActivePredicate
should be used instead). The other three combines (performUnpackCombine,
performMSTORECombine and performSetCCPunpkCombine) check for patterns in
the range [VL1, VL256], so those should already skip all-active masks.

Given the recent changes, going this route seemed more sensible than
replicating the combines from DAGCombiner or adding patterns for
all-active masked loads/stores, but I'm happy pursuing either of these
approaches (or any other) if they are seen as more appropriate.
---
 .../CodeGen/AArch64/sve-vector-bits-codegen.c |   2 +-
 .../Target/AArch64/AArch64ISelLowering.cpp|   9 +-
 .../insert-subvector-res-legalization.ll  |   9 +-
 .../AArch64/sve-extract-fixed-vector.ll   |   3 +-
 .../CodeGen/AArch64/sve-fixed-ld2-alloca.ll   |   2 +-
 .../sve-fixed-length-extract-subvector.ll |   3 +-
 .../AArch64/sve-fixed-length-fp-convert.ll|   8 +-
 .../sve-fixed-length-frame-offests-crash.ll   |  25 ++-
 .../AArch64/sve-fixed-length-frame-offests.ll |  13 +-
 .../AArch64/sve-fixed-length-offsets.ll   |  12 +-
 .../sve-fixed-length-optimize-ptrue.ll|  43 ++---
 .../AArch64/sve-fixed-length-permute-rev.ll   |  44 ++---
 .../sve-fixed-length-permute-zip-uzp-trn.ll   | 178 --
 .../CodeGen/AArch64/sve-fixed-length-ptest.ll |  10 +-
 .../AArch64/sve-fixed-length-shuffles.ll  |  20 +-
 .../AArch64/sve-fixed-length-splat-vector.ll  |  30 ++-
 .../sve-fixed-length-vector-shuffle-tbl.ll|   3 +-
 .../test/CodeGen/AArch64/sve-insert-vector.ll |   6 +-
 llvm/test/CodeGen/AArch64/sve-ld-post-inc.ll  |  24 +--
 .../sve-uunpklo-load-uzp1-store-combine.ll|   6 +-
 llvm/test/CodeGen/AArch64/sve-vscale-attr.ll  |  20 +-
 21 files changed, 204 insertions(+), 266 deletions(-)

diff --git a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
index 1391a1b09fbd1..36c3c7f745a2b 100644
--- a/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
+++ b/clang/test/CodeGen/AArch64/sve-vector-bits-codegen.c
@@ -16,7 +16,7 @@ void func(int *restrict a, int *restrict b) {
 // CHECK256-COUNT-8: str
 // CHECK512-COUNT-4: str
 // CHECK1024-COUNT-2: str
-// CHECK2048-COUNT-1: st1w
+// CHECK2048-COUNT-1: str
 #pragma clang loop vectorize(enable)
   for (int i = 0; i < 64; ++i)
     a[i] += b[i];
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 795ac68e63087..c49ad6e1a1e16 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5725,8 +5725,8 @@ SDValue AArch64TargetLowering::LowerMUL(SDValue Op, SelectionDAG &DAG) const {
 
 static inline S
