Hi Vladimir,
Thanks for the patches!
> On 6 Nov 2024, at 08:50, [email protected] wrote:
>
>
> The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory
> from Armv9.5-a. This extension introduces instructions for lookup table
> read with 2-bit indices.
>
> This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table
> lookup with 2-bit packed indices. The following intrinsics are added:
>
> * vluti2{q}_lane{q}_u8
> * vluti2{q}_lane{q}_s8
> * vluti2{q}_lane{q}_p8
> * vluti2{q}_lane{q}_u16
> * vluti2{q}_lane{q}_s16
> * vluti2{q}_lane{q}_p16
> * vluti2{q}_lane{q}_f16
> * vluti2{q}_lane{q}_bf16
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc (enum class):
> Add binary_lane shape.
> (aarch64_fntype): Modify to handle binary_lane shape.
> (aarch64_expand_pragma_builtin): Extend to distinguish
> and expand binary and binary lane-based intrinsics.
>
> * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
> Add LUT feature flag.
>
> * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_LANE):
> New macro for lane-based intrinsics.
> (ENTRY_VLANEIU): New macro for LUTI lanes (unsigned).
> (ENTRY_VLANEIS): New macro for LUTI lanes (signed).
> (ENTRY_VLANEP): New macro for LUTI lanes (poly).
> (ENTRY_VLANEF): New macro for LUTI lanes (float).
> (ENTRY_VLANEBF): New macro for LUTI lanes (bfloat).
> (REQUIRED_EXTENSIONS): Set per LUTI requirements.
>
> * config/aarch64/aarch64-simd.md
> (@aarch64_<vluti_uns_op><VLUT1:mode><VLUT2:mode>):
> Add instruction pattern for LUTI2 instructions.
>
> * config/aarch64/aarch64.h (TARGET_LUT): Add TARGET_LUT macro for
> enabling LUT extension support.
>
> * config/aarch64/iterators.md (v16qi): Update iterators to include
> VLUT1 and VLUT2 for LUTI2 operations.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/simd/vluti-builtins.c: New test.
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 22 +-
> .../aarch64/aarch64-option-extensions.def | 2 +
> .../aarch64/aarch64-simd-pragma-builtins.def | 61 ++++
> gcc/config/aarch64/aarch64-simd.md | 10 +
> gcc/config/aarch64/aarch64.h | 4 +
> gcc/config/aarch64/iterators.md | 25 ++
> .../gcc.target/aarch64/simd/vluti-builtins.c | 329 ++++++++++++++++++
> 7 files changed, 452 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
>
> <0002-aarch64-Add-AdvSIMD-LUT-extension-and-vluti2-q-_lane.patch>
@@ -3383,7 +3395,7 @@ static rtx
aarch64_expand_pragma_builtin (tree exp, rtx target,
const aarch64_pragma_builtins_data *builtin_data)
{
- expand_operand ops[3];
+ expand_operand ops[4];
auto op1 = expand_normal (CALL_EXPR_ARG (exp, 0));
auto op2 = expand_normal (CALL_EXPR_ARG (exp, 1));
create_output_operand (&ops[0], target, builtin_data->types[0].mode);
@@ -3399,6 +3411,14 @@ aarch64_expand_pragma_builtin (tree exp, rtx target,
icode = code_for_aarch64 (unspec, builtin_data->types[0].mode);
expand_insn (icode, 3, ops);
break;
+ case aarch64_builtin_signatures::binary_lane:
+ rtx op3;
+ op3 = expand_normal (CALL_EXPR_ARG (exp, 2));
+ create_input_operand (&ops[3], op3, SImode);
+ icode = code_for_aarch64 (unspec,
+ builtin_data->types[1].mode, builtin_data->types[2].mode);
+ expand_insn (icode, 4, ops);
+ break;
As these are lane intrinsics, I think we should have logic to validate that the
lane number given is in range.
We already have require_const_argument, which you can use here to check it and
emit the right diagnostic.
On that topic...
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
new file mode 100644
index 00000000000..142657ba2ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
@@ -0,0 +1,329 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -march=armv9-a+lut" } */
+/* { dg-final { check-function-bodies "**" ""} } */
+
+#include "arm_neon.h"
+
+/*
+** test_vluti2_lane_u8:
+** luti2 v0\.8b, v0\.8b, v1\[0\]
+** ret
+*/
+
+uint8x16_t
+test_vluti2_lane_u8(uint8x8_t a, uint8x8_t b)
+{
+ return vluti2_lane_u8(a, b, 0);
+}
+
+/*
+** test_vluti2q_lane_u8:
+** luti2 v0\.16b, v0\.16b, v1\[0\]
+** ret
+*/
+
+uint8x16_t
+test_vluti2q_lane_u8(uint8x16_t a, uint8x8_t b)
+{
+ return vluti2q_lane_u8(a, b, 0);
+}
… we should have tests for other lane numbers as well, particularly the
maximum allowed for each intrinsic.
The rest of the patch looks ok to me, though I’d let Richard comment on the
streaming/non-streaming logic in aarch64-simd-pragma-builtins.def.
Thanks,
Kyrill