Hi Vladimir,
Thanks for the patches!
> On 6 Nov 2024, at 08:50, [email protected] wrote:
>
>
> The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory
> from Armv9.5-a. This extension introduces instructions for lookup table
> read with 2-bit indices.
>
> This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table
> lookup with 2-bit packed indices. The following intrinsics are added:
>
> * vluti2{q}_lane{q}_u8
> * vluti2{q}_lane{q}_s8
> * vluti2{q}_lane{q}_p8
> * vluti2{q}_lane{q}_u16
> * vluti2{q}_lane{q}_s16
> * vluti2{q}_lane{q}_p16
> * vluti2{q}_lane{q}_f16
> * vluti2{q}_lane{q}_bf16
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc (enum class):
> Add binary_lane shape.
> (aarch64_fntype): Modify to handle binary_lane shape.
> (aarch64_expand_pragma_builtin): Extend to distinguish
> and expand binary and binary lane-based intrinsics.
>
> * config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
> Add LUT feature flag.
>
> * config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_LANE):
> New macro for lane-based intrinsics.
> (ENTRY_VLANEIU): New macro for LUTI lanes (unsigned).
> (ENTRY_VLANEIS): New macro for LUTI lanes (signed).
> (ENTRY_VLANEP): New macro for LUTI lanes (poly).
> (ENTRY_VLANEF): New macro for LUTI lanes (float).
> (ENTRY_VLANEBF): New macro for LUTI lanes (bfloat).
> (REQUIRED_EXTENSIONS): Set per LUTI requirements.
>
> * config/aarch64/aarch64-simd.md
> (@aarch64_<vluti_uns_op><VLUT1:mode><VLUT2:mode>):
> Add instruction pattern for LUTI2 instructions.
>
> * config/aarch64/aarch64.h (TARGET_LUT): Add TARGET_LUT macro for
> enabling LUT extension support.
>
> * config/aarch64/iterators.md (v16qi): Update iterators to include
> VLUT1 and VLUT2 for LUTI2 operations.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/simd/vluti-builtins.c: New test.
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 22 +-
> .../aarch64/aarch64-option-extensions.def | 2 +
> .../aarch64/aarch64-simd-pragma-builtins.def | 61 ++++
> gcc/config/aarch64/aarch64-simd.md | 10 +
> gcc/config/aarch64/aarch64.h | 4 +
> gcc/config/aarch64/iterators.md | 25 ++
> .../gcc.target/aarch64/simd/vluti-builtins.c | 329 ++++++++++++++++++
> 7 files changed, 452 insertions(+), 1 deletion(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
>
> <0002-aarch64-Add-AdvSIMD-LUT-extension-and-vluti2-q-_lane.patch>
@@ -3383,7 +3395,7 @@ static rtx
aarch64_expand_pragma_builtin (tree exp, rtx target,
const aarch64_pragma_builtins_data *builtin_data)
{
- expand_operand ops[3];
+ expand_operand ops[4];
auto op1 = expand_normal (CALL_EXPR_ARG (exp, 0));
auto op2 = expand_normal (CALL_EXPR_ARG (exp, 1));
create_output_operand (&ops[0], target, builtin_data->types[0].mode);
@@ -3399,6 +3411,14 @@ aarch64_expand_pragma_builtin (tree exp, rtx target,
icode = code_for_aarch64 (unspec, builtin_data->types[0].mode);
expand_insn (icode, 3, ops);
break;
+ case aarch64_builtin_signatures::binary_lane:
+ rtx op3;
+ op3 = expand_normal (CALL_EXPR_ARG (exp, 2));
+ create_input_operand (&ops[3], op3, SImode);
+ icode = code_for_aarch64 (unspec,
+ builtin_data->types[1].mode, builtin_data->types[2].mode);
+ expand_insn (icode, 4, ops);
+ break;
As these are lane intrinsics, I think we should have logic to validate that the
lane number given is in range.
We already have require_const_argument, which you can use here to check it and
emit the right diagnostic.
On that topic...
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
new file mode 100644
index 00000000000..142657ba2ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
@@ -0,0 +1,329 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -march=armv9-a+lut" } */
+/* { dg-final { check-function-bodies "**" ""} } */
+
+#include "arm_neon.h"
+
+/*
+** test_vluti2_lane_u8:
+** luti2 v0\.8b, v0\.8b, v1\[0\]
+** ret
+*/
+
+uint8x16_t
+test_vluti2_lane_u8(uint8x8_t a, uint8x8_t b)
+{
+ return vluti2_lane_u8(a, b, 0);
+}
+
+/*
+** test_vluti2q_lane_u8:
+** luti2 v0\.16b, v0\.16b, v1\[0\]
+** ret
+*/
+
+uint8x16_t
+test_vluti2q_lane_u8(uint8x16_t a, uint8x8_t b)
+{
+ return vluti2q_lane_u8(a, b, 0);
+}
… we should have tests for other lane numbers as well, particularly the
maximum allowed for each intrinsic.
The rest of the patch looks ok to me, though I’d let Richard comment on the
streaming/non-streaming logic in aarch64-simd-pragma-builtins.def.
Thanks,
Kyrill