Hi Vladimir,

Thanks for the patches!

> On 6 Nov 2024, at 08:50, vladimir.miloser...@arm.com wrote:
>
> The AArch64 FEAT_LUT extension is optional from Armv9.2-a and mandatory
> from Armv9.5-a. This extension introduces instructions for lookup table
> read with 2-bit indices.
>
> This patch adds AdvSIMD LUT intrinsics for LUTI2, supporting table
> lookup with 2-bit packed indices. The following intrinsics are added:
>
> * vluti2{q}_lane{q}_u8
> * vluti2{q}_lane{q}_s8
> * vluti2{q}_lane{q}_p8
> * vluti2{q}_lane{q}_u16
> * vluti2{q}_lane{q}_s16
> * vluti2{q}_lane{q}_p16
> * vluti2{q}_lane{q}_f16
> * vluti2{q}_lane{q}_bf16
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-builtins.cc (enum class):
> 	Add binary_lane shape.
> 	(aarch64_fntype): Modify to handle binary_lane shape.
> 	(aarch64_expand_pragma_builtin): Extend to distinguish
> 	and expand binary and binary lane-based intrinsics.
>
> 	* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION):
> 	Add LUT feature flag.
>
> 	* config/aarch64/aarch64-simd-pragma-builtins.def (ENTRY_LANE):
> 	New macro for lane-based intrinsics.
> 	(ENTRY_VLANEIU): New macro for LUTI lanes (unsigned).
> 	(ENTRY_VLANEIS): New macro for LUTI lanes (signed).
> 	(ENTRY_VLANEP): New macro for LUTI lanes (poly).
> 	(ENTRY_VLANEF): New macro for LUTI lanes (float).
> 	(ENTRY_VLANEBF): New macro for LUTI lanes (bfloat).
> 	(REQUIRED_EXTENSIONS): Set per LUTI requirements.
>
> 	* config/aarch64/aarch64-simd.md
> 	(@aarch64_<vluti_uns_op><VLUT1:mode><VLUT2:mode>):
> 	Add instruction pattern for LUTI2 instructions.
>
> 	* config/aarch64/aarch64.h (TARGET_LUT): Add TARGET_LUT macro for
> 	enabling LUT extension support.
>
> 	* config/aarch64/iterators.md (v16qi): Update iterators to include
> 	VLUT1 and VLUT2 for LUTI2 operations.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/aarch64/simd/vluti-builtins.c: New test.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc        |  22 +-
>  .../aarch64/aarch64-option-extensions.def     |   2 +
>  .../aarch64/aarch64-simd-pragma-builtins.def  |  61 ++++
>  gcc/config/aarch64/aarch64-simd.md            |  10 +
>  gcc/config/aarch64/aarch64.h                  |   4 +
>  gcc/config/aarch64/iterators.md               |  25 ++
>  .../gcc.target/aarch64/simd/vluti-builtins.c  | 329 ++++++++++++++++++
>  7 files changed, 452 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
>
> <0002-aarch64-Add-AdvSIMD-LUT-extension-and-vluti2-q-_lane.patch>
@@ -3383,7 +3395,7 @@ static rtx
 aarch64_expand_pragma_builtin (tree exp, rtx target,
			       const aarch64_pragma_builtins_data *builtin_data)
 {
-  expand_operand ops[3];
+  expand_operand ops[4];
   auto op1 = expand_normal (CALL_EXPR_ARG (exp, 0));
   auto op2 = expand_normal (CALL_EXPR_ARG (exp, 1));
   create_output_operand (&ops[0], target, builtin_data->types[0].mode);
@@ -3399,6 +3411,14 @@ aarch64_expand_pragma_builtin (tree exp, rtx target,
       icode = code_for_aarch64 (unspec, builtin_data->types[0].mode);
       expand_insn (icode, 3, ops);
       break;
+    case aarch64_builtin_signatures::binary_lane:
+      rtx op3;
+      op3 = expand_normal (CALL_EXPR_ARG (exp, 2));
+      create_input_operand (&ops[3], op3, SImode);
+      icode = code_for_aarch64 (unspec,
+				builtin_data->types[1].mode, builtin_data->types[2].mode);
+      expand_insn (icode, 4, ops);
+      break;

As these are lane intrinsics, I think we should have logic to validate that
the lane number given is in range.  We already have require_const_argument,
which you can use here to check it and emit the right message.

On that topic...

diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
new file mode 100644
index 00000000000..142657ba2ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vluti-builtins.c
@@ -0,0 +1,329 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -march=armv9-a+lut" } */
+/* { dg-final { check-function-bodies "**" ""} } */
+
+#include "arm_neon.h"
+
+/*
+** test_vluti2_lane_u8:
+**	luti2	v0\.8b, v0\.8b, v1\[0\]
+**	ret
+*/
+
+uint8x16_t
+test_vluti2_lane_u8(uint8x8_t a, uint8x8_t b)
+{
+  return vluti2_lane_u8(a, b, 0);
+}
+
+/*
+** test_vluti2q_lane_u8:
+**	luti2	v0\.16b, v0\.16b, v1\[0\]
+**	ret
+*/
+
+uint8x16_t
+test_vluti2q_lane_u8(uint8x16_t a, uint8x8_t b)
+{
+  return vluti2q_lane_u8(a, b, 0);
+}

... we should have tests for other lane numbers as well, particularly the
maximum allowed.

The rest of the patch looks ok to me, though I'd let Richard comment on the
streaming/non-streaming logic in aarch64-simd-pragma-builtins.def.

Thanks,
Kyrill
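
[Editor's illustration, not part of the review or the patch.]  A test along
the lines requested might look like the sketch below.  The assumed maximum
lane of 1 for the 64-bit index form of vluti2_lane_u8 is taken from the ACLE
FEAT_LUT description rather than from this patch, and the dg-error pattern is
left empty because the exact diagnostic depends on how require_const_argument
reports an out-of-range constant.

/* Sketch only: exercise the assumed maximum lane and an out-of-range lane.  */

#include "arm_neon.h"

uint8x16_t
test_vluti2_lane_u8_max (uint8x8_t a, uint8x8_t b)
{
  /* Lane 1 is assumed to be the maximum for the 64-bit index form.  */
  return vluti2_lane_u8 (a, b, 1);
}

uint8x16_t
test_vluti2_lane_u8_oor (uint8x8_t a, uint8x8_t b)
{
  /* Expected to be rejected once lane validation is added; any
     diagnostic text is accepted here.  */
  return vluti2_lane_u8 (a, b, 2); /* { dg-error "" } */
}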