> On 5 Dec 2024, at 10:25 PM, Richard Sandiford <[email protected]> > wrote: > > External email: Use caution opening links or attachments > > > Soumya AR <[email protected]> writes: >> NBSL, BSL1N, and BSL2N are bit-select intructions on SVE2 with certain >> operands >> inverted. These can be extended to work with Neon modes. >> >> Since these instructions are unpredicated, duplicate patterns were added with >> the predicate removed to generate these instructions for Neon modes. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Soumya AR <[email protected]> >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64-sve2.md >> (*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated >> form. >> (*aarch64_sve2_bsl1n_unpred<mode>): Likewise. >> (*aarch64_sve2_bsl2n_unpred<mode>): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/bitsel.c: New test. >> >> From 4be7a0ba40150322e2f67214f54c55da51ad2fd2 Mon Sep 17 00:00:00 2001 >> From: Soumya AR <[email protected]> >> Date: Mon, 25 Nov 2024 11:25:44 +0530 >> Subject: [PATCH] aarch64: Extend SVE2 bit-select instructions for Neon modes. >> >> NBSL, BSL1N, and BSL2N are bit-select intructions on SVE2 with certain >> operands >> inverted. These can be extended to work with Neon modes. >> >> Since these instructions are unpredicated, duplicate patterns were added with >> the predicate removed to generate these instructions for Neon modes. >> >> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. >> OK for mainline? >> >> Signed-off-by: Soumya AR <[email protected]> >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64-sve2.md >> (*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated >> form. >> (*aarch64_sve2_bsl1n_unpred<mode>): Likewise. >> (*aarch64_sve2_bsl2n_unpred<mode>): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/sve/bitsel.c: New test. > > Looks good, but some comments about the formatting: > >> --- >> gcc/config/aarch64/aarch64-sve2.md | 71 +++++++++++++++++++ >> gcc/testsuite/gcc.target/aarch64/sve/bitsel.c | 35 +++++++++ >> 2 files changed, 106 insertions(+) >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/bitsel.c >> >> diff --git a/gcc/config/aarch64/aarch64-sve2.md >> b/gcc/config/aarch64/aarch64-sve2.md >> index e67421bad84..6d38c40e573 100644 >> --- a/gcc/config/aarch64/aarch64-sve2.md >> +++ b/gcc/config/aarch64/aarch64-sve2.md >> @@ -1696,6 +1696,23 @@ >> } >> ) >> >> +(define_insn "*aarch64_sve2_nbsl_unpred<mode>" >> + [(set (match_operand:VDQ_I 0 "register_operand") >> + (not:VDQ_I >> + (xor:VDQ_I >> + (and:VDQ_I >> + (xor:VDQ_I >> + (match_operand:VDQ_I 1 "register_operand") >> + (match_operand:VDQ_I 2 "register_operand")) >> + (match_operand:VDQ_I 3 "register_operand")) >> + (match_dup BSL_DUP))))] >> + "TARGET_SVE2" >> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] >> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] nbsl\t%Z0.d, >> %Z0.d, %Z<bsl_dup>.d, %Z3.d >> + [ ?&w , w , w , w ; yes ] >> movprfx\t%Z0, %Z<bsl_mov>\;nbsl\t%Z0.d, %Z0.d, %Z<bsl_dup>.d, %Z3.d >> + } >> +) >> + >> ;; Unpredicated bitwise select with inverted first operand. >> ;; (op3 ? ~bsl_mov : bsl_dup) == ((~(bsl_mov ^ bsl_dup) & op3) ^ bsl_dup) >> (define_expand "@aarch64_sve2_bsl1n<mode>" >> @@ -1741,6 +1758,23 @@ >> } >> ) >> >> +(define_insn "*aarch64_sve2_bsl1n_unpred<mode>" >> + [(set (match_operand:VDQ_I 0 "register_operand") >> + (xor:VDQ_I >> + (and:VDQ_I >> + (not:VDQ_I >> + (xor:VDQ_I >> + (match_operand:VDQ_I 1 "register_operand") >> + (match_operand:VDQ_I 2 "register_operand"))) > > Removing the unspecs from the SVE patterns leaves these operations > indented by an unusual amount. Generally: > > - if the first operand is on the same line as the operator (as with > the set above), the other operands are indented to be underneath > the first operand. > > - if an operator appears on a line by itself, the operands are generally > indented by two spaces more than the operator. > > Same for the others. Also: > >> + (match_operand:VDQ_I 3 "register_operand")) >> + (match_dup BSL_DUP)))] >> + "TARGET_SVE2" >> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ] >> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ] >> bsl1n\t%Z0.d, %Z0.d, %Z<bsl_dup>.d, %Z3.d >> + [ ?&w , w , w , w ; yes ] >> movprfx\t%Z0, %Z<bsl_mov>\;bsl1n\t%Z0.d, %Z0.d, %Z<bsl_dup>.d, %Z3.d >> + } >> +) >> + >> ;; Unpredicated bitwise select with inverted second operand. >> ;; (bsl_dup ? bsl_mov : ~op3) == ((bsl_dup & bsl_mov) | (~op3 & ~bsl_dup)) >> (define_expand "@aarch64_sve2_bsl2n<mode>" >> @@ -1815,6 +1849,43 @@ >> } >> ) >> >> +(define_insn "*aarch64_sve2_bsl2n_unpred<mode>" >> + [(set (match_operand:VDQ_I 0 "register_operand") >> + (ior:VDQ_I >> + (and:VDQ_I >> + (match_operand:VDQ_I 1 "register_operand") >> + (match_operand:VDQ_I 2 "register_operand")) >> + (and:VDQ_I >> + (not:VDQ_I >> + (match_operand:VDQ_I 3 "register_operand")) >> + (not:VDQ_I >> + (match_dup BSL_DUP))) >> + ))] > > ...all the close brackets would normally be on the same line as the > match_dup. > > Elsewhere I said: > > ...the work seems like an ongoing > project with no well-defined end point, so it seemed like the GCC 15 > cut-off would have to be time-driven rather than feature-driven. > > But another valid cut-off would be to take what has already been posted. > So let's do that :) Please repost the reformatted patch for review though. >
That’s great! Thank you.
Posting the reformatted patch, let me know if this is fine.
Best,
Soumya
> Thanks,
> Richard
>
>> + "TARGET_SVE2"
>> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
>> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ]
>> bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
>> + [ ?&w , w , w , w ; yes ]
>> movprfx\t%Z0, %Z<bsl_mov>\;bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
>> + }
>> +)
>> +
>> +(define_insn "*aarch64_sve2_bsl2n_unpred<mode>"
>> + [(set (match_operand:VDQ_I 0 "register_operand")
>> + (ior:VDQ_I
>> + (and:VDQ_I
>> + (match_operand:VDQ_I 1 "register_operand")
>> + (match_operand:VDQ_I 2 "register_operand"))
>> + (and:VDQ_I
>> + (not:VDQ_I
>> + (match_dup BSL_DUP))
>> + (not:VDQ_I
>> + (match_operand:VDQ_I 3 "register_operand")))))]
>> + "TARGET_SVE2"
>> + {@ [ cons: =0 , 1 , 2 , 3 ; attrs: movprfx ]
>> + [ w , <bsl_1st> , <bsl_2nd> , w ; * ]
>> bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
>> + [ ?&w , w , w , w ; yes ]
>> movprfx\t%Z0, %Z<bsl_mov>\;bsl2n\t%Z0.d, %Z0.d, %Z3.d, %Z<bsl_dup>.d
>> + }
>> +)
>> +
>> ;; -------------------------------------------------------------------------
>> ;; ---- [INT] Shift-and-accumulate operations
>> ;; -------------------------------------------------------------------------
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
>> b/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
>> new file mode 100644
>> index 00000000000..635bfefc17c
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/bitsel.c
>> @@ -0,0 +1,35 @@
>> +/* { dg-options "-O2 -mcpu=neoverse-v2 --param
>> aarch64-autovec-preference=asimd-only" } */
>> +
>> +#include <stdint.h>
>> +
>> +#define OPNBSL(x,y,z) (~(((x) & (z)) | ((y) & ~(z))))
>> +#define OPBSL1N(x,y,z) ((~(x) & (z)) | ((y) & ~(z)))
>> +#define OPBSL2N(x,y,z) (((x) & (z)) | (~(y) & ~(z)))
>> +
>> +#define N 1024
>> +
>> +#define TYPE(N) int##N##_t
>> +
>> +#define TEST(SIZE, OP, SUFFIX) \
>> +void __attribute__ ((noinline, noclone)) \
>> +f_##SIZE##_##SUFFIX \
>> + (TYPE(SIZE) *restrict a, TYPE(SIZE) *restrict b, \
>> + TYPE(SIZE) *restrict c, TYPE(SIZE) *restrict d) \
>> +{ \
>> + for (int i = 0; i < N; i++) \
>> + a[i] = OP (b[i], c[i], d[i]); \
>> +}
>> +
>> +#define TEST_ALL(SIZE) \
>> + TEST(SIZE, OPNBSL, nbsl) \
>> + TEST(SIZE, OPBSL1N, bsl1n) \
>> + TEST(SIZE, OPBSL2N, bsl2n)
>> +
>> +TEST_ALL(8);
>> +TEST_ALL(16);
>> +TEST_ALL(32);
>> +TEST_ALL(64);
>> +
>> +/* { dg-final { scan-assembler-times {\tnbsl\tz[0-9]+\.d, z[0-9]+\.d,
>> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
>> +/* { dg-final { scan-assembler-times {\tbsl1n\tz[0-9]+\.d, z[0-9]+\.d,
>> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
>> +/* { dg-final { scan-assembler-times {\tbsl2n\tz[0-9]+\.d, z[0-9]+\.d,
>> z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
>> \ No newline at end of file
v2-0001-aarch64-Extend-SVE2-bit-select-instructions-for-Neon.patch
Description: v2-0001-aarch64-Extend-SVE2-bit-select-instructions-for-Neon.patch
