Hi,

This patch adds combine pass support for the following SVE2 bitwise logic 
instructions:

- EOR3          (3-way vector exclusive OR)
- BSL           (bitwise select)
- NBSL          (inverted bitwise select)
- BSL1N         (bitwise select with first input inverted)
- BSL2N         (bitwise select with second input inverted)

Example template snippet:

void foo (TYPE *a, TYPE *b, TYPE *c, TYPE *d, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = OP (b[i], c[i], d[i]);
}

EOR3:

  // #define OP(x,y,z) ((x) ^ (y) ^ (z))

  before        eor     z1.d, z1.d, z2.d
                eor     z0.d, z0.d, z1.d
  ...
  after         eor3    z0.d, z0.d, z1.d, z2.d

BSL:

  // #define OP(x,y,z) (((x) & (z)) | ((y) & ~(z)))

  before        eor     z0.d, z0.d, z1.d
                and     z0.d, z0.d, z2.d
                eor     z0.d, z0.d, z1.d
  ...
  after         bsl     z0.d, z0.d, z1.d, z2.d

NBSL:

  // #define OP(x,y,z) ~(((x) & (z)) | ((y) & ~(z)))

  before        eor     z0.d, z0.d, z1.d
                and     z0.d, z0.d, z2.d
                eor     z0.d, z0.d, z1.d
                not     z0.s, p1/m, z0.s
  ...
  after         nbsl    z0.d, z0.d, z1.d, z2.d

BSL1N:

  // #define OP(x,y,z) ((~(x) & (z)) | ((y) & ~(z)))

  before        eor     z0.d, z0.d, z1.d
                bic     z0.d, z2.d, z0.d
                eor     z0.d, z0.d, z1.d
  ...
  after         bsl1n   z0.d, z0.d, z1.d, z2.d

BSL2N:

  // #define OP(x,y,z) (((x) & (z)) | (~(y) & ~(z)))

  before        orr     z0.d, z1.d, z0.d
                and     z1.d, z1.d, z2.d
                not     z0.s, p1/m, z0.s
                orr     z0.d, z0.d, z1.d
  ...
  after         bsl2n   z0.d, z0.d, z1.d, z2.d

Additionally, vector NOR and NAND operations are now optimized with NBSL:

  NOR   x, y  ->  NBSL  x, y, x
  NAND  x, y  ->  NBSL  x, y, y

Built and tested on aarch64-none-elf.

Best Regards,
Yuliang Wang


gcc/ChangeLog:

2019-10-16  Yuliang Wang  <yuliang.w...@arm.com>

        * config/aarch64/aarch64-sve2.md (aarch64_sve2_eor3<mode>)
        (aarch64_sve2_nor<mode>, aarch64_sve2_nand<mode>)
        (aarch64_sve2_bsl<mode>, aarch64_sve2_nbsl<mode>)
        (aarch64_sve2_bsl1n<mode>, aarch64_sve2_bsl2n<mode>):
        New combine patterns.
        * config/aarch64/iterators.md (BSL_3RD): New int iterator for the above.
        (bsl_1st, bsl_2nd, bsl_3rd, bsl_mov): Attributes for the above.
        * config/aarch64/aarch64.h (AARCH64_ISA_SVE2_AES, AARCH64_ISA_SVE2_SM4)
        (AARCH64_ISA_SVE2_SHA3, AARCH64_ISA_SVE2_BITPERM): New ISA flag macros.
        (TARGET_SVE2_AES, TARGET_SVE2_SM4, TARGET_SVE2_SHA3)
        (TARGET_SVE2_BITPERM): New CPU targets.

gcc/testsuite/ChangeLog:

2019-10-16  Yuliang Wang  <yuliang.w...@arm.com>

        * gcc.target/aarch64/sve2/eor3_1.c: New test.
        * gcc.target/aarch64/sve2/eor3_2.c: Likewise.
        * gcc.target/aarch64/sve2/nlogic_1.c: Likewise.
        * gcc.target/aarch64/sve2/nlogic_2.c: Likewise.
        * gcc.target/aarch64/sve2/bitsel_1.c: Likewise.
        * gcc.target/aarch64/sve2/bitsel_2.c: Likewise.
        * gcc.target/aarch64/sve2/bitsel_3.c: Likewise.
        * gcc.target/aarch64/sve2/bitsel_4.c: Likewise.

Attachment: rb11975.patch