Hi all,

The BCAX instruction from TARGET_SHA3 only operates on the full .16b form
of its inputs, but because it is a pure bitwise operation we can also use it
for the 64-bit vector modes, where we don't care about the upper 64 bits. This
patch extends the relevant pattern in aarch64-simd.md to accept the 64-bit
vector modes.
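To illustrate why the upper 64 bits don't matter, here is a portable C model (not AArch64 or GCC code) of a 128-bit register as two 64-bit halves. It assumes BCAX computes n ^ (m & ~a) on every bit, which is what the BIC + EOR sequence quoted below computes. The names vreg128, bcax_16b, bic_eor_8b and low_half_matches are hypothetical, chosen only for this sketch:

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit SIMD register as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } vreg128;

/* Assumed BCAX semantics on the full .16b form: d = n ^ (m & ~a).
   Each output bit depends only on the same bit of each input.  */
static vreg128 bcax_16b (vreg128 n, vreg128 m, vreg128 a)
{
  vreg128 d = { n.lo ^ (m.lo & ~a.lo), n.hi ^ (m.hi & ~a.hi) };
  return d;
}

/* 64-bit reference: the current BIC + EOR sequence on .8b operands.  */
static uint64_t bic_eor_8b (uint64_t n, uint64_t m, uint64_t a)
{
  uint64_t t = m & ~a;  /* bic t, m, a  */
  return t ^ n;         /* eor d, t, n  */
}

/* Returns 1 if the low half of the 128-bit BCAX equals the 64-bit
   result, whatever the upper halves of the inputs contain.  */
static int low_half_matches (uint64_t n, uint64_t m, uint64_t a,
                             uint64_t junk)
{
  vreg128 vn = { n, junk };
  vreg128 vm = { m, ~junk };
  vreg128 va = { a, junk * 3 };
  return bcax_16b (vn, vm, va).lo == bic_eor_8b (n, m, a);
}
```

Since no bit of the low half ever reads from the high half, whatever the upper lanes of the .16b operands hold cannot change the 64-bit result.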

Thus, for the input below, where BCAX (a, b, c) computes a ^ (b & ~c):
uint32x2_t
bcax_s (uint32x2_t a, uint32x2_t b, uint32x2_t c)
{
  return BCAX (a, b, c);
}

we can now generate:
bcax_s:
        bcax    v0.16b, v0.16b, v1.16b, v2.16b
        ret

instead of the current:
bcax_s:
        bic     v1.8b, v1.8b, v2.8b
        eor     v0.8b, v1.8b, v0.8b
        ret

This patch doesn't cover the DI/V1DI modes, as that would require extending
the bcaxqdi4 pattern with =r,r alternatives and adding splitting logic to
handle cases where the operands arrive in GP registers. That is doable, but
can be a separate patch. As it stands, this patch should always be a
straightforward improvement.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?
Thanks,
Kyrill

Signed-off-by: Kyrylo Tkachov <ktkac...@nvidia.com>

gcc/

        * config/aarch64/aarch64-simd.md (bcaxq<mode>4): Use VDQ_I mode
        iterator.

gcc/testsuite/

        * gcc.target/aarch64/simd/bcax_d.c: New test.

Attachment: 0001-aarch64-Allow-64-bit-vector-modes-in-pattern-for-BCA.patch