Victor Do Nascimento via Gcc-patches <[email protected]> writes:
> The backend pattern for storing a pair of identical values in 32 and 64-bit
> modes with the machine instruction STP was missing, and multiple instructions
> were needed to reproduce this behavior as a result of a failed RTL pattern
> match in the combine pass.
>
> For the test case:
>
> typedef long long v2di __attribute__((vector_size (16)));
> typedef int v2si __attribute__((vector_size (8)));
>
> void
> foo (v2di *x, long long a)
> {
>   v2di tmp = {a, a};
>   *x = tmp;
> }
>
> void
> foo2 (v2si *x, int a)
> {
>   v2si tmp = {a, a};
>   *x = tmp;
> }
>
> at -O2 on aarch64 gives:
>
> foo:
>         stp     x1, x1, [x0]
>         ret
> foo2:
>         stp     w1, w1, [x0]
>         ret
>
> instead of:
>
> foo:
>         dup     v0.2d, x1
>         str     q0, [x0]
>         ret
> foo2:
>         dup     v0.2s, w1
>         str     d0, [x0]
>         ret
>
> In preparation for the next stage 1 phase of development, I added a new RTL
> template and a unit test, and checked for regressions on bootstrapped
> aarch64-none-linux-gnu.
>
> gcc/ChangeLog
>
> 2021-02-04 Victor Do Nascimento <[email protected]>
>
> * config/aarch64/aarch64-simd.md: Implement RTX pattern for
> mapping 'vec_duplicate' RTX onto 'STP' ASM insn.
> * config/aarch64/iterators.md: Implement ldpstp_vel_sz iterator
> to map STP/LDP vector element mode to correct suffix in
> attribute type definition of aarch64_simd_stp<mode> pattern.
A more typical changelog entry would be:
* config/aarch64/iterators.md (ldpstp_vel_sz): New mode attribute.
* config/aarch64/aarch64-simd.md (aarch64_simd_stp<mode>): New pattern.
> gcc/testsuite/ChangeLog
>
> 2021-02-04 Victor Do Nascimento <[email protected]>
>
> * gcc.target/aarch64/stp_vec_dup_32_64-1.c: Added test.
>
> Regards,
> Victor
>
> ---
> gcc/config/aarch64/aarch64-simd.md | 10 +++++++++
> gcc/config/aarch64/iterators.md | 3 +++
> .../gcc.target/aarch64/stp_vec_dup_32_64-1.c | 22 +++++++++++++++++++
> 3 files changed, 35 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 71aa77dd010..3d53bab0018 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -205,6 +205,16 @@
> [(set_attr "type" "neon_stp")]
> )
>
> +(define_insn "aarch64_simd_stp<mode>"
> + [(set (match_operand:VP_2E 0 "aarch64_mem_pair_operand" "=Ump,Ump")
> + (vec_duplicate:VP_2E (match_operand:<VEL> 1 "register_operand" "w,r")))]
Formatting nit: should just be one tab here.
I would have just made that change locally and committed, but I think
there's a problem: aarch64_mem_pair_operand and Ump are geared for pairs
of full-vector stores, rather than for pairs of elements. This means that
(for example) the V2SI offset range will be [-64,63] * 8 rather than the
expected [-64,63] * 4.
I think we need to use aarch64_mem_pair_lanes_operand and Umn instead,
as for store_pair_lanes<mode>. In addition:
  /* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode
     corresponds to the actual size of the memory being loaded/stored and the
     mode of the corresponding addressing mode is half of that.  */
  if (type == ADDR_QUERY_LDP_STP_N
      && known_eq (GET_MODE_SIZE (mode), 16))
    mode = DFmode;
only handles 128-bit vectors, whereas here we need it to handle 64-bit
vectors too.
It would be good to test the limits, e.g.:
> diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
> new file mode 100644
> index 00000000000..a37c903dfd4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_dup_32_64-1.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef long long v2di __attribute__((vector_size (16)));
> +typedef int v2si __attribute__((vector_size (8)));
> +
> +void
> +foo (v2di *x, long long a)
> +{
> +  v2di tmp = {a, a};
> +  *x = tmp;
> +}
> +
> +void
> +foo2 (v2si *x, int a)
> +{
> +  v2si tmp = {a, a};
> +  *x = tmp;
> +}
We could have additional tests for:
x[-33] = tmp; // out of range
x[-32] = tmp; // in range
x[31] = tmp; // in range
x[32] = tmp; // out of range
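
As a rough sketch of what such a limits test could look like, assuming STP
takes a signed 7-bit immediate scaled by the register size (so indices -32
to 31 are the directly encodable ones for both vector types): the STORE_DUP
macro and the function names below are hypothetical, and I haven't included
dg-final lines since the scan patterns would need to be written against the
actual output.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */

typedef long long v2di __attribute__((vector_size (16)));
typedef int v2si __attribute__((vector_size (8)));

/* Define NAME to store a vector of two copies of A to X[INDEX].  */
#define STORE_DUP(NAME, VEC, ELT, INDEX) \
  void                                   \
  NAME (VEC *x, ELT a)                   \
  {                                      \
    VEC tmp = {a, a};                    \
    x[INDEX] = tmp;                      \
  }

/* Byte offsets -528, -512, 496 and 512 (STP of X registers).  */
STORE_DUP (di_below, v2di, long long, -33)
STORE_DUP (di_min, v2di, long long, -32)
STORE_DUP (di_max, v2di, long long, 31)
STORE_DUP (di_above, v2di, long long, 32)

/* Byte offsets -264, -256, 248 and 256 (STP of W registers).  */
STORE_DUP (si_below, v2si, int, -33)
STORE_DUP (si_min, v2si, int, -32)
STORE_DUP (si_max, v2si, int, 31)
STORE_DUP (si_above, v2si, int, 32)
```

The "below" and "above" variants should need a separate address calculation
before the store, while the "min" and "max" variants should be able to use
the immediate offset directly.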
Thanks,
Richard
> +
> +/* { dg-final { scan-assembler-times "stp\t" 2 } } */
> +/* { dg-final { scan-assembler-not "dup\t" } } */