On Thu, Feb 1, 2024 at 1:26 AM Tamar Christina <[email protected]> wrote:
>
> Hi All,
>
> In the vget_set_lane_1.c test the following entries now generate a zip1
> instead of an INS
>
> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
>
> This is because the non-Q variants for indices 0 and 1 are just shuffling
> values.  There is no performance difference between a SIMD-to-SIMD INS and
> a ZIP, so just update the test file.
Hmm, is this true on all cores? I suspect there is a core out there
where INS is implemented with a much lower latency than ZIP.
If we look at config/aarch64/thunderx.md, we can see INS is 2 cycles
while ZIP is 6 cycles (3/7 for the Q versions).
Now I don't have any vested interest in that core any more, but I
just wanted to point out that it is not exactly true for all cores.
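
For reference, here is a minimal sketch of the kind of function involved
(the intrinsic pairing and the function name below are my guess at what a
BUILD_TEST entry amounts to, not copied from the test file); on a two-lane
vector, inserting lane 0 of b into lane 1 of a is the same shuffle ZIP1
performs:

#include <arm_neon.h>

/* Hypothetical expansion of one BUILD_TEST entry: copy lane 0 of b into
   lane 1 of a.  For a 2-lane vector the result { a[0], b[0] } can be
   emitted either as "ins v0.s[1], v1.s[0]" or as
   "zip1 v0.2s, v0.2s, v1.2s".  */
float32x2_t
copy_lane_f32 (float32x2_t a, float32x2_t b)
{
  return vset_lane_f32 (vget_lane_f32 (b, 0), a, 1);
}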
> Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
This is PR 112375 by the way.
Thanks,
Andrew Pinski
>
> Thanks,
> Tamar
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vget_set_lane_1.c: Update test output.
>
> --- inline copy of patch --
> diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> index 07a77de319206c5c6dad1c0d2d9bcc998583f9c1..a3978f68e4ff5899f395a98615a5e86c3b1389cb 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
> BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
> BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0)
> BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
>
> BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15)
> BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15)
>