Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> writes:
> Hi,
> After 27de9aa152141e7f3ee66372647d0f2cd94c4b90, there's the following
> regression:
> FAIL: gcc.target/aarch64/vect_copy_lane_1.c scan-assembler-times ins\\tv0.s\\[1\\], v1.s\\[0\\] 3
>
> This happens because for the following function from vect_copy_lane_1.c:
> float32x2_t
> __attribute__((noinline, noclone)) test_copy_lane_f32 (float32x2_t a,
> float32x2_t b)
> {
>   return vcopy_lane_f32 (a, 1, b, 0);
> }
>
> Before 27de9aa152141e7f3ee66372647d0f2cd94c4b90, it was lowered to the
> following sequence in the .optimized dump:
>   <bb 2> [local count: 1073741824]:
>   _4 = BIT_FIELD_REF <b_3(D), 32, 0>;
>   __a_5 = BIT_INSERT_EXPR <a_2(D), _4, 32>;
>   return __a_5;
>
> The above commit simplifies BIT_FIELD_REF + BIT_INSERT_EXPR
> to a vector permutation, so the function is now lowered to:
>
>   <bb 2> [local count: 1073741824]:
>   __a_4 = VEC_PERM_EXPR <a_2(D), b_3(D), { 0, 2 }>;
>   return __a_4;
>
> Since we give higher priority to aarch64_evpc_zip over aarch64_evpc_ins
> in aarch64_expand_vec_perm_const_1, it now generates:
>
> test_copy_lane_f32:
>         zip1    v0.2s, v0.2s, v1.2s
>         ret
>
> Similarly for test_copy_lane_[us]32.

Yeah, I suppose this choice is at least as good as INS.  It has the advantage
that the source and destination don't need to be tied.  For example:

int32x2_t f(int32x2_t a, int32x2_t b, int32x2_t c) {
    return vcopy_lane_s32 (b, 1, c, 0);
}

used to be:

f:
        mov     v0.8b, v1.8b
        ins     v0.s[1], v2.s[0]
        ret

but is now:

f:
        zip1    v0.2s, v1.2s, v2.2s
        ret

> The attached patch adjusts the test to reflect the change in code-gen,
> and it now passes.
> OK to commit?
>
> Thanks,
> Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> index 2848be564d5..811dc678b92 100644
> --- a/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_copy_lane_1.c
> @@ -22,7 +22,7 @@ BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2)
>  BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0)
>  BUILD_TEST (int32x2_t,   int32x2_t,   , , s32, 1, 0)
>  BUILD_TEST (uint32x2_t,  uint32x2_t,  , , u32, 1, 0)
> -/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */
> +/* { dg-final { scan-assembler-times "zip1\\tv0.2s, v0.2s, v1.2s" 3 } } */
>  BUILD_TEST (int64x1_t,   int64x1_t,   , , s64, 0, 0)
>  BUILD_TEST (uint64x1_t,  uint64x1_t,  , , u64, 0, 0)
>  BUILD_TEST (float64x1_t, float64x1_t, , , f64, 0, 0)

OK, thanks.

Richard
