Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> At some point in time we started lowering the ld1r instructions in gimple.
>
> That is:
>
> uint8x8_t f1(const uint8_t *in) {
>     return vld1_dup_u8(&in[1]);
> }
>
> generates at gimple:
>
>   _3 = MEM[(const uint8_t *)in_1(D) + 1B];
>   _4 = {_3, _3, _3, _3, _3, _3, _3, _3};
>
> Which is good, but we then generate:
>
> f1:
>       ldr     b0, [x0, 1]
>       dup     v0.8b, v0.b[0]
>       ret
>
> instead of ld1r.
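>
> The sequence we would like to see is something like the following
> (illustrative only; ld1r itself only accepts a plain base register,
> so the offset has to be folded into the address first):
>
> f1:
>       add     x0, x0, 1
>       ld1r    {v0.8b}, [x0]
>       ret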
>
> The reason is that the load instructions have too restrictive a
> predicate, which prevents combine from combining the instructions:
> the predicate accepts only simple addressing modes.
>
> This patch relaxes the predicate to accept any memory operand and relies on
> LRA to legitimize the address when needed, since the constraint still only
> allows the simple addressing modes.  Reload is always able to legitimize
> addresses to satisfy these.
>
> Secondly, since we now generate more ld1r instructions, it became clear
> that the lane instructions suffer from a similar issue.
>
> i.e.
>
> float32x4_t f2(const float32_t *in, float32x4_t a) {
>     float32x4_t dup = vld1q_dup_f32(&in[1]);
>     return vfmaq_laneq_f32 (a, a, dup, 1);
> }
>
> would generate ld1r + vector fmla instead of ldr + lane fmla.
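>
> The desired sequence here is roughly (illustrative; exact register
> allocation may differ):
>
> f2:
>       ldr     s1, [x0, 4]
>       fmla    v0.4s, v0.4s, v1.s[0]
>       ret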
>
> The reason is similar to the ld1r issue: the predicate is too
> restrictive, accepting only register operands but not memory.
>
> This patch relaxes it to accept registers and/or memory while leaving the
> constraint to accept only registers.  LRA will then generate a reload when
> needed, forcing the memory operand into a register using the standard
> patterns.
>
> These two changes allow combine and reload to generate the right sequences.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

This is going against the general direction of travel, which is to make
the instruction's predicates and conditions enforce the constraints as
much as possible (making optimistic assumptions about pseudo registers).

The RA *can* deal with things like:

  (match_operand:M N "general_operand" "r")

but it's best avoided, for a few reasons:

(1) The fix-up will be done in LRA, so IRA will not see the temporary
    registers.  This can make the allocation of those temporaries
    suboptimal but (more importantly) it might require other
    previously-allocated registers to be spilled late due to the
    unexpected increase in register pressure.

(2) It ends up hiding instructions from the pre-RA optimisers.

(3) It can also prevent combine opportunities (as well as create them),
    unless the loose predicates in an insn I are propagated to all
    patterns that might result from combining I with something else.

It sounds like the first problem (not generating ld1r) could be fixed
by (a) combining aarch64_simd_dup<mode> and *aarch64_simd_ld1r<mode>,
so that the register and memory alternatives are in the same pattern
and (b) using the merged instruction(s) to implement the vec_duplicate
optab.  Target-independent code should then make the address satisfy
the predicate, simplifying the address where necessary.
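For concreteness, (a) might look something like the following (a rough
sketch only; the mode iterator, predicate, and constraint choices would
need checking against the existing aarch64_simd_dup<mode> and
*aarch64_simd_ld1r<mode> patterns):

  (define_insn "aarch64_simd_dup<mode>"
    [(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
	  (vec_duplicate:VALL_F16
	    (match_operand:<VEL> 1 "nonimmediate_operand" "w,?r,Utv")))]
    "TARGET_SIMD"
    "@
     dup\\t%0.<Vtype>, %1.<Vetype>[0]
     dup\\t%0.<Vtype>, %<vwcore>1
     ld1r\\t{%0.<Vtype>}, %1"
  )

That way the register and memory forms are alternatives of one insn, and
the RA can pick between dup and ld1r based on where the operand lives.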

I'm not sure whether fixing the ld1r problem that way will avoid the
vfmaq_laneq_f32 problem; let me know if not.

Thanks,
Richard

> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64-simd.md (mul_lane<mode>3, mul_laneq<mode>3,
>       mul_n<mode>3, *aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt<mode>,
>       *aarch64_mla_elt_<vswap_width_name><mode>, aarch64_mla_n<mode>,
>       *aarch64_mls_elt<mode>, *aarch64_mls_elt_<vswap_width_name><mode>,
>       aarch64_mls_n<mode>, *aarch64_fma4_elt<mode>,
>       *aarch64_fma4_elt_<vswap_width_name><mode>,
>       *aarch64_fma4_elt_from_dup<mode>, *aarch64_fma4_elt_to_64v2df,
>       *aarch64_fnma4_elt<mode>, *aarch64_fnma4_elt_<vswap_width_name><mode>,
>       *aarch64_fnma4_elt_from_dup<mode>, *aarch64_fnma4_elt_to_64v2df,
>       *aarch64_mulx_elt_<vswap_width_name><mode>,
>       *aarch64_mulx_elt<mode>, *aarch64_mulx_elt_from_dup<mode>,
>       *aarch64_vgetfmulx<mode>): Relax register_operand to
>       nonimmediate_operand.
>       (aarch64_simd_ld2<vstruct_elt>, aarch64_simd_ld2r<vstruct_elt>,
>       aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
>       vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st2<vstruct_elt>,
>       aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
>       vec_store_lanes<mode><vstruct_elt>, aarch64_simd_ld3<vstruct_elt>,
>       aarch64_simd_ld3r<vstruct_elt>,
>       aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
>       vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st3<vstruct_elt>,
>       aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
>       vec_store_lanes<mode><vstruct_elt>, aarch64_simd_ld4<vstruct_elt>,
>       aarch64_simd_ld4r<vstruct_elt>,
>       aarch64_vec_load_lanes<mode>_lane<vstruct_elt>,
>       vec_load_lanes<mode><vstruct_elt>, aarch64_simd_st4<vstruct_elt>,
>       aarch64_vec_store_lanes<mode>_lane<vstruct_elt>,
>       vec_store_lanes<mode><vstruct_elt>, aarch64_ld1_x3_<vstruct_elt>,
>       aarch64_ld1_x4_<vstruct_elt>, aarch64_st1_x2_<vstruct_elt>,
>       aarch64_st1_x3_<vstruct_elt>, aarch64_st1_x4_<vstruct_elt>,
>       aarch64_be_ld1<mode>, aarch64_be_st1<mode>,
>       aarch64_ld2<vstruct_elt>_dreg, aarch64_ld2<vstruct_elt>_dreg,
>       aarch64_ld3<vstruct_elt>_dreg, aarch64_ld3<vstruct_elt>_dreg,
>       aarch64_ld4<vstruct_elt>_dreg, aarch64_ld4<vstruct_elt>_dreg,
>       aarch64_st2<vstruct_elt>_dreg, aarch64_st2<vstruct_elt>_dreg,
>       aarch64_st3<vstruct_elt>_dreg, aarch64_st3<vstruct_elt>_dreg,
>       aarch64_st4<vstruct_elt>_dreg, aarch64_st4<vstruct_elt>_dreg,
>       *aarch64_simd_ld1r<mode>, aarch64_simd_ld1<vstruct_elt>_x2): Relax
>       aarch64_simd_struct_operand to memory_operand.
>       * config/aarch64/predicates.md (aarch64_simd_struct_operand): Remove.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/vld1r.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index be5c70bbb7520ae93d19c4a432ce34863e5b9a64..24e3274ddda2ea76c83571fada8ff4c953b752a1 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -712,7 +712,7 @@ (define_insn "mul_lane<mode>3"
>         (mult:VMULD
>        (vec_duplicate:VMULD
>          (vec_select:<VEL>
> -          (match_operand:<VCOND> 2 "register_operand" "<h_con>")
> +          (match_operand:<VCOND> 2 "nonimmediate_operand" "<h_con>")
>            (parallel [(match_operand:SI 3 "immediate_operand" "i")])))
>        (match_operand:VMULD 1 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -728,7 +728,7 @@ (define_insn "mul_laneq<mode>3"
>       (mult:VMUL
>         (vec_duplicate:VMUL
>         (vec_select:<VEL>
> -         (match_operand:<VCONQ> 2 "register_operand" "<h_con>")
> +         (match_operand:<VCONQ> 2 "nonimmediate_operand" "<h_con>")
>           (parallel [(match_operand:SI 3 "immediate_operand")])))
>        (match_operand:VMUL 1 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -743,7 +743,7 @@ (define_insn "mul_n<mode>3"
>   [(set (match_operand:VMUL 0 "register_operand" "=w")
>         (mult:VMUL
>        (vec_duplicate:VMUL
> -        (match_operand:<VEL> 2 "register_operand" "<h_con>"))
> +        (match_operand:<VEL> 2 "nonimmediate_operand" "<h_con>"))
>        (match_operand:VMUL 1 "register_operand" "w")))]
>    "TARGET_SIMD"
>    "<f>mul\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
> @@ -789,7 +789,7 @@ (define_insn "*aarch64_mul3_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>       (mult:DF
>         (vec_select:DF
> -      (match_operand:V2DF 1 "register_operand" "w")
> +      (match_operand:V2DF 1 "nonimmediate_operand" "w")
>        (parallel [(match_operand:SI 2 "immediate_operand")]))
>         (match_operand:DF 3 "register_operand" "w")))]
>    "TARGET_SIMD"
> @@ -1406,7 +1406,7 @@ (define_insn "*aarch64_mla_elt<mode>"
>        (mult:VDQHS
>          (vec_duplicate:VDQHS
>             (vec_select:<VEL>
> -             (match_operand:VDQHS 1 "register_operand" "<h_con>")
> +             (match_operand:VDQHS 1 "nonimmediate_operand" "<h_con>")
>                 (parallel [(match_operand:SI 2 "immediate_operand")])))
>          (match_operand:VDQHS 3 "register_operand" "w"))
>        (match_operand:VDQHS 4 "register_operand" "0")))]
> @@ -1424,7 +1424,7 @@ (define_insn "*aarch64_mla_elt_<vswap_width_name><mode>"
>        (mult:VDQHS
>          (vec_duplicate:VDQHS
>             (vec_select:<VEL>
> -             (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +             (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>                 (parallel [(match_operand:SI 2 "immediate_operand")])))
>          (match_operand:VDQHS 3 "register_operand" "w"))
>        (match_operand:VDQHS 4 "register_operand" "0")))]
> @@ -1441,7 +1441,7 @@ (define_insn "aarch64_mla_n<mode>"
>       (plus:VDQHS
>         (mult:VDQHS
>           (vec_duplicate:VDQHS
> -           (match_operand:<VEL> 3 "register_operand" "<h_con>"))
> +           (match_operand:<VEL> 3 "nonimmediate_operand" "<h_con>"))
>           (match_operand:VDQHS 2 "register_operand" "w"))
>         (match_operand:VDQHS 1 "register_operand" "0")))]
>   "TARGET_SIMD"
> @@ -1466,7 +1466,7 @@ (define_insn "*aarch64_mls_elt<mode>"
>        (mult:VDQHS
>          (vec_duplicate:VDQHS
>             (vec_select:<VEL>
> -             (match_operand:VDQHS 1 "register_operand" "<h_con>")
> +             (match_operand:VDQHS 1 "nonimmediate_operand" "<h_con>")
>                 (parallel [(match_operand:SI 2 "immediate_operand")])))
>          (match_operand:VDQHS 3 "register_operand" "w"))))]
>   "TARGET_SIMD"
> @@ -1484,7 +1484,7 @@ (define_insn "*aarch64_mls_elt_<vswap_width_name><mode>"
>        (mult:VDQHS
>          (vec_duplicate:VDQHS
>             (vec_select:<VEL>
> -             (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +             (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>                 (parallel [(match_operand:SI 2 "immediate_operand")])))
>          (match_operand:VDQHS 3 "register_operand" "w"))))]
>   "TARGET_SIMD"
> @@ -1501,7 +1501,7 @@ (define_insn "aarch64_mls_n<mode>"
>         (match_operand:VDQHS 1 "register_operand" "0")
>         (mult:VDQHS
>           (vec_duplicate:VDQHS
> -           (match_operand:<VEL> 3 "register_operand" "<h_con>"))
> +           (match_operand:<VEL> 3 "nonimmediate_operand" "<h_con>"))
>           (match_operand:VDQHS 2 "register_operand" "w"))))]
>    "TARGET_SIMD"
>    "mls\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[0]"
> @@ -2882,7 +2882,7 @@ (define_insn "*aarch64_fma4_elt<mode>"
>      (fma:VDQF
>        (vec_duplicate:VDQF
>       (vec_select:<VEL>
> -       (match_operand:VDQF 1 "register_operand" "<h_con>")
> +       (match_operand:VDQF 1 "nonimmediate_operand" "<h_con>")
>         (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQF 3 "register_operand" "w")
>        (match_operand:VDQF 4 "register_operand" "0")))]
> @@ -2899,7 +2899,7 @@ (define_insn "*aarch64_fma4_elt_<vswap_width_name><mode>"
>      (fma:VDQSF
>        (vec_duplicate:VDQSF
>       (vec_select:<VEL>
> -       (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +       (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>         (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQSF 3 "register_operand" "w")
>        (match_operand:VDQSF 4 "register_operand" "0")))]
> @@ -2915,7 +2915,7 @@ (define_insn "*aarch64_fma4_elt_from_dup<mode>"
>    [(set (match_operand:VMUL 0 "register_operand" "=w")
>      (fma:VMUL
>        (vec_duplicate:VMUL
> -       (match_operand:<VEL> 1 "register_operand" "<h_con>"))
> +       (match_operand:<VEL> 1 "nonimmediate_operand" "<h_con>"))
>        (match_operand:VMUL 2 "register_operand" "w")
>        (match_operand:VMUL 3 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2927,7 +2927,7 @@ (define_insn "*aarch64_fma4_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>      (fma:DF
>       (vec_select:DF
> -       (match_operand:V2DF 1 "register_operand" "w")
> +       (match_operand:V2DF 1 "nonimmediate_operand" "w")
>         (parallel [(match_operand:SI 2 "immediate_operand")]))
>        (match_operand:DF 3 "register_operand" "w")
>        (match_operand:DF 4 "register_operand" "0")))]
> @@ -2957,7 +2957,7 @@ (define_insn "*aarch64_fnma4_elt<mode>"
>          (match_operand:VDQF 3 "register_operand" "w"))
>        (vec_duplicate:VDQF
>       (vec_select:<VEL>
> -       (match_operand:VDQF 1 "register_operand" "<h_con>")
> +       (match_operand:VDQF 1 "nonimmediate_operand" "<h_con>")
>         (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQF 4 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2975,7 +2975,7 @@ (define_insn "*aarch64_fnma4_elt_<vswap_width_name><mode>"
>          (match_operand:VDQSF 3 "register_operand" "w"))
>        (vec_duplicate:VDQSF
>       (vec_select:<VEL>
> -       (match_operand:<VSWAP_WIDTH> 1 "register_operand" "<h_con>")
> +       (match_operand:<VSWAP_WIDTH> 1 "nonimmediate_operand" "<h_con>")
>         (parallel [(match_operand:SI 2 "immediate_operand")])))
>        (match_operand:VDQSF 4 "register_operand" "0")))]
>    "TARGET_SIMD"
> @@ -2992,7 +2992,7 @@ (define_insn "*aarch64_fnma4_elt_from_dup<mode>"
>        (neg:VMUL
>          (match_operand:VMUL 2 "register_operand" "w"))
>        (vec_duplicate:VMUL
> -     (match_operand:<VEL> 1 "register_operand" "<h_con>"))
> +     (match_operand:<VEL> 1 "nonimmediate_operand" "<h_con>"))
>        (match_operand:VMUL 3 "register_operand" "0")))]
>    "TARGET_SIMD"
>    "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
> @@ -3003,7 +3003,7 @@ (define_insn "*aarch64_fnma4_elt_to_64v2df"
>    [(set (match_operand:DF 0 "register_operand" "=w")
>      (fma:DF
>        (vec_select:DF
> -     (match_operand:V2DF 1 "register_operand" "w")
> +     (match_operand:V2DF 1 "nonimmediate_operand" "w")
>       (parallel [(match_operand:SI 2 "immediate_operand")]))
>        (neg:DF
>          (match_operand:DF 3 "register_operand" "w"))
> @@ -4934,7 +4934,7 @@ (define_insn "*aarch64_mulx_elt_<vswap_width_name><mode>"
>        [(match_operand:VDQSF 1 "register_operand" "w")
>         (vec_duplicate:VDQSF
>          (vec_select:<VEL>
> -         (match_operand:<VSWAP_WIDTH> 2 "register_operand" "w")
> +         (match_operand:<VSWAP_WIDTH> 2 "nonimmediate_operand" "w")
>           (parallel [(match_operand:SI 3 "immediate_operand" "i")])))]
>        UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -4953,7 +4953,7 @@ (define_insn "*aarch64_mulx_elt<mode>"
>        [(match_operand:VDQF 1 "register_operand" "w")
>         (vec_duplicate:VDQF
>          (vec_select:<VEL>
> -         (match_operand:VDQF 2 "register_operand" "w")
> +         (match_operand:VDQF 2 "nonimmediate_operand" "w")
>           (parallel [(match_operand:SI 3 "immediate_operand" "i")])))]
>        UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -4971,7 +4971,7 @@ (define_insn "*aarch64_mulx_elt_from_dup<mode>"
>       (unspec:VHSDF
>        [(match_operand:VHSDF 1 "register_operand" "w")
>         (vec_duplicate:VHSDF
> -         (match_operand:<VEL> 2 "register_operand" "<h_con>"))]
> +         (match_operand:<VEL> 2 "nonimmediate_operand" "<h_con>"))]
>        UNSPEC_FMULX))]
>    "TARGET_SIMD"
>    "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
> @@ -4987,7 +4987,7 @@ (define_insn "*aarch64_vgetfmulx<mode>"
>       (unspec:<VEL>
>        [(match_operand:<VEL> 1 "register_operand" "w")
>         (vec_select:<VEL>
> -        (match_operand:VDQF 2 "register_operand" "w")
> +        (match_operand:VDQF 2 "nonimmediate_operand" "w")
>           (parallel [(match_operand:SI 3 "immediate_operand" "i")]))]
>        UNSPEC_FMULX))]
>    "TARGET_SIMD"
> @@ -6768,7 +6768,7 @@ (define_insn "*sqrt<mode>2"
>  (define_insn "aarch64_simd_ld2<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2Q 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2Q [
> -       (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_2Q 1 "memory_operand" "Utv")]
>         UNSPEC_LD2))]
>    "TARGET_SIMD"
>    "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -6778,7 +6778,7 @@ (define_insn "aarch64_simd_ld2<vstruct_elt>"
>  (define_insn "aarch64_simd_ld2r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2QD [
> -       (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD2_DUP))]
>    "TARGET_SIMD"
>    "ld2r\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -6788,7 +6788,7 @@ (define_insn "aarch64_simd_ld2r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2QD [
> -             (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +             (match_operand:BLK 1 "memory_operand" "Utv")
>               (match_operand:VSTRUCT_2QD 2 "register_operand" "0")
>               (match_operand:SI 3 "immediate_operand" "i")]
>               UNSPEC_LD2_LANE))]
> @@ -6804,7 +6804,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_2Q 0 "register_operand")
>       (unspec:VSTRUCT_2Q [
> -             (match_operand:VSTRUCT_2Q 1 "aarch64_simd_struct_operand")]
> +             (match_operand:VSTRUCT_2Q 1 "memory_operand")]
>               UNSPEC_LD2))]
>    "TARGET_SIMD"
>  {
> @@ -6822,7 +6822,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st2<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2Q 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_2Q [
>               (match_operand:VSTRUCT_2Q 1 "register_operand" "w")]
>                  UNSPEC_ST2))]
> @@ -6833,7 +6833,7 @@ (define_insn "aarch64_simd_st2<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>       (unspec:BLK [(match_operand:VSTRUCT_2QD 1 "register_operand" "w")
>                    (match_operand:SI 2 "immediate_operand" "i")]
>                    UNSPEC_ST2_LANE))]
> @@ -6847,7 +6847,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_2Q 0 "memory_operand")
>       (unspec:VSTRUCT_2Q [(match_operand:VSTRUCT_2Q 1 "register_operand")]
>                     UNSPEC_ST2))]
>    "TARGET_SIMD"
> @@ -6868,7 +6868,7 @@ (define_expand "vec_store_lanes<mode><vstruct_elt>"
>  (define_insn "aarch64_simd_ld3<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3Q 0 "register_operand" "=w")
>       (unspec:VSTRUCT_3Q [
> -       (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_3Q 1 "memory_operand" "Utv")]
>         UNSPEC_LD3))]
>    "TARGET_SIMD"
>    "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -6878,7 +6878,7 @@ (define_insn "aarch64_simd_ld3<vstruct_elt>"
>  (define_insn "aarch64_simd_ld3r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_3QD [
> -       (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD3_DUP))]
>    "TARGET_SIMD"
>    "ld3r\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -6888,7 +6888,7 @@ (define_insn "aarch64_simd_ld3r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_3QD [
> -             (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +             (match_operand:BLK 1 "memory_operand" "Utv")
>               (match_operand:VSTRUCT_3QD 2 "register_operand" "0")
>               (match_operand:SI 3 "immediate_operand" "i")]
>               UNSPEC_LD3_LANE))]
> @@ -6904,7 +6904,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3Q 0 "register_operand")
>       (unspec:VSTRUCT_3Q [
> -             (match_operand:VSTRUCT_3Q 1 "aarch64_simd_struct_operand")]
> +             (match_operand:VSTRUCT_3Q 1 "memory_operand")]
>               UNSPEC_LD3))]
>    "TARGET_SIMD"
>  {
> @@ -6922,7 +6922,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st3<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3Q 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_3Q [(match_operand:VSTRUCT_3Q 1 "register_operand" "w")]
>                     UNSPEC_ST3))]
>    "TARGET_SIMD"
> @@ -6932,7 +6932,7 @@ (define_insn "aarch64_simd_st3<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>       (unspec:BLK [(match_operand:VSTRUCT_3QD 1 "register_operand" "w")
>                    (match_operand:SI 2 "immediate_operand" "i")]
>                    UNSPEC_ST3_LANE))]
> @@ -6946,7 +6946,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_3Q 0 "memory_operand")
>       (unspec:VSTRUCT_3Q [
>               (match_operand:VSTRUCT_3Q 1 "register_operand")]
>                  UNSPEC_ST3))]
> @@ -6968,7 +6968,7 @@ (define_expand "vec_store_lanes<mode><vstruct_elt>"
>  (define_insn "aarch64_simd_ld4<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4Q 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4Q [
> -       (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_4Q 1 "memory_operand" "Utv")]
>         UNSPEC_LD4))]
>    "TARGET_SIMD"
>    "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -6978,7 +6978,7 @@ (define_insn "aarch64_simd_ld4<vstruct_elt>"
>  (define_insn "aarch64_simd_ld4r<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4QD [
> -       (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:BLK 1 "memory_operand" "Utv")]
>            UNSPEC_LD4_DUP))]
>    "TARGET_SIMD"
>    "ld4r\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -6988,7 +6988,7 @@ (define_insn "aarch64_simd_ld4r<vstruct_elt>"
>  (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4QD [
> -             (match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> +             (match_operand:BLK 1 "memory_operand" "Utv")
>               (match_operand:VSTRUCT_4QD 2 "register_operand" "0")
>               (match_operand:SI 3 "immediate_operand" "i")]
>               UNSPEC_LD4_LANE))]
> @@ -7004,7 +7004,7 @@ (define_insn "aarch64_vec_load_lanes<mode>_lane<vstruct_elt>"
>  (define_expand "vec_load_lanes<mode><vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4Q 0 "register_operand")
>       (unspec:VSTRUCT_4Q [
> -             (match_operand:VSTRUCT_4Q 1 "aarch64_simd_struct_operand")]
> +             (match_operand:VSTRUCT_4Q 1 "memory_operand")]
>               UNSPEC_LD4))]
>    "TARGET_SIMD"
>  {
> @@ -7022,7 +7022,7 @@ (define_expand "vec_load_lanes<mode><vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_simd_st4<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4Q 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_4Q [
>               (match_operand:VSTRUCT_4Q 1 "register_operand" "w")]
>                  UNSPEC_ST4))]
> @@ -7033,7 +7033,7 @@ (define_insn "aarch64_simd_st4<vstruct_elt>"
>  
>  ;; RTL uses GCC vector extension indices, so flip only for assembly.
>  (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
> -  [(set (match_operand:BLK 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:BLK 0 "memory_operand" "=Utv")
>       (unspec:BLK [(match_operand:VSTRUCT_4QD 1 "register_operand" "w")
>                    (match_operand:SI 2 "immediate_operand" "i")]
>                    UNSPEC_ST4_LANE))]
> @@ -7047,7 +7047,7 @@ (define_insn "aarch64_vec_store_lanes<mode>_lane<vstruct_elt>"
>  )
>  
>  (define_expand "vec_store_lanes<mode><vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4Q 0 "aarch64_simd_struct_operand")
> +  [(set (match_operand:VSTRUCT_4Q 0 "memory_operand")
>       (unspec:VSTRUCT_4Q [(match_operand:VSTRUCT_4Q 1 "register_operand")]
>                     UNSPEC_ST4))]
>    "TARGET_SIMD"
> @@ -7138,7 +7138,7 @@ (define_expand "aarch64_ld1x3<vstruct_elt>"
>  (define_insn "aarch64_ld1_x3_<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_3QD 0 "register_operand" "=w")
>          (unspec:VSTRUCT_3QD
> -       [(match_operand:VSTRUCT_3QD 1 "aarch64_simd_struct_operand" "Utv")]
> +       [(match_operand:VSTRUCT_3QD 1 "memory_operand" "Utv")]
>         UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -7158,7 +7158,7 @@ (define_expand "aarch64_ld1x4<vstruct_elt>"
>  (define_insn "aarch64_ld1_x4_<vstruct_elt>"
>    [(set (match_operand:VSTRUCT_4QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4QD
> -       [(match_operand:VSTRUCT_4QD 1 "aarch64_simd_struct_operand" "Utv")]
> +       [(match_operand:VSTRUCT_4QD 1 "memory_operand" "Utv")]
>       UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -7176,7 +7176,7 @@ (define_expand "aarch64_st1x2<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x2_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_2QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2QD 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_2QD
>               [(match_operand:VSTRUCT_2QD 1 "register_operand" "w")]
>               UNSPEC_ST1))]
> @@ -7196,7 +7196,7 @@ (define_expand "aarch64_st1x3<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x3_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_3QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3QD 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_3QD
>               [(match_operand:VSTRUCT_3QD 1 "register_operand" "w")]
>               UNSPEC_ST1))]
> @@ -7216,7 +7216,7 @@ (define_expand "aarch64_st1x4<vstruct_elt>"
>  })
>  
>  (define_insn "aarch64_st1_x4_<vstruct_elt>"
> -  [(set (match_operand:VSTRUCT_4QD 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4QD 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_4QD
>               [(match_operand:VSTRUCT_4QD 1 "register_operand" "w")]
>               UNSPEC_ST1))]
> @@ -7268,7 +7268,7 @@ (define_insn "*aarch64_movv8di"
>  (define_insn "aarch64_be_ld1<mode>"
>    [(set (match_operand:VALLDI_F16 0  "register_operand" "=w")
>       (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1
> -                          "aarch64_simd_struct_operand" "Utv")]
> +                          "memory_operand" "Utv")]
>       UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%0<Vmtype>}, %1"
> @@ -7276,7 +7276,7 @@ (define_insn "aarch64_be_ld1<mode>"
>  )
>  
>  (define_insn "aarch64_be_st1<mode>"
> -  [(set (match_operand:VALLDI_F16 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VALLDI_F16 0 "memory_operand" "=Utv")
>       (unspec:VALLDI_F16 [(match_operand:VALLDI_F16 1 "register_operand" "w")]
>       UNSPEC_ST1))]
>    "TARGET_SIMD"
> @@ -7551,7 +7551,7 @@ (define_expand "aarch64_ld<nregs>r<vstruct_elt>"
>  (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_2DNX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2DNX [
> -       (match_operand:VSTRUCT_2DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_2DNX 1 "memory_operand" "Utv")]
>         UNSPEC_LD2_DREG))]
>    "TARGET_SIMD"
>    "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> @@ -7561,7 +7561,7 @@ (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_2DX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2DX [
> -       (match_operand:VSTRUCT_2DX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_2DX 1 "memory_operand" "Utv")]
>         UNSPEC_LD2_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %T0.1d}, %1"
> @@ -7571,7 +7571,7 @@ (define_insn "aarch64_ld2<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_3DNX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_3DNX [
> -       (match_operand:VSTRUCT_3DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_3DNX 1 "memory_operand" "Utv")]
>         UNSPEC_LD3_DREG))]
>    "TARGET_SIMD"
>    "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> @@ -7581,7 +7581,7 @@ (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_3DX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_3DX [
> -       (match_operand:VSTRUCT_3DX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_3DX 1 "memory_operand" "Utv")]
>         UNSPEC_LD3_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %U0.1d}, %1"
> @@ -7591,7 +7591,7 @@ (define_insn "aarch64_ld3<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_4DNX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4DNX [
> -       (match_operand:VSTRUCT_4DNX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_4DNX 1 "memory_operand" "Utv")]
>         UNSPEC_LD4_DREG))]
>    "TARGET_SIMD"
>    "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> @@ -7601,7 +7601,7 @@ (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>  (define_insn "aarch64_ld4<vstruct_elt>_dreg"
>    [(set (match_operand:VSTRUCT_4DX 0 "register_operand" "=w")
>       (unspec:VSTRUCT_4DX [
> -       (match_operand:VSTRUCT_4DX 1 "aarch64_simd_struct_operand" "Utv")]
> +       (match_operand:VSTRUCT_4DX 1 "memory_operand" "Utv")]
>         UNSPEC_LD4_DREG))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.1d - %V0.1d}, %1"
> @@ -7841,7 +7841,7 @@ (define_insn "aarch64_rev<REVERSE:rev_op><mode>"
>  )
>  
>  (define_insn "aarch64_st2<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_2DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2DNX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_2DNX [
>               (match_operand:VSTRUCT_2DNX 1 "register_operand" "w")]
>               UNSPEC_ST2))]
> @@ -7851,7 +7851,7 @@ (define_insn "aarch64_st2<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st2<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_2DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_2DX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_2DX [
>               (match_operand:VSTRUCT_2DX 1 "register_operand" "w")]
>               UNSPEC_ST2))]
> @@ -7861,7 +7861,7 @@ (define_insn "aarch64_st2<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st3<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_3DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3DNX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_3DNX [
>               (match_operand:VSTRUCT_3DNX 1 "register_operand" "w")]
>               UNSPEC_ST3))]
> @@ -7871,7 +7871,7 @@ (define_insn "aarch64_st3<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st3<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_3DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_3DX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_3DX [
>               (match_operand:VSTRUCT_3DX 1 "register_operand" "w")]
>               UNSPEC_ST3))]
> @@ -7881,7 +7881,7 @@ (define_insn "aarch64_st3<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st4<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_4DNX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4DNX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_4DNX [
>               (match_operand:VSTRUCT_4DNX 1 "register_operand" "w")]
>               UNSPEC_ST4))]
> @@ -7891,7 +7891,7 @@ (define_insn "aarch64_st4<vstruct_elt>_dreg"
>  )
>  
>  (define_insn "aarch64_st4<vstruct_elt>_dreg"
> -  [(set (match_operand:VSTRUCT_4DX 0 "aarch64_simd_struct_operand" "=Utv")
> +  [(set (match_operand:VSTRUCT_4DX 0 "memory_operand" "=Utv")
>       (unspec:VSTRUCT_4DX [
>               (match_operand:VSTRUCT_4DX 1 "register_operand" "w")]
>               UNSPEC_ST4))]
> @@ -7974,7 +7974,7 @@ (define_expand "vec_init<mode><Vhalf>"
>  (define_insn "*aarch64_simd_ld1r<mode>"
>    [(set (match_operand:VALL_F16 0 "register_operand" "=w")
>       (vec_duplicate:VALL_F16
> -       (match_operand:<VEL> 1 "aarch64_simd_struct_operand" "Utv")))]
> +       (match_operand:<VEL> 1 "memory_operand" "Utv")))]
>    "TARGET_SIMD"
>    "ld1r\\t{%0.<Vtype>}, %1"
>    [(set_attr "type" "neon_load1_all_lanes")]
> @@ -7983,7 +7983,7 @@ (define_insn "*aarch64_simd_ld1r<mode>"
>  (define_insn "aarch64_simd_ld1<vstruct_elt>_x2"
>    [(set (match_operand:VSTRUCT_2QD 0 "register_operand" "=w")
>       (unspec:VSTRUCT_2QD [
> -         (match_operand:VSTRUCT_2QD 1 "aarch64_simd_struct_operand" "Utv")]
> +         (match_operand:VSTRUCT_2QD 1 "memory_operand" "Utv")]
>           UNSPEC_LD1))]
>    "TARGET_SIMD"
>    "ld1\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
> index c308015ac2c13d24cd6bcec71247ec45df8cf5e6..6b70a364530c8108457091bfec12fe549f722149 100644
> --- a/gcc/config/aarch64/predicates.md
> +++ b/gcc/config/aarch64/predicates.md
> @@ -494,10 +494,6 @@ (define_predicate "aarch64_simd_reg_or_minus_one"
>    (ior (match_operand 0 "register_operand")
>         (match_operand 0 "aarch64_simd_imm_minus_one")))
>  
> -(define_predicate "aarch64_simd_struct_operand"
> -  (and (match_code "mem")
> -       (match_test "TARGET_SIMD && aarch64_simd_mem_operand_p (op)")))
> -
>  ;; Like general_operand but allow only valid SIMD addressing modes.
>  (define_predicate "aarch64_simd_general_operand"
>    (and (match_operand 0 "general_operand")
> diff --git a/gcc/testsuite/gcc.target/aarch64/vld1r.c b/gcc/testsuite/gcc.target/aarch64/vld1r.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..72c505c403e9e239771379b7cadd8a9473f06113
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vld1r.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <arm_neon.h>
> +
> +/*
> +** f1:
> +**   add     x0, x0, 1
> +**   ld1r    {v0.8b}, \[x0\]
> +**   ret
> +*/
> +uint8x8_t f1(const uint8_t *in) {
> +    return vld1_dup_u8(&in[1]);
> +}
> +
> +/*
> +** f2:
> +**   ldr     s1, \[x0, 4\]
> +**   fmla    v0.4s, v0.4s, v1.s\[0\]
> +**   ret
> +*/
> +float32x4_t f2(const float32_t *in, float32x4_t a) {
> +    float32x4_t dup = vld1q_dup_f32(&in[1]);
> +    return vfmaq_laneq_f32 (a, a, dup, 1);
> +}
