[PATCH v3 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions
From: Dhruv Chawla

This patch modifies the shift expander to lower constant shifts
immediately, without an unspec.  It also modifies the ADR, SRA and
ADDHNB patterns to match the lowered forms of the shifts, as the
predicate register is not required for these instructions.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@aarch64_adr_shift): Match
	lowered form of ashift.
	(*aarch64_adr_shift): Likewise.
	(*aarch64_adr_shift_sxtw): Likewise.
	(*aarch64_adr_shift_uxtw): Likewise.
	(3): Avoid moving legal immediate shift amounts into a new
	register.
	(v3): Generate unpredicated shifts for constant operands.
	(*post_ra_v_ashl3): Rename to ...
	(aarch64_vashl3_const): ... this and remove reload requirement.
	(*post_ra_v_3): Rename to ...
	(aarch64_v3_const): ... this and remove reload requirement.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_add_): Match
	lowered form of SHIFTRT.
	(*aarch64_sve2_sra): Likewise.
	(*bitmask_shift_plus): Match lowered form of lshiftrt.
---
 gcc/config/aarch64/aarch64-sve.md  | 90 +-
 gcc/config/aarch64/aarch64-sve2.md | 46 +--
 2 files changed, 53 insertions(+), 83 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bf7569f932b..cb88d6d95a6 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4234,80 +4234,57 @@
 (define_expand "@aarch64_adr_shift"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
 	(plus:SVE_FULL_SDI
-	  (unspec:SVE_FULL_SDI
-	    [(match_dup 4)
-	     (ashift:SVE_FULL_SDI
-	       (match_operand:SVE_FULL_SDI 2 "register_operand")
-	       (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
-	    UNSPEC_PRED_X)
+	  (ashift:SVE_FULL_SDI
+	    (match_operand:SVE_FULL_SDI 2 "register_operand")
+	    (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
 	  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-    operands[4] = CONSTM1_RTX (mode);
-  }
 )

-(define_insn_and_rewrite "*aarch64_adr_shift"
+(define_insn "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
 	(plus:SVE_24I
-	  (unspec:SVE_24I
-	    [(match_operand 4)
-	     (ashift:SVE_24I
-	       (match_operand:SVE_24I 2 "register_operand" "w")
-	       (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
-	    UNSPEC_PRED_X)
+	  (ashift:SVE_24I
+	    (match_operand:SVE_24I 2 "register_operand" "w")
+	    (match_operand:SVE_24I 3 "const_1_to_3_operand"))
 	  (match_operand:SVE_24I 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0., [%1., %2., lsl %3]"
-  "&& !CONSTANT_P (operands[4])"
-  {
-    operands[4] = CONSTM1_RTX (mode);
-  }
 )

 ;; Same, but with the index being sign-extended from the low 32 bits.
(define_insn_and_rewrite "*aarch64_adr_shift_sxtw" [(set (match_operand:VNx2DI 0 "register_operand" "=w") (plus:VNx2DI - (unspec:VNx2DI - [(match_operand 4) -(ashift:VNx2DI - (unspec:VNx2DI -[(match_operand 5) - (sign_extend:VNx2DI - (truncate:VNx2SI - (match_operand:VNx2DI 2 "register_operand" "w")))] -UNSPEC_PRED_X) - (match_operand:VNx2DI 3 "const_1_to_3_operand"))] - UNSPEC_PRED_X) + (ashift:VNx2DI + (unspec:VNx2DI + [(match_operand 4) + (sign_extend:VNx2DI +(truncate:VNx2SI + (match_operand:VNx2DI 2 "register_operand" "w")))] +UNSPEC_PRED_X) + (match_operand:VNx2DI 3 "const_1_to_3_operand")) (match_operand:VNx2DI 1 "register_operand" "w")))] "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, sxtw %3]" - "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))" + "&& !CONSTANT_P (operands[4])" { -operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode); +operands[4] = CONSTM1_RTX (VNx2BImode); } ) ;; Same, but with the index being zero-extended from the low 32 bits. -(define_insn_and_rewrite "*aarch64_adr_shift_uxtw" +(define_insn "*aarch64_adr_shift_uxtw" [(set (match_operand:VNx2DI 0 "register_operand" "=w") (plus:VNx2DI - (unspec:VNx2DI - [(match_operand 5) -(ashift:VNx2DI - (and:VNx2DI -(match_operand:VNx2DI 2 "register_operand" "w") -(match_operand:VNx2DI 4 "aarch64_sve_uxtw_immediate")) - (match_operand:VNx2DI 3 "const_1_to_3_operand"))] - UNSPEC_PRED_X) + (ashift:VNx2DI +
[PATCH v3 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts
From: Dhruv Chawla

This patch modifies the intrinsic expanders to expand svlsl and svlsr
to unpredicated forms when the predicate is a ptrue.  It also folds the
following pattern:

	lsl <y>, <x>, <shift>
	lsr <z>, <x>, <shift>
	orr <r>, <y>, <z>

to:

	revb/h/w <r>, <x>

when the shift amount is equal to half the bitwidth of the register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla
Co-authored-by: Richard Sandiford

gcc/ChangeLog:

	* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if
	the target already provided the result in the expected register.
	* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
	Avoid forcing subregs into fresh registers unnecessarily.
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svlsl_impl::expand): Define.
	(svlsr_impl): New class.
	(svlsr_impl::fold): Define.
	(svlsr_impl::expand): Likewise.
	* config/aarch64/aarch64-sve.md: Add define_split for rotate.
	(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/shift_rev_1.c: New test.
	* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
	* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 .../aarch64/aarch64-sve-builtins-base.cc   | 33 +++-
 gcc/config/aarch64/aarch64-sve.md          | 55
 gcc/config/aarch64/aarch64.cc              | 10 ++-
 gcc/expmed.cc                              |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c   | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c   | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c   | 83 +++
 7 files changed, 326 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index b4396837c24..90dd5c97a10 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -2086,6 +2086,37 @@ public:
   {
     return f.fold_const_binary (LSHIFT_EXPR);
   }
+
+  rtx expand (function_expander &e) const override
+  {
+    tree pred = TREE_OPERAND (e.call_expr, 3);
+    tree shift = TREE_OPERAND (e.call_expr, 5);
+    if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+	&& uniform_integer_cst_p (shift))
+      return e.use_unpred_insn (e.direct_optab_handler (ashl_optab));
+    return rtx_code_function::expand (e);
+  }
+};
+
+class svlsr_impl : public rtx_code_function
+{
+public:
+  CONSTEXPR svlsr_impl () : rtx_code_function (LSHIFTRT, LSHIFTRT) {}
+
+  gimple *fold (gimple_folder &f) const override
+  {
+    return f.fold_const_binary (RSHIFT_EXPR);
+  }
+
+  rtx expand (function_expander &e) const override
+  {
+    tree pred = TREE_OPERAND (e.call_expr, 3);
+    tree shift = TREE_OPERAND (e.call_expr, 5);
+    if (is_ptrue (pred, GET_MODE_UNIT_SIZE (e.result_mode ()))
+	&& uniform_integer_cst_p (shift))
+      return e.use_unpred_insn (e.direct_optab_handler (lshr_optab));
+    return rtx_code_function::expand (e);
+  }
 };

 class svmad_impl : public function_base
@@ -3586,7 +3617,7 @@ FUNCTION (svldnt1, svldnt1_impl,)
 FUNCTION (svlen, svlen_impl,)
 FUNCTION (svlsl, svlsl_impl,)
 FUNCTION (svlsl_wide, shift_wide, (ASHIFT, UNSPEC_ASHIFT_WIDE))
-FUNCTION (svlsr, rtx_code_function, (LSHIFTRT, LSHIFTRT))
+FUNCTION (svlsr, svlsr_impl,)
 FUNCTION (svlsr_wide, shift_wide, (LSHIFTRT, UNSPEC_LSHIFTRT_WIDE))
 FUNCTION (svmad, svmad_impl,)
 FUNCTION (svmax, rtx_code_function, (SMAX, UMAX, UNSPEC_COND_FMAX,
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index cb88d6d95a6..0156afc1e7d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -------------------------------------------------------------------------

+(define_split
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+	(rotate:SVE_FULL_HSDI
+	  (match_operand:SVE_FULL_HSDI 1 "register_operand")
+	  (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+	(ashift:SVE_FULL_HSDI (match_dup 1)
+			      (match_dup 2)))
+   (set (match_dup 0)
+	(plus:SVE_FULL_HSDI
+	  (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+				  (match_dup 4))
+	  (match_dup 3)))]
+  {
+    if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+      DONE;
+
+    if (!TARGET_SVE2)
+      FAIL;
+
+    operands[3] = gen_reg_rtx (mode);
+    HOST_WIDE_INT shift_amount =
+      INTVAL (unwrap_const_vec_duplicate (operands[2]));
+    int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
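
A sketch of the intrinsic-level fold this patch enables; the function
name and the expected REVB output are illustrative assumptions, not
taken from the shift_rev_*.c tests.  With svlsl/svlsr expanded
unpredicated under a ptrue, the lsl/lsr/orr chain below becomes a
rotate by half the element width:

#include <arm_sve.h>

/* (x << 8) | (x >> 8) on 16-bit elements, i.e. a rotate by half the
   element width: expected to fold to a single revb z0.h, p0/m, z0.h.  */
svuint16_t
rot_half (svuint16_t x)
{
  svbool_t pg = svptrue_b16 ();
  return svorr_u16_x (pg, svlsl_n_u16_x (pg, x, 8),
		      svlsr_n_u16_x (pg, x, 8));
}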
[PATCH v4 1/2] aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions
From: Dhruv Chawla

This patch modifies the shift expander to lower constant shifts
immediately, without an unspec.  It also modifies the ADR, SRA and
ADDHNB patterns to match the lowered forms of the shifts, as the
predicate register is not required for these instructions.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla
Co-authored-by: Richard Sandiford

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md (@aarch64_adr_shift): Match
	lowered form of ashift.
	(*aarch64_adr_shift): Likewise.
	(*aarch64_adr_shift_sxtw): Likewise.
	(*aarch64_adr_shift_uxtw): Likewise.
	(3): Check amount instead of operands[2] in
	aarch64_sve_shift_operand.
	(v3): Generate unpredicated shifts for constant operands.
	(@aarch64_pred_): Convert to a define_expand.
	(*aarch64_pred_): Create define_insn_and_split pattern from
	@aarch64_pred_.
	(*post_ra_v_ashl3): Rename to ...
	(aarch64_vashl3_const): ... this and remove reload requirement.
	(*post_ra_v_3): Rename to ...
	(aarch64_v3_const): ... this and remove reload requirement.
	* config/aarch64/aarch64-sve2.md (@aarch64_sve_add_): Match
	lowered form of SHIFTRT.
	(*aarch64_sve2_sra): Likewise.
	(*bitmask_shift_plus): Match lowered form of lshiftrt.
---
 gcc/config/aarch64/aarch64-sve.md  | 119 +++--
 gcc/config/aarch64/aarch64-sve2.md |  46 ---
 2 files changed, 75 insertions(+), 90 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index bf7569f932b..e1ec778b10d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4234,80 +4234,57 @@
 (define_expand "@aarch64_adr_shift"
   [(set (match_operand:SVE_FULL_SDI 0 "register_operand")
 	(plus:SVE_FULL_SDI
-	  (unspec:SVE_FULL_SDI
-	    [(match_dup 4)
-	     (ashift:SVE_FULL_SDI
-	       (match_operand:SVE_FULL_SDI 2 "register_operand")
-	       (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))]
-	    UNSPEC_PRED_X)
+	  (ashift:SVE_FULL_SDI
+	    (match_operand:SVE_FULL_SDI 2 "register_operand")
+	    (match_operand:SVE_FULL_SDI 3 "const_1_to_3_operand"))
 	  (match_operand:SVE_FULL_SDI 1 "register_operand")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
-  {
-    operands[4] = CONSTM1_RTX (mode);
-  }
 )

-(define_insn_and_rewrite "*aarch64_adr_shift"
+(define_insn "*aarch64_adr_shift"
   [(set (match_operand:SVE_24I 0 "register_operand" "=w")
 	(plus:SVE_24I
-	  (unspec:SVE_24I
-	    [(match_operand 4)
-	     (ashift:SVE_24I
-	       (match_operand:SVE_24I 2 "register_operand" "w")
-	       (match_operand:SVE_24I 3 "const_1_to_3_operand"))]
-	    UNSPEC_PRED_X)
+	  (ashift:SVE_24I
+	    (match_operand:SVE_24I 2 "register_operand" "w")
+	    (match_operand:SVE_24I 3 "const_1_to_3_operand"))
 	  (match_operand:SVE_24I 1 "register_operand" "w")))]
   "TARGET_SVE && TARGET_NON_STREAMING"
   "adr\t%0., [%1., %2., lsl %3]"
-  "&& !CONSTANT_P (operands[4])"
-  {
-    operands[4] = CONSTM1_RTX (mode);
-  }
 )

 ;; Same, but with the index being sign-extended from the low 32 bits.
(define_insn_and_rewrite "*aarch64_adr_shift_sxtw" [(set (match_operand:VNx2DI 0 "register_operand" "=w") (plus:VNx2DI - (unspec:VNx2DI - [(match_operand 4) -(ashift:VNx2DI - (unspec:VNx2DI -[(match_operand 5) - (sign_extend:VNx2DI - (truncate:VNx2SI - (match_operand:VNx2DI 2 "register_operand" "w")))] -UNSPEC_PRED_X) - (match_operand:VNx2DI 3 "const_1_to_3_operand"))] - UNSPEC_PRED_X) + (ashift:VNx2DI + (unspec:VNx2DI + [(match_operand 4) + (sign_extend:VNx2DI +(truncate:VNx2SI + (match_operand:VNx2DI 2 "register_operand" "w")))] +UNSPEC_PRED_X) + (match_operand:VNx2DI 3 "const_1_to_3_operand")) (match_operand:VNx2DI 1 "register_operand" "w")))] "TARGET_SVE && TARGET_NON_STREAMING" "adr\t%0.d, [%1.d, %2.d, sxtw %3]" - "&& (!CONSTANT_P (operands[4]) || !CONSTANT_P (operands[5]))" + "&& !CONSTANT_P (operands[4])" { -operands[5] = operands[4] = CONSTM1_RTX (VNx2BImode); +operands[4] = CONSTM1_RTX (VNx2BImode); } ) ;; Same, but with the index being zero-extended from the low 32 bits. -(define_insn_and_rewrite "*aarch64_adr_shift_uxtw" +(define_insn "*aarch64_adr_shift_uxtw" [(set (match_operand:VNx2DI 0 "register_operand" "=w") (plus:VNx2DI - (unspec:VNx2DI - [(match_operand 5) -(ashift:VNx2DI - (and:VNx2DI -(match_operand:VNx2DI 2 "register_operand" "w") -
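
For the ADDHNB case mentioned in the subject, a minimal sketch (the
function name and the expected instruction are illustrative
assumptions, not taken from the patch or its testsuite): once the right
shift by half the element width is emitted unpredicated, the SVE2
*bitmask_shift_plus pattern can match the plus/lshiftrt combination
directly.

#include <stdint.h>

/* (a[i] + b[i]) >> 16 with uint32_t elements: the shift amount equals
   half the element width, so the combination is expected to map to
   addhnb when SVE2 is enabled.  */
void
high_halves (uint32_t *restrict out, const uint32_t *restrict a,
	     const uint32_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (a[i] + b[i]) >> 16;
}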
[PATCH v4 2/2] aarch64: Fold lsl+lsr+orr to rev for half-width shifts
From: Dhruv Chawla

This patch folds the following pattern:

	lsl <y>, <x>, <shift>
	lsr <z>, <x>, <shift>
	orr <r>, <y>, <z>

to:

	revb/h/w <r>, <x>

when the shift amount is equal to half the bitwidth of the register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla
Co-authored-by: Richard Sandiford

gcc/ChangeLog:

	* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if
	the target already provided the result in the expected register.
	* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
	Avoid forcing subregs into fresh registers unnecessarily.
	* config/aarch64/aarch64-sve.md: Add define_split for rotate.
	(*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/shift_rev_1.c: New test.
	* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
	* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-sve.md            | 55
 gcc/config/aarch64/aarch64.cc                | 10 ++-
 gcc/expmed.cc                                |  3 +-
 .../gcc.target/aarch64/sve/shift_rev_1.c     | 83 +++
 .../gcc.target/aarch64/sve/shift_rev_2.c     | 63 ++
 .../gcc.target/aarch64/sve/shift_rev_3.c     | 83 +++
 6 files changed, 294 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/shift_rev_3.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index e1ec778b10d..fa431c9c060 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3317,6 +3317,61 @@
 ;; - REVW
 ;; -------------------------------------------------------------------------

+(define_split
+  [(set (match_operand:SVE_FULL_HSDI 0 "register_operand")
+	(rotate:SVE_FULL_HSDI
+	  (match_operand:SVE_FULL_HSDI 1 "register_operand")
+	  (match_operand:SVE_FULL_HSDI 2 "aarch64_constant_vector_operand")))]
+  "TARGET_SVE && can_create_pseudo_p ()"
+  [(set (match_dup 3)
+	(ashift:SVE_FULL_HSDI (match_dup 1)
+			      (match_dup 2)))
+   (set (match_dup 0)
+	(plus:SVE_FULL_HSDI
+	  (lshiftrt:SVE_FULL_HSDI (match_dup 1)
+				  (match_dup 4))
+	  (match_dup 3)))]
+  {
+    if (aarch64_emit_opt_vec_rotate (operands[0], operands[1], operands[2]))
+      DONE;
+
+    if (!TARGET_SVE2)
+      FAIL;
+
+    operands[3] = gen_reg_rtx (mode);
+    HOST_WIDE_INT shift_amount =
+      INTVAL (unwrap_const_vec_duplicate (operands[2]));
+    int bitwidth = GET_MODE_UNIT_BITSIZE (mode);
+    operands[4] = aarch64_simd_gen_const_vector_dup (mode,
+						     bitwidth - shift_amount);
+  }
+)
+
+;; The RTL combiners are able to combine "ior (ashift, ashiftrt)" to a "bswap".
+;; Match that as well.
+(define_insn_and_split "*v_revvnx8hi"
+  [(parallel
+     [(set (match_operand:VNx8HI 0 "register_operand")
+	   (bswap:VNx8HI (match_operand 1 "register_operand")))
+      (clobber (match_scratch:VNx8BI 2))])]
+  "TARGET_SVE"
+  "#"
+  ""
+  [(set (match_dup 0)
+	(unspec:VNx8HI
+	  [(match_dup 2)
+	   (unspec:VNx8HI
+	     [(match_dup 1)]
+	     UNSPEC_REVB)]
+	  UNSPEC_PRED_X))]
+  {
+    if (!can_create_pseudo_p ())
+      operands[2] = CONSTM1_RTX (VNx8BImode);
+    else
+      operands[2] = aarch64_ptrue_reg (VNx8BImode);
+  }
+)
+
 ;; Predicated integer unary operations.
 (define_insn "@aarch64_pred_"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 1da615c8955..7cdd5fda903 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27067,11 +27067,17 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode,
   d.op_mode = op_mode;
   d.op_vec_flags = aarch64_classify_vector_mode (d.op_mode);
   d.target = target;
-  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
+  d.op0 = op0;
+  if (d.op0 && !register_operand (d.op0, op_mode))
+    d.op0 = force_reg (op_mode, d.op0);
   if (op0 && d.one_vector_p)
     d.op1 = copy_rtx (d.op0);
   else
-    d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;
+    {
+      d.op1 = op1;
+      if (d.op1 && !register_operand (d.op1, op_mode))
+	d.op1 = force_reg (op_mode, d.op1);
+    }
   d.testing_p = !target;

   if (!d.testing_p)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 72dbafe5d9f..deb4e48d14f 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -6324,7 +6324,8 @@ expand_rotate_as_vec_perm (machine_mode mode, rtx dst, rtx x, rtx amt)
 				    qimode, perm_dst);
   if (!res)
     return NULL_RTX;
-  emit_move_insn (dst, lowpart_subreg (mode, res, qimode));
+  if (!rtx_equal_p (res, perm_dst
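
A sketch of the rotate that the new define_split and the *v_revvnx8hi
pattern are aimed at; the function name and the expected REVB output
are illustrative assumptions, not taken from the shift_rev_*.c tests.

#include <stdint.h>

/* A rotate by half the element width on 16-bit lanes: recognized as a
   rotate (or bswap) by the middle end, and with this patch the SVE
   lowering is expected to use revb z0.h, p0/m, z1.h instead of an
   lsl/lsr/orr sequence.  */
void
rot_half (uint16_t *restrict out, const uint16_t *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (uint16_t) ((in[i] << 8) | (in[i] >> 8));
}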
[PATCH] [MAINTAINERS] Add myself to write after approval and DCO.
From: Dhruv Chawla

Committed as 3213828f74f2f27a2dd91792cef27117ba1a522e.

ChangeLog:

	* MAINTAINERS: Add myself to write after approval and DCO.
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8993d176c22..f40d6350462 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -402,6 +402,7 @@ Stephane Carrez				ciceron
 Gabriel Charette			gchare
 Arnaud Charlet				charlet
 Chandra Chavva				-
+Dhruv Chawla				dhruvc
 Dehao Chen				dehao
 Fabien Chêne				fabien
 Bin Cheng				amker
@@ -932,6 +933,7 @@ information.
 Soumya AR
+Dhruv Chawla
 Juergen Christ
 Giuseppe D'Angelo
 Robin Dapp
-- 
2.39.3 (Apple Git-146)