RE: [PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions
Hi Kyrill,

It's been a while, but I believe you had the following comment about
implementing CSEL:

> (define_insn_and_split "*thumb2_movsicc_insn"
>   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
> 	(if_then_else:SI
> @@ -449,17 +473,14 @@
>    it\\t%d3\;mvn%d3\\t%0, #%B1
>    #
>    #
> -   #
> -   #
> -   #
> +   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>    #"
>   ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>   ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> - ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
> - ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
> - ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>   ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> -  "&& reload_completed"
> +  "&& reload_completed && !TARGET_COND_ARITH"
>
> Hmm... I think the approach makes sense, but I'd rather we left the
> alternatives as '#' and refined the condition so that in the
> TARGET_COND_ARITH case we split in precisely the cases where
> TARGET_COND_ARITH can't handle the operands.
> I appreciate that would complicate this condition somewhat, but it would
> have the benefit of expressing the RTL structure to allow for further
> optimisation.

I've made the changes you suggested, let me know if it's good to commit.

Thanks,
Omar

--

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edf1b851b8968cbcf071564416dbf6..b8dd6af50a842c924996d528e95ce9873dcb913a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
 	 [(match_operand:SI 2 "s_register_operand" "r,r")
 	  (match_operand:SI 3 "arm_add_operand" "rI,L")]))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957655a0fba67946c1db6fa6334b2bbb..57db29f92f4caee4c9384a9740e79dba2217144a 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
 ;;		      Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -479,6 +479,16 @@
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))

+(define_constraint "UM"
+ "@internal
+  A constraint that matches the immediate constant -1."
+ (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+ "@internal
+  A constraint that matches the immediate constant +1."
+ (match_test "op == const1_rtx"))
+
 (define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829cc4a28cd7ac1ef528ecd54f0af13c1..5d75341c9efe82dcda27daa74d2b22c52065dd02 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
 		&& arm_general_register_operand (op, GET_MODE (op))")
        (match_test "satisfies_constraint_Pg (op)")))

+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 69460f3665b0bc7f47c307aa4ae789bab6a94f92..db0b4c53754747a915d51a6df417fa97b60828da 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -432,6 +432,30 @@
    (set_attr "type" "multiple")]
 )

+(define_insn "*cmovsi_insn"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r,r,r")
+	(if_then_else:SI
+	 (match_operator 1 "arm_comparison_operator"
+	  [(match_operand 2 "cc_register" "") (const_int 0)])
+	 (match_operand:SI 3 "arm_reg_or_m1_or_1_or_zero" "r, r,UM, r,U1,U1,Pz,UM,Pz")
+	 (match_operand:SI 4 "arm_reg_or_m1_or_1_or_zero" "r,UM, r,U1, r,Pz,U1,Pz,UM")))]
+  "TARGET_THUMB2 && TARGET_COND_ARITH
+   && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+	 || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
+  "@
+   csel\\t%0, %3, %4, %d1
+   csinv\\t%0, %3, zr, %d1
+   csinv\\t%0, %4, zr, %D1
+   csinc\\t%0, %3
RE: [PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions
Hi Kyrill,

> It's been a while, but I believe you had the following comment about
> implementing CSEL:
>
>> (define_insn_and_split "*thumb2_movsicc_insn"
>>   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
>> 	(if_then_else:SI
>> @@ -449,17 +473,14 @@
>>    it\\t%d3\;mvn%d3\\t%0, #%B1
>>    #
>>    #
>> -   #
>> -   #
>> -   #
>> +   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
>> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>>    #"
>>   ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>>   ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> - ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
>> - ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>> - ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>>   ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> -  "&& reload_completed"
>> +  "&& reload_completed && !TARGET_COND_ARITH"
>>
>> Hmm... I think the approach makes sense, but I'd rather we left the
>> alternatives as '#' and refined the condition so that in the
>> TARGET_COND_ARITH case we split in precisely the cases where
>> TARGET_COND_ARITH can't handle the operands.
>> I appreciate that would complicate this condition somewhat, but it would
>> have the benefit of expressing the RTL structure to allow for further
>> optimisation.
>
> I've made the changes you suggested, let me know if it's good to commit.
>
> (define_insn_and_split "*thumb2_movsicc_insn"
>   [(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
> 	(if_then_else:SI
> @@ -459,7 +483,9 @@
>   ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>   ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>   ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> -  "&& reload_completed"
> +  ; Conditional arithmetic (csel etc.) can handle all alternatives except 8-10
> +  "&& reload_completed && (!TARGET_COND_ARITH ||
> +			    (which_alternative >= 8 && which_alternative <= 10))"
>   [(const_int 0)]
>   {
>     enum rtx_code rev_code;

Here's an alternative implementation that doesn't use which_alternative.

Thanks,
Omar

--

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edf1b851b8968cbcf071564416dbf6..b8dd6af50a842c924996d528e95ce9873dcb913a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
 	 [(match_operand:SI 2 "s_register_operand" "r,r")
 	  (match_operand:SI 3 "arm_add_operand" "rI,L")]))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957655a0fba67946c1db6fa6334b2bbb..57db29f92f4caee4c9384a9740e79dba2217144a 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
 ;;		      Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -479,6 +479,16 @@
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))

+(define_constraint "UM"
+ "@internal
+  A constraint that matches the immediate constant -1."
+ (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+ "@internal
+  A constraint that matches the immediate constant +1."
+ (match_test "op == const1_rtx"))
+
 (define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829cc4a28cd7ac1ef528ecd54f0af13c1..5d75341c9efe82dcda27daa74d2b22c52065dd02 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
 		&& arm_general_register_operand (op, GET_MODE (op))")
        (match_test "satisfies_constraint_Pg (op)")))

+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 69460f3665b0bc7f47c307aa4ae789bab6a94f92..f15f99903dad36e53d9af664c5b3a5e1ebe3b78b 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/
[PATCH 0/5][Arm] Add support for conditional instructions (CSEL, CSINC etc.) for Armv8.1-M Mainline
Hi all,

This patch series provides support for the following instructions that were
added in Armv8.1-M Mainline [1]:

- CSEL
- CSET
- CSETM
- CSNEG
- CSINV
- CSINC
- CINC

The patch series is organised as follows:

1) Modify default tuning for -march=armv8.1-m.main.
2) New macro, predicate and constraint. New pattern *thumb2_csinv that
   generates CSINV.
3) New pattern *thumb2_csinc that generates CSINC.
4) New pattern *thumb2_csneg that generates CSNEG.
5) New predicate, new constraints. New pattern *cmovsi_insn that generates
   CSEL, CSET, CSETM, CSINC and CSINV in specific cases.

CINV and CNEG aren't used as they are aliases for CSINV and CSNEG
respectively. There is one place CINC is used, as an optimisation in an
existing pattern.

Some existing patterns are modified to force the new patterns to be used
when appropriate and to prevent undesirable "optimisations". For example,
`if_then_else` insns are often split into `cond_exec` insns (see
*compare_scc in arm.md). This makes it harder to generate instructions like
CSEL, so this behaviour is disabled when targeting Armv8.1-M Mainline. The
combine and ifcvt passes also cause problems, for example *thumb2_movcond,
which can trigger unwanted combines. In some cases the define_insn is
disabled, in others only splitting is disabled.

Along with matching the obvious cases, some edge cases are taken advantage
of to slightly optimise code generation. For example, R1 = CC ? 1 : R0 can
take advantage of the zero register to generate CSINC R1, ZR, R0. There are
a few cases where CSEL etc. could be used but it's more cumbersome to do so,
so the default IT block implementation is kept (see *thumb2_movsicc_insn
alts 8-10). In general, however, code generated for Armv8.1-M Mainline will
see a large decrease in the number of IT blocks.

The entire patch series together was regression tested on arm-none-eabi and
arm-none-linux-gnueabi with no regressions, with a minor performance
improvement (-0.1% cycle count) on a proprietary benchmark.

Thanks,
Omar

[1] https://static.docs.arm.com/ddi0553/bf/DDI0553B_f_armv8m_arm.pdf
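As a concrete illustration (this example is mine, not taken from the
patches): with the series applied, a simple conditional move that
previously required an IT block should compile to a single conditional
select.

/* Illustrative only.  With -O2 -march=armv8.1-m.main the expected output
   is roughly "cmp r0, r1; csel r0, r2, r3, eq" instead of an IT block
   (exact register allocation may differ).  */
int
select_eq (int a, int b, int c, int d)
{
  return (a == b) ? c : d;
}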
[PATCH 1/5][Arm] Modify default tuning of armv8.1-m.main to use Cortex-M55
Previously, compiling with -march=armv8.1-m.main would tune for Cortex-M7.
However, the Cortex-M7 only supports up to Armv7e-M. The Cortex-M55 is the
earliest CPU that supports Armv8.1-M Mainline, so it is more appropriate.

This also has the effect of changing the branch cost function used, which
will be necessary to correctly prioritise conditional instructions over
branches in the rest of this patch series.

Regression tested on arm-none-eabi.

gcc/ChangeLog:

2020-07-30: Omar Tahir

	* config/arm/arm-cpus.in (armv8.1-m.main): Tune for Cortex-M55.

diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 728be500b80..c98f8ede8fd 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -716,7 +716,7 @@ begin arch armv8-r
 end arch armv8-r

 begin arch armv8.1-m.main
- tune for cortex-m7
+ tune for cortex-m55
  tune flags CO_PROC
  base 8M_MAIN
  profile M
[PATCH 2/5][Arm] New pattern for CSINV instructions
This patch adds a new pattern, *thumb2_csinv, for generating CSINV
instructions. This pattern relies on a few general changes that will be used
throughout the following patches:

- A new macro, TARGET_COND_ARITH, which is only true on 8.1-M Mainline and
  represents the existence of these conditional instructions.
- A change to the cond exec hook, arm_have_conditional_execution, which now
  returns false if TARGET_COND_ARITH before reload. This allows for some
  ifcvt transformations when they would usually be disabled. I've written a
  rather verbose comment (at the risk of over-explaining) as it's a bit of a
  confusing change.
- One new predicate and one new constraint.
- *thumb2_movcond has been restricted to only match if !TARGET_COND_ARITH,
  as otherwise it triggers undesirable combines.

gcc/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* config/arm/arm.h (TARGET_COND_ARITH): New macro.
	* config/arm/arm.c (arm_have_conditional_execution): Return false
	if TARGET_COND_ARITH before reload.
	* config/arm/constraints.md (Z): New constant zero.
	* config/arm/predicates.md (arm_comparison_operation): Returns true
	if comparing CC_REGNUM with constant zero.
	* config/arm/thumb2.md (*thumb2_csinv): New.
	(*thumb2_movcond): Don't match if TARGET_COND_ARITH.

Regression tested on arm-none-eabi.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* gcc.target/arm/csinv-1.c: New test.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..3a9684cdcd8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,20 @@ arm_frame_pointer_required (void)
   return false;
 }

-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
 static bool
 arm_have_conditional_execution (void)
 {
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution.  */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
+     transformations before reload.  */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  /* The ifcvt transformations are only turned on if we return false.  */
+  return has_cond_exec && !enable_ifcvt_trans;
 }

 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@
    emission of floating point pcs attributes.  */
 #define TARGET_CRC32 (arm_arch_crc)

+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc.  */
+#define TARGET_COND_ARITH (arm_arch8_1m_main)
+
 /* The following two macros concern the ability to execute coprocessor
    instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
    only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957..048b25ef4a1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -28,6 +28,7 @@
 ;; The following normal constraints have been used:
 ;; in ARM/Thumb-2 state: G, I, j, J, K, L, M
 ;; in Thumb-1 state: I, J, K, L, M, N, O
+;; in all states: Z

 ;; 'H' was previously used for FPA.

 ;; The following multi-letter normal constraints have been used:
@@ -479,6 +480,11 @@
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))

+(define_constraint "Z"
+ "@internal
+  Integer constant zero."
+ (match_test "op == const0_rtx"))
+
 (define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
        (match_test "maybe_get_arm_condition_code (op) != ARM_NV")))

+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+	       ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+    return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+    return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})
[PATCH 3/5][Arm] New pattern for CSINC instructions
This patch adds a new pattern, *thumb2_csinc, for generating CSINC
instructions. It also modifies an existing pattern, *thumb2_cond_arith, to
output CINC when the operation is an addition and TARGET_COND_ARITH is true.

Regression tested on arm-none-eabi.

gcc/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* config/arm/thumb2.md (*thumb2_csinc): New.
	(*thumb2_cond_arith): Generate CINC where possible.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* gcc.target/arm/csinc-1.c: New test.

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 0b00aef7ef7..79cf684e5cb 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -743,6 +743,9 @@
     if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
       return \"%i5\\t%0, %1, %2, lsr #31\";

+    if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
+      return \"cinc\\t%0, %1, %d4\";
+
     output_asm_insn (\"cmp\\t%2, %3\", operands);
     if (GET_CODE (operands[5]) == AND)
       {
@@ -952,6 +955,21 @@
    (set_attr "predicable" "no")]
 )

+(define_insn "*thumb2_csinc"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+	(if_then_else:SI
+	 (match_operand 1 "arm_comparison_operation" "")
+	 (plus:SI (match_operand:SI 2 "arm_general_register_operand" "r, r")
+		  (const_int 1))
+	 (match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinc\\t%0, %3, %2, %D1
+   csinc\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
 (define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
 	(if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csinc-1.c b/gcc/testsuite/gcc.target/arm/csinc-1.c
new file mode 100644
index 000..b9928493862
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csinc-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csinc32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? (w2 + 1) : w3;
+  return w4;
+}
+
+int
+test_csinc32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : (w2 + 1);
+  return w4;
+}
[PATCH 4/5][Arm] New pattern for CSNEG instructions
This patch adds a new pattern, *thumb2_csneg, for generating CSNEG
instructions. It also restricts *if_neg_move and *thumb2_negscc to only
match if !TARGET_COND_ARITH, which prevents undesirable matches during
ifcvt.

Regression tested on arm-none-eabi.

gcc/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* config/arm/thumb2.md (*thumb2_csneg): New.
	(*thumb2_negscc): Don't match if TARGET_COND_ARITH.
	* config/arm/arm.md (*if_neg_move): Don't match if TARGET_COND_ARITH.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* gcc.target/arm/csneg.c: New test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a6a31f8f4ef..950e46edfee 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11211,7 +11211,7 @@
 	  [(match_operand 3 "cc_register" "") (const_int 0)])
 	 (neg:SI (match_operand:SI 2 "s_register_operand" "l,r"))
 	 (match_operand:SI 1 "s_register_operand" "0,0")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(cond_exec (match_op_dup 4 [(match_dup 3) (const_int 0)])
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 79cf684e5cb..d12467d7644 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -880,7 +880,7 @@
 	  [(match_operand:SI 1 "s_register_operand" "r")
 	   (match_operand:SI 2 "arm_rhs_operand" "rI")])))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_THUMB2"
+  "TARGET_THUMB2 && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -970,6 +970,20 @@
    (set_attr "predicable" "no")]
 )

+(define_insn "*thumb2_csneg"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+	(if_then_else:SI
+	 (match_operand 1 "arm_comparison_operation" "")
+	 (neg:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+	 (match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csneg\\t%0, %3, %2, %D1
+   csneg\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
 (define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
 	(if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csneg.c b/gcc/testsuite/gcc.target/arm/csneg.c
new file mode 100644
index 000..e48606265af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csneg.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csneg32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? -w2 : w3;
+  return w4;
+}
+
+int
+test_csneg32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : -w2;
+  return w4;
+}
+
+unsigned long long
+test_csneg_uxtw (unsigned int a, unsigned int b, unsigned int c)
+{
+  unsigned int val;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  val = a ? b : -c;
+  return val;
+}
[PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions
This patch adds a new pattern, *cmovsi_insn, for generating CSEL, CSET and
CSETM instructions. It also generates CSINV and CSINC instructions in
specific cases where one of the operands is constant. To facilitate this,
one new predicate and two new constraints are added, and *compare_scc is
restricted to only match if !TARGET_COND_ARITH to prevent an unwanted split.

Additionally, alternatives 8-10 are re-enabled in *thumb2_movsicc_insn, and
splitting only occurs if !TARGET_COND_ARITH. This forces the new pattern to
be used when possible, but for more complex cases that can't directly use
CSEL it falls back to using IT blocks.

Regression tested on arm-none-eabi. The entire patch set was regression
tested on arm-linux-gnueabi also.

That's all folks!

Thanks,
Omar

gcc/ChangeLog:

2020-07-30: Sudakshina Das
	    Omar Tahir

	* config/arm/thumb2.md (*cmovsi_insn): New.
	(*thumb2_movsicc_insn): Don't split if TARGET_COND_ARITH, re-enable
	alternatives 8-10.
	* config/arm/arm.md (*compare_scc): Don't match if TARGET_COND_ARITH.

gcc/testsuite/ChangeLog:

2020-07-30: Omar Tahir

	* gcc.target/arm/csel.c: New test.
	* gcc.target/arm/cset.c: New test.
	* gcc.target/arm/csetm.c: New test.
	* gcc.target/arm/csinv-2.c: New test.
	* gcc.target/arm/csinc-2.c: New test.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edfee..b8dd6af50a8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
 	 [(match_operand:SI 2 "s_register_operand" "r,r")
 	  (match_operand:SI 3 "arm_add_operand" "rI,L")]))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 048b25ef4a1..37d240d139b 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -37,7 +37,7 @@
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
 ;;		      Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1

 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -485,6 +485,16 @@
   Integer constant zero."
  (match_test "op == const0_rtx"))

+(define_constraint "UM"
+ "@internal
+  A constraint that matches the immediate constant -1."
+ (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+ "@internal
+  A constraint that matches the immediate constant +1."
+ (match_test "op == const1_rtx"))
+
 (define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829c..5d75341c9ef 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
 		&& arm_general_register_operand (op, GET_MODE (op))")
        (match_test "satisfies_constraint_Pg (op)")))

+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
 ;; True for MULT, to identify which variant of shift_operator is in use.
 (define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index d12467d7644..bc6f2a52004 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -432,6 +432,30 @@
    (set_attr "type" "multiple")]
 )

+(define_insn "*cmovsi_insn"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r,r,r")
+	(if_then_else:SI
+	 (match_operator 1 "arm_comparison_operator"
+	  [(match_operand 2 "cc_register" "") (const_int 0)])
+	 (match_operand:SI 3 "arm_reg_or_m1_or_1_or_zero" "r, r,UM, r,U1,U1, Z,UM, Z")
+	 (match_operand:SI 4 "arm_reg_or_m1_or_1_or_zero" "r,UM, r,U1, r, Z,U1, Z,UM")))]
+  "TARGET_THUMB2 && TARGET_COND_ARITH
+   && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+	 || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
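The five new tests are attached to the mail rather than inlined. As a rough
guide, here are hypothetical sketches of the csel/cset/csetm cases, written
in the style of the csinc-1.c and csneg.c tests earlier in the series (the
function names and scan-assembler patterns are my guesses, not the actual
test contents):

/* { dg-do compile } */
/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
/* { dg-options "-O2 -march=armv8.1-m.main" } */

int
test_csel (int a, int b, int c, int d)
{
  /* { dg-final { scan-assembler "csel\tr\[0-9\]*" } } */
  return (a == b) ? c : d;
}

int
test_cset (int a, int b)
{
  /* CSET writes 1 or 0, i.e. cond ? 1 : 0.  */
  /* { dg-final { scan-assembler "cset\tr\[0-9\]*" } } */
  return a < b;
}

int
test_csetm (int a, int b)
{
  /* CSETM writes -1 or 0, i.e. cond ? -1 : 0.  */
  /* { dg-final { scan-assembler "csetm\tr\[0-9\]*" } } */
  return (a < b) ? -1 : 0;
}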
RE: [PATCH 2/5][Arm] New pattern for CSINV instructions
Hi Kyrill,

> -/* Only thumb1 can't support conditional execution, so return true if
> -   the target is not thumb1.  */
>  static bool
>
> Functions should have comments in GCC. Can you please write something
> describing the new logic of the function.
>
>  arm_have_conditional_execution (void)
>  {
> -  return !TARGET_THUMB1;
> +  bool has_cond_exec, enable_ifcvt_trans;
> +
> +  /* Only THUMB1 cannot support conditional execution.  */
> +  has_cond_exec = !TARGET_THUMB1;
> +
> +  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
> +     transformations before reload.  */
> +  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
> +
> +  /* The ifcvt transformations are only turned on if we return false.  */
> +  return has_cond_exec && !enable_ifcvt_trans;
>
> I don't think that comment is very useful. Perhaps "Enable ifcvt
> transformations only if..."
>
>  }

Fixed, let me know if the new comments are a bit clearer now.

> +(define_constraint "Z"
> + "@internal
> +  Integer constant zero."
> + (match_test "op == const0_rtx"))
>
> We're usually wary of adding more constraints unless necessary as it gets
> complicated to read patterns quickly (especially once we get into
> multi-letter constraints).
> I think you can reuse the existing "Pz" constraint for your purposes.

Yes, Pz works; I'll replace Z with Pz in the other patches as well. In
patch 5 I introduce UM (-1) and U1 (+1), and I don't think there's any
existing combination of constraints that can be used instead.

> Ok with those changes.
> If you'd like to commit it yourself please apply for write access at
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my email address
> from MAINTAINERS as the approver.

Excellent, thanks. If the other three patches are okay, I'll commit them as
well?

Thanks,
Omar

---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..e1bb2db9c8a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,23 @@ arm_frame_pointer_required (void)
   return false;
 }

-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
+/* Implement the TARGET_HAVE_CONDITIONAL_EXECUTION hook.
+   All modes except THUMB1 have conditional execution.
+   If we have conditional arithmetic, return false before reload to
+   enable some ifcvt transformations.  */
 static bool
 arm_have_conditional_execution (void)
 {
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution.  */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* Enable ifcvt transformations if we have conditional arithmetic, but only
+     before reload.  */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  return has_cond_exec && !enable_ifcvt_trans;
 }

 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@
    emission of floating point pcs attributes.  */
 #define TARGET_CRC32 (arm_arch_crc)

+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc.  */
+#define TARGET_COND_ARITH (arm_arch8_1m_main)
+
 /* The following two macros concern the ability to execute coprocessor
    instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
    only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
        (match_test "maybe_get_arm_condition_code (op) != ARM_NV")))

+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+	       ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+    return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+    return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})
+
 (define_special_predicate "lt_ge_comparison_operator"
   (match_code "lt,ge"))

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 793f6706868..ecc903970db 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -938,6 +938,20 @@
    (set_attr "type" "multiple")]
 )

+(define_insn "*thumb2_csinv"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+	(if_then_else:SI
+	 (match_operand 1 "arm_comparison_operation" "")
+	 (not:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+	 (match_operand:SI 3 "reg_or_zero_operand" "r, Pz")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinv\\t%0, %3, %2, %D1
+   csinv\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
RE: [PATCH 3/5][Arm] New pattern for CSINC instructions
> Hi Omar,
>
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 0b00aef7ef7..79cf684e5cb 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -743,6 +743,9 @@
>      if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
>        return \"%i5\\t%0, %1, %2, lsr #31\";
>
> +    if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
> +      return \"cinc\\t%0, %1, %d4\";
> +
>      output_asm_insn (\"cmp\\t%2, %3\", operands);
>
> Hmmm, this looks wrong. The pattern needs to perform the comparison
> (setting the CC reg) as well as do the conditional increment.
> Emitting a cinc without a cmp won't set the CC flags.
> Also, cinc increments only by 1, whereas the "arm_rhs_operand" predicate
> accepts a wider variety of immediates, so just checking for
> GET_CODE (operands[5]) == PLUS isn't enough.
>
> Thanks,
> Kyrill

My bad: the line

    output_asm_insn (\"cmp\\t%2, %3\", operands);

should come before my change rather than after it; that will generate the
cmp needed.

As for the predicate accepting other immediates, I don't think that's an
issue. From what I understand, the pattern represents

    r0 = f5 (f4 (r2, r3), r1)

where f5 is a shiftable operator and f4 is a comparison operator. For
simplicity, let's assume f4 is ==. Then we have

    r0 = f5 (r1, r2 == r3)

If f5 is PLUS then we get

    r0 = r1 + (r2 == r3)

which is

    r0 = (r2 == r3) ? r1 + 1 : r1

i.e. cmp r2, r3; cinc r0, r1, eq.

Since all comparisons return either zero (comparison failed) or one
(comparison passed), a cinc will always work as long as the shiftable
operator is PLUS. Operand 3 being an "arm_rhs_operand" shouldn't matter,
since it's only compared against operand 2, and the comparison still yields
a 0 or 1.

Thanks,
Omar
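To make this concrete, the C shape being discussed is something like the
following (my example, mirroring the reasoning above, not code from the
patch):

/* Expected to assemble to "cmp r2, r3; cinc r0, r1, eq": the comparison
   yields 0 or 1, and the PLUS turns it into a conditional increment.  */
int
plus_compare (int r1, int r2, int r3)
{
  return r1 + (r2 == r3);
}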
[PATCH] aarch64: Fix missing BTI instruction in trampolines
Hi,

Got a small bugfix here regarding BTIs and trampolines. If two functions
require trampolines, and the first has BTI enabled while the second
doesn't, the generated template will be lacking a BTI instruction. This
patch fixes this by always adding a BTI instruction, which is safe as BTI
instructions are ignored (executed as NOPs) on unsupported architecture
versions.

I don't have write access, so could someone commit for me?

Bootstrapped and tested on aarch64 with no regressions.

gcc/ChangeLog:

2020-06-29  Omar Tahir  omar.ta...@arm.com

	* config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
	Always generate a BTI instruction.

gcc/testsuite/ChangeLog:

2020-06-29  Omar Tahir  omar.ta...@arm.com

	* gcc.target/aarch64/bti-4.c: New test.
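For context, a sketch of the scenario being fixed (my illustration, not the
attached bti-4.c test): GCC emits an on-stack trampoline when the address
of a GNU C nested function is taken, and that trampoline template must
begin with a BTI when the caller is built with branch protection.

/* Illustration only; compile with -mbranch-protection=standard.  */
int
call_nested (int x)
{
  int nested (int y) { return x + y; }  /* GNU C nested function.  */
  int (*fp) (int) = nested;             /* Address taken: needs a trampoline.  */
  return fp (1);
}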
[PATCH] Fix unnecessary register spill that depends on function ordering
Hi,

The variables first_moveable_pseudo and last_moveable_pseudo aren't reset
after compiling a function, which means they leak into the first scheduler
pass of the following function. In some cases, this can cause an extra
spill during register allocation of the second function. This patch fixes
this by setting first_moveable_pseudo = last_moveable_pseudo at the
beginning of the first scheduler pass.

Because the spill occurs in the middle of the IRA pass, it is highly
target-sensitive, so I have provided a test case that works on aarch64.
There doesn't seem to be a target-independent way to trigger this bug,
since the RTL at the IRA stage is already quite target-specific.

Interestingly, the aarch64 test case here doesn't actually perform a
complete register spill: the value is stored on the stack but never
retrieved. Perhaps this points to another, deeper bug?

Bootstrapped and regression tested on aarch64-none-linux-gnu. I don't have
write privileges, so if it's fine could someone push for me?

Thanks,
Omar

gcc/ChangeLog:

2020-06-30: Omar Tahir

	* sched-rgn.c: Include ira-int.h, ira.h, regs.h.
	(rest_of_handle_sched): Clear moveable pseudo range.

gcc/testsuite/ChangeLog:

2020-06-30: Omar Tahir

	* gcc.target/aarch64/nospill.c: New test.

diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 7f5dfdb..51329fc 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -65,6 +65,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "pretty-print.h"
 #include "print-rtl.h"
+#include "ira.h"
+#include "regs.h"
+#include "ira-int.h"

 /* Disable warnings about quoting issues in the pp_xxx calls below
    that (intentionally) don't follow GCC diagnostic conventions.  */
@@ -3719,6 +3722,7 @@ static unsigned int
 rest_of_handle_sched (void)
 {
 #ifdef INSN_SCHEDULING
+  first_moveable_pseudo = last_moveable_pseudo;
   if (flag_selective_scheduling
       && ! maybe_skip_selective_scheduling ())
     run_selective_scheduling ();
diff --git a/gcc/testsuite/gcc.target/aarch64/nospill.c b/gcc/testsuite/gcc.target/aarch64/nospill.c
new file mode 100644
index 000..60399d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/nospill.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* The pseudo for P is marked as moveable in the IRA pass.  */
+float
+func_0 (float a, float b, float c)
+{
+  float p = c / a;
+
+  if (b > 1)
+    {
+      b /= p;
+      if (c > 2)
+	a /= 3;
+    }
+
+  return b / c * a;
+}
+
+/* If first_moveable_pseudo and last_moveable_pseudo are not reset correctly,
+   they will carry over and spill the pseudo for Q.  */
+float
+func_1 (float a, float b, float c)
+{
+  float q = a + b;
+
+  c *= a / (b + b);
+  if (a > 0)
+    c *= q;
+
+  return a * b * c;
+}
+
+/* We have plenty of spare registers, so check nothing has been spilled.  */
+/* { dg-final { scan-assembler-not "str" } } */
RE: [PATCH] Fix unnecessary register spill that depends on function ordering
Hi Richard,

From: Richard Sandiford
>> @@ -3719,6 +3722,7 @@ static unsigned int
>> rest_of_handle_sched (void)
>> {
>> #ifdef INSN_SCHEDULING
>> +  first_moveable_pseudo = last_moveable_pseudo;
>>    if (flag_selective_scheduling
>>        && ! maybe_skip_selective_scheduling ())
>>      run_selective_scheduling ();
>
> I think instead we should zero out both variables at the end of IRA.
> There are other places besides the scheduler that call into the IRA code,
> so tackling the problem there seems more general.

If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
they'll be zero for the second scheduler pass, which uses them.

In fact, I've just realised that the GCSE and move_loop_invariants passes
also use them (they both call ira_set_pseudo_classes, which calls
find_costs_and_classes, which uses these variables). They need to be zeroed
or set equal to each other before the first pass that uses them
(move_loop_invariants), but kept alive until after the last pass that uses
them (GCSE). So if there's a function that sets things up right before the
RTL passes start, then I think that's a good location candidate.

>> +/* We have plenty of spare registers, so check nothing has been
>> +   spilled.  */
>> +/* { dg-final { scan-assembler-not "str" } } */
>
> The testcase looks good, but it's probably better to make that “\tstr\t”.
> The problem with plain “str” is that assembly output can often include
> pathnames and version strings, and it's possible that one of those could
> contain “str”.

Good catch, I'll keep that tip in mind for the future!

Thanks,
Omar
RE: [PATCH] Fix unnecessary register spill that depends on function ordering
> Omar Tahir writes:
>> Hi Richard,
>>
>> From: Richard Sandiford
>>>> @@ -3719,6 +3722,7 @@ static unsigned int rest_of_handle_sched (void)
>>>> { #ifdef INSN_SCHEDULING
>>>> +  first_moveable_pseudo = last_moveable_pseudo;
>>>>    if (flag_selective_scheduling
>>>>        && ! maybe_skip_selective_scheduling ())
>>>>      run_selective_scheduling ();
>>>
>>> I think instead we should zero out both variables at the end of IRA.
>>> There are other places besides the scheduler that call into the IRA
>>> code, so tackling the problem there seems more general.
>>
>> If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
>> they'll be zero for the second scheduler pass, which uses them.
>
> Are you sure? It shouldn't be doing that, since there are no pseudos
> left when the second scheduling pass runs. RA replaces all pseudos
> with hard registers.
>
> So if the values in the variables have a noticeable effect on sched2,
> I think that's even more reason to clear them after IRA :-)

That's a good point. A few other passes call functions that make use of the
moveable pseudo variables. But if they're before IRA then the variables
should be zero, and as you say, if they're after IRA then there are no
pseudos left!

I've moved the reset to the end of move_unallocated_pseudos. Unfortunately
I can't inline the patch, as there's a form feed (^L) that's mangling the
text and I'm not sure how to get around that.

Thanks,
Omar

gcc/ChangeLog:

2020-07-02: Omar Tahir

	* ira.c (move_unallocated_pseudos): Zero first_moveable_pseudo and
	last_moveable_pseudo before returning.

gcc/testsuite/ChangeLog:

2020-07-02: Omar Tahir

	* gcc.target/aarch64/nospill.c: New test.
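Since the patch couldn't be inlined, here is a sketch of where the reset
presumably lands, reconstructed from the ChangeLog entry above rather than
copied from the attached moveable_pseudo.patch:

/* Sketch only; the attached moveable_pseudo.patch is authoritative.  */
static void
move_unallocated_pseudos (void)
{
  /* ... existing ira.c code that moves the insns for pseudos in the
     range [first_moveable_pseudo, last_moveable_pseudo) ...  */

  /* Zero the range before returning so it can't leak into the next
     function's passes.  */
  first_moveable_pseudo = last_moveable_pseudo = 0;
}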