RE: [PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions

2020-09-16 Thread Omar Tahir
Hi Kyrill,

It's been a while, but I believe you had the following comment about 
implementing CSEL:

> (define_insn_and_split "*thumb2_movsicc_insn"
>[(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
> (if_then_else:SI
> @@ -449,17 +473,14 @@
> it\\t%d3\;mvn%d3\\t%0, #%B1
> #
> #
> -   #
> -   #
> -   #
> +   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
> #"
> ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> -   ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
> -   ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
> -   ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
> ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> -  "&& reload_completed"
> +  "&& reload_completed && !TARGET_COND_ARITH"
>
> Hmm... I think the approach makes sense, but I'd rather we left the 
> alternatives as '#'  and refine the condition so that in the 
> TARGET_COND_ARITH case we split in precisely the cases where the 
> TARGET_COND_ARITH can't handle the operands.
> I appreciate that would complicate this condition somewhat, but it would have 
> the benefit of expressing the RTL structure to allow for further optimisation.

I've made the changes you suggested; let me know if it's good to commit.

Thanks,
Omar

--

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edf1b851b8968cbcf071564416dbf6..b8dd6af50a842c924996d528e95ce9873dcb913a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
[(match_operand:SI 2 "s_register_operand" "r,r")
 (match_operand:SI 3 "arm_add_operand" "rI,L")]))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957655a0fba67946c1db6fa6334b2bbb..57db29f92f4caee4c9384a9740e79dba2217144a 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
;; Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1
 ;; The following memory constraints have been used:
;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -479,6 +479,16 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))
+(define_constraint "UM"
+  "@internal
+   A constraint that matches the immediate constant -1."
+  (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+  "@internal
+   A constraint that matches the immediate constant +1."
+  (match_test "op == const1_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829cc4a28cd7ac1ef528ecd54f0af13c1..5d75341c9efe82dcda27daa74d2b22c52065dd02 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
		&& arm_general_register_operand (op, GET_MODE (op))")
       (match_test "satisfies_constraint_Pg (op)")))
+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
;; True for MULT, to identify which variant of shift_operator is in use.
(define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 69460f3665b0bc7f47c307aa4ae789bab6a94f92..db0b4c53754747a915d51a6df417fa97b60828da 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -432,6 +432,30 @@
(set_attr "type" "multiple")]
)
+(define_insn "*cmovsi_insn"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r,r,r")
+	(if_then_else:SI
+	 (match_operator 1 "arm_comparison_operator"
+	  [(match_operand 2 "cc_register" "") (const_int 0)])
+	 (match_operand:SI 3 "arm_reg_or_m1_or_1_or_zero" "r, r,UM, r,U1,U1,Pz,UM,Pz")
+	 (match_operand:SI 4 "arm_reg_or_m1_or_1_or_zero" "r,UM, r,U1, r,Pz,U1,Pz,UM")))]
+  "TARGET_THUMB2 && TARGET_COND_ARITH
+   && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+   || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"
+  "@
+   csel\\t%0, %3, %4, %d1
+   csinv\\t%0, %3, zr, %d1
+   csinv\\t%0, %4, zr, %D1
+   csinc\\t%0, %3

RE: [PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions

2020-09-17 Thread Omar Tahir
Hi Kyrill,

> It's been a while, but I believe you had the following comment about 
> implementing CSEL:
>
>> (define_insn_and_split "*thumb2_movsicc_insn"
>>[(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
>>   (if_then_else:SI
>> @@ -449,17 +473,14 @@
>> it\\t%d3\;mvn%d3\\t%0, #%B1
>> #
>> #
>> -   #
>> -   #
>> -   #
>> +   ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
>> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>> +   ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>> #"
>> ; alt 6: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> ; alt 7: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> -   ; alt 8: ite\\t%d3\;mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
>> -   ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
>> -   ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
>> ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
>> -  "&& reload_completed"
>> +  "&& reload_completed && !TARGET_COND_ARITH"
>>
>> Hmm... I think the approach makes sense, but I'd rather we left the 
>> alternatives as '#'  and refine the condition so that in the 
>> TARGET_COND_ARITH case we split in precisely the cases where the 
>> TARGET_COND_ARITH can't handle the operands.
>> I appreciate that would complicate this condition somewhat, but it would 
>> have the benefit of expressing the RTL structure to allow for further 
>> optimisation.
>
> I've made the changes you suggested; let me know if it's good to commit.
>
>  (define_insn_and_split "*thumb2_movsicc_insn"
>[(set (match_operand:SI 0 "s_register_operand" "=l,l,r,r,r,r,r,r,r,r,r,r")
> (if_then_else:SI
> @@ -459,7 +483,9 @@
> ; alt 9: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
> ; alt 10: ite\\t%d3\;mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
> ; alt 11: ite\\t%d3\;mov%d3\\t%0, %1\;mov%D3\\t%0, %2
> -  "&& reload_completed"
> +   ; Conditional arithmetic (csel etc.) can handle all alternatives except 8-10
> +  "&& reload_completed && (!TARGET_COND_ARITH ||
> +			   (which_alternative >= 8 && which_alternative <= 10))"
>[(const_int 0)]
>{
>  enum rtx_code rev_code;

Here's an alternative implementation that doesn't use which_alternative.

Thanks,
Omar

--

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edf1b851b8968cbcf071564416dbf6..b8dd6af50a842c924996d528e95ce9873dcb913a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
[(match_operand:SI 2 "s_register_operand" "r,r")
 (match_operand:SI 3 "arm_add_operand" "rI,L")]))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957655a0fba67946c1db6fa6334b2bbb..57db29f92f4caee4c9384a9740e79dba2217144a 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -36,7 +36,7 @@
;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
;; Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1
 ;; The following memory constraints have been used:
;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -479,6 +479,16 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))
+(define_constraint "UM"
+  "@internal
+   A constraint that matches the immediate constant -1."
+  (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+  "@internal
+   A constraint that matches the immediate constant +1."
+  (match_test "op == const1_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829cc4a28cd7ac1ef528ecd54f0af13c1..5d75341c9efe82dcda27daa74d2b22c52065dd02 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
		&& arm_general_register_operand (op, GET_MODE (op))")
       (match_test "satisfies_constraint_Pg (op)")))
+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
;; True for MULT, to identify which variant of shift_operator is in use.
(define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 69460f3665b0bc7f47c307aa4ae789bab6a94f92..f15f99903dad36e53d9af664c5b3a5e1ebe3b78b 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md

[PATCH 0/5][Arm] Add support for conditional instructions (CSEL, CSINC etc.) for Armv8.1-M Mainline

2020-08-04 Thread Omar Tahir
Hi all,

This patch series provides support for the following instructions that were
added in Armv8.1-M Mainline [1]:
- CSEL
- CSET
- CSETM
- CSNEG
- CSINV
- CSINC
- CINC

The patch series is organised as follows:
1) Modify default tuning for -march=armv8.1-m.main.
2) New macro, predicate and constraint. New pattern *thumb2_csinv that
   generates CSINV.
3) New pattern *thumb2_csinc that generates CSINC.
4) New pattern *thumb2_csneg that generates CSNEG.
5) New predicate, new constraints. New pattern *cmovsi_insn that generates
   CSEL, CSET, CSETM, CSINC and CSINV in specific cases.

CINV and CNEG aren't used as they are aliases for CSINV and CSNEG
respectively. There is one place CINC is used, as an optimisation in an
existing pattern.

Some existing patterns are modified to force the new patterns to be used
when appropriate and to prevent undesirable "optimisations". For example,
often `if_then_else` insns are split into `cond_exec` insns (see *compare_scc
in arm.md). This makes it harder to generate instructions like CSEL, so this
behaviour is disabled when targeting Armv8.1-M Mainline. The combine and ifcvt
passes also cause problems, for example *thumb2_movcond, which can trigger
unwanted combines. In some cases the define_insn is disabled, in others only
splitting is disabled.
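
To illustrate the kind of source this affects (a hand-written sketch; exact
registers and condition codes will vary):

int
sel (int a, int b, int c, int d)
{
  /* If the if_then_else form is split to cond_exec early, this becomes a
     cmp plus an ite/mov/mov block; keeping it intact until reload lets the
     new *cmovsi_insn pattern emit a single csel after the cmp.  */
  return (a == b) ? c : d;
}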

Along with matching the obvious cases, some edge cases are taken advantage of
to slightly optimise code generation. For example, R1 = CC ? 1 : R0 can take
advantage of the zero register to generate CSINC R1, ZR, R0.
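
For example (illustrative only, register numbering not guaranteed):

int
set_one_or_copy (int w0, int w1, int w2)
{
  /* 1 is zr + 1, so this can become:
       cmp   r0, r1
       csinc r0, r2, zr, ne    ; ne ? w2 : zr + 1, i.e. eq ? 1 : w2  */
  return (w0 == w1) ? 1 : w2;
}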

There are a few cases where CSEL etc. could be used, but it's more cumbersome
to do so, therefore the default IT block implementation is kept (see
*thumb2_movsicc_insn alts 8-10). In general, however, code generated on
Armv8.1-M Mainline will see a large decrease in the number of IT blocks.

The entire patch series was regression tested together on arm-none-eabi and
arm-none-linux-gnueabi with no regressions, and showed a minor performance
improvement (-0.1% cycle count) on a proprietary benchmark.

Thanks,
Omar

[1] https://static.docs.arm.com/ddi0553/bf/DDI0553B_f_armv8m_arm.pdf





[PATCH 1/5][Arm] Modify default tuning of armv8.1-m.main to use Cortex-M55

2020-08-04 Thread Omar Tahir
Previously, compiling with -march=armv8.1-m.main would tune for Cortex-M7.
However, the Cortex-M7 only supports up to Armv7e-M. The Cortex-M55 is the
earliest CPU that supports Armv8.1-M Mainline, so it is a more appropriate choice. This
also has the effect of changing the branch cost function used, which will be
necessary to correctly prioritise conditional instructions over branches in
the rest of this patch series.
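
As a usage note (the command line below is illustrative, not part of the patch):

arm-none-eabi-gcc -O2 -march=armv8.1-m.main test.c

now schedules and weighs branch costs as if -mtune=cortex-m55 had been given,
where previously it behaved like -mtune=cortex-m7.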

Regression tested on arm-none-eabi.


gcc/ChangeLog:

2020-07-30: Omar Tahir 

* config/arm/arm-cpus.in (armv8.1-m.main): Tune for Cortex-M55.


diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index 728be500b80..c98f8ede8fd 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -716,7 +716,7 @@ begin arch armv8-r
end arch armv8-r
 begin arch armv8.1-m.main
- tune for cortex-m7
+ tune for cortex-m55
  tune flags CO_PROC
  base 8M_MAIN
  profile M




[PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-04 Thread Omar Tahir
This patch adds a new pattern, *thumb2_csinv, for generating CSINV instructions.

This pattern relies on a few general changes that will be used throughout
the following patches:
- A new macro, TARGET_COND_ARITH, which is only true on 8.1-M Mainline
  and represents the existence of these conditional instructions.
- A change to the cond exec hook, arm_have_conditional_execution, which
  now returns false if TARGET_COND_ARITH before reload. This allows for
  some ifcvt transformations when they would usually be disabled. I've
  written a rather verbose comment (with the risk of over-explaining)
  as it's a bit of a confusing change.
- One new predicate and one new constraint.
- *thumb2_movcond has been restricted to only match if !TARGET_COND_ARITH,
  otherwise it triggers undesirable combines.
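
The new test (csinv-1.c) is attached rather than inlined; a representative
kernel in the style of the other tests in this series (names illustrative)
would be:

int
test_csinv32_condasn1 (int w0, int w1, int w2, int w3)
{
  /* cond ? ~w2 : w3 is expected to match *thumb2_csinv and emit csinv.  */
  return (w0 == w1) ? ~w2 : w3;
}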

2020-07-30: Sudakshina Das 
    Omar Tahir 

* config/arm/arm.h (TARGET_COND_ARITH): New macro.
* config/arm/arm.c (arm_have_conditional_execution): Return false if
TARGET_COND_ARITH before reload.
* config/arm/constraints.md (Z): New constant zero.
* config/arm/predicates.md (arm_comparison_operation): Returns true if
comparing CC_REGNUM with constant zero.
* config/arm/thumb2.md (*thumb2_csinv): New.
(*thumb2_movcond): Don't match if TARGET_COND_ARITH.

Regression tested on arm-none-eabi.


gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
    Omar Tahir 

* gcc.target/arm/csinv-1.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..3a9684cdcd8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,20 @@ arm_frame_pointer_required (void)
   return false;
}
-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
static bool
arm_have_conditional_execution (void)
{
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution. */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
+ transformations before reload. */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  /* The ifcvt transformations are only turned on if we return false. */
+  return has_cond_exec && !enable_ifcvt_trans;
}
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */
 #define TARGET_CRC32   (arm_arch_crc)
+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc. */
+#define TARGET_COND_ARITH  (arm_arch8_1m_main)
+
/* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 011badc9957..048b25ef4a1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -28,6 +28,7 @@
;; The following normal constraints have been used:
;; in ARM/Thumb-2 state: G, I, j, J, K, L, M
;; in Thumb-1 state: I, J, K, L, M, N, O
+;; in all states: Z
;; 'H' was previously used for FPA.
 ;; The following multi-letter normal constraints have been used:
@@ -479,6 +480,11 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 1, true)")))
+(define_constraint "Z"
+  "@internal
+   Integer constant zero."
+  (match_test "op == const0_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
(match_test "maybe_get_arm_condition_code (op) != ARM_NV")))
+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+ ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})

[PATCH 3/5][Arm] New pattern for CSINC instructions

2020-08-04 Thread Omar Tahir
This patch adds a new pattern, *thumb2_csinc, for generating CSINC
instructions. It also modifies an existing pattern, *thumb2_cond_arith, to
output CINC when the operation is an addition and TARGET_COND_ARITH is true.

Regression tested on arm-none-eabi.


2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/thumb2.md (*thumb2_csinc): New.
(*thumb2_cond_arith): Generate CINC where possible.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
Omar Tahir 

* gcc.target/arm/csinc-1.c: New test.


diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 0b00aef7ef7..79cf684e5cb 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -743,6 +743,9 @@
 if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
   return \"%i5\\t%0, %1, %2, lsr #31\";
+if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
+  return \"cinc\\t%0, %1, %d4\";
+
 output_asm_insn (\"cmp\\t%2, %3\", operands);
 if (GET_CODE (operands[5]) == AND)
   {
@@ -952,6 +955,21 @@
(set_attr "predicable" "no")]
)
+(define_insn "*thumb2_csinc"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(plus:SI (match_operand:SI 2 "arm_general_register_operand" "r, r")
+ (const_int 1))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinc\\t%0, %3, %2, %D1
+   csinc\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
   (if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csinc-1.c b/gcc/testsuite/gcc.target/arm/csinc-1.c
new file mode 100644
index 000..b9928493862
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csinc-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csinc32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? (w2 + 1) : w3;
+  return w4;
+}
+
+int
+test_csinc32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csinc\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : (w2 + 1);
+  return w4;
+}




[PATCH 4/5][Arm] New pattern for CSNEG instructions

2020-08-04 Thread Omar Tahir
This patch adds a new pattern, *thumb2_csneg, for generating CSNEG
instructions. It also restricts *if_neg_move and *thumb2_negscc to only match
if !TARGET_COND_ARITH which prevents undesirable matches during ifcvt.

Regression tested on arm-none-eabi.


2020-07-30: Sudakshina Das 
Omar Tahir 

* config/arm/thumb2.md (*thumb2_csneg): New.
(*thumb2_negscc): Don't match if TARGET_COND_ARITH.
* config/arm/arm.md (*if_neg_move): Don't match if TARGET_COND_ARITH.

gcc/testsuite/ChangeLog:

2020-07-30: Sudakshina Das 
    Omar Tahir 

* gcc.target/arm/csneg.c: New test.


diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index a6a31f8f4ef..950e46edfee 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11211,7 +11211,7 @@
 [(match_operand 3 "cc_register" "") (const_int 0)])
(neg:SI (match_operand:SI 2 "s_register_operand" "l,r"))
(match_operand:SI 1 "s_register_operand" "0,0")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(cond_exec (match_op_dup 4 [(match_dup 3) (const_int 0)])
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 79cf684e5cb..d12467d7644 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -880,7 +880,7 @@
[(match_operand:SI 1 "s_register_operand" "r")
 (match_operand:SI 2 "arm_rhs_operand" "rI")])))
(clobber (reg:CC CC_REGNUM))]
-  "TARGET_THUMB2"
+  "TARGET_THUMB2 && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -970,6 +970,20 @@
(set_attr "predicable" "no")]
)
+(define_insn "*thumb2_csneg"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(neg:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Z")))]
+  "TARGET_COND_ARITH"
+  "@
+   csneg\\t%0, %3, %2, %D1
+   csneg\\t%0, zr, %2, %D1"
+  [(set_attr "type" "csel")
+   (set_attr "predicable" "no")]
+)
+
(define_insn "*thumb2_movcond"
   [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts,Ts")
   (if_then_else:SI
diff --git a/gcc/testsuite/gcc.target/arm/csneg.c b/gcc/testsuite/gcc.target/arm/csneg.c
new file mode 100644
index 000..e48606265af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/csneg.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_arch_v8_1m_main_ok } */
+/* { dg-options "-O2 -march=armv8.1-m.main" } */
+
+int
+test_csneg32_condasn1(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  w4 = (w0 == w1) ? -w2 : w3;
+  return w4;
+}
+
+int
+test_csneg32_condasn2(int w0, int w1, int w2, int w3)
+{
+  int w4;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*eq" } } */
+  w4 = (w0 == w1) ? w3 : -w2;
+  return w4;
+}
+
+unsigned long long
+test_csneg_uxtw (unsigned int a, unsigned int b, unsigned int c)
+{
+  unsigned int val;
+
+  /* { dg-final { scan-assembler "csneg\tr\[0-9\]*.*ne" } } */
+  val = a ? b : -c;
+  return val;
+}




[PATCH 5/5][Arm] New pattern for CSEL, CSET and CSETM instructions

2020-08-04 Thread Omar Tahir
This patch adds a new pattern, *cmovsi_insn, for generating CSEL, CSET and
CSETM instructions. It also generates CSINV and CSINC instructions in specific
cases where one of the operands is constant.

To facilitate this, one new predicate and two new constraints are added, and
*compare_scc is restricted to only match if !TARGET_COND_ARITH to prevent
an unwanted split. Additionally, alternatives 8-10 are re-enabled in
*thumb2_movsicc_insn and splitting only occurs if !TARGET_COND_ARITH. This
forces the new pattern to be used when possible; for more complex cases
that can't directly use CSEL, it falls back to IT blocks.
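
The new tests are attached rather than inlined; representative kernels
(illustrative names, in the style of the earlier tests) might look like:

int
test_cset (int a, int b)
{
  /* cond ? 1 : 0 - the U1/Z alternatives, expected to emit cset.  */
  return a > b;
}

int
test_csetm (int a, int b)
{
  /* cond ? -1 : 0 - the UM/Z alternatives, expected to emit csetm.  */
  return (a > b) ? -1 : 0;
}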

Regression tested on arm-none-eabi. The entire patch set was regression
tested on arm-linux-gnueabi also.

That's all folks!

Thanks,
Omar


2020-07-30: Sudakshina Das 
    Omar Tahir 

* config/arm/thumb2.md (*cmovsi_insn): New.
(*thumb2_movsicc_insn): Don't split if TARGET_COND_ARITH, re-enable
alternatives 8-10.
* config/arm/arm.md (*compare_scc): Don't match if TARGET_COND_ARITH.


gcc/testsuite/ChangeLog:

2020-07-30: Omar Tahir 

* gcc.target/arm/csel.c: New test.
* gcc.target/arm/cset.c: New test.
* gcc.target/arm/csetm.c: New test.
* gcc.target/arm/csinv-2.c: New test.
* gcc.target/arm/csinc-2.c: New test.


diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 950e46edfee..b8dd6af50a8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9760,7 +9760,7 @@
[(match_operand:SI 2 "s_register_operand" "r,r")
 (match_operand:SI 3 "arm_add_operand" "rI,L")]))
   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && !TARGET_COND_ARITH"
   "#"
   "&& reload_completed"
   [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 2) (match_dup 3)))
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 048b25ef4a1..37d240d139b 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -37,7 +37,7 @@
;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
;; in Thumb-2 state: Ha, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py, Pz, Rd, Rf, Rb, Ra,
;; Rg, Ri
-;; in all states: Pf, Pg
+;; in all states: Pf, Pg, UM, U1
 ;; The following memory constraints have been used:
;; in ARM/Thumb-2 state: Uh, Ut, Uv, Uy, Un, Um, Us, Up, Uf, Ux, Ul
@@ -485,6 +485,16 @@
Integer constant zero."
   (match_test "op == const0_rtx"))
+(define_constraint "UM"
+  "@internal
+   A constraint that matches the immediate constant -1."
+  (match_test "op == constm1_rtx"))
+
+(define_constraint "U1"
+  "@internal
+   A constraint that matches the immediate constant +1."
+  (match_test "op == const1_rtx"))
+
(define_memory_constraint "Ux"
  "@internal
   In ARM/Thumb-2 state a valid address and load into CORE regs or only to
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2144520829c..5d75341c9ef 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -454,6 +454,13 @@
		&& arm_general_register_operand (op, GET_MODE (op))")
       (match_test "satisfies_constraint_Pg (op)")))
+(define_predicate "arm_reg_or_m1_or_1_or_zero"
+  (and (match_code "reg,subreg,const_int")
+       (ior (match_operand 0 "arm_general_register_operand")
+	    (match_test "op == constm1_rtx")
+	    (match_test "op == const1_rtx")
+	    (match_test "op == const0_rtx"))))
+
;; True for MULT, to identify which variant of shift_operator is in use.
(define_special_predicate "mult_operator"
   (match_code "mult"))
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index d12467d7644..bc6f2a52004 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -432,6 +432,30 @@
(set_attr "type" "multiple")]
)
+(define_insn "*cmovsi_insn"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r,r,r,r,r,r,r,r,r")
+	(if_then_else:SI
+	 (match_operator 1 "arm_comparison_operator"
+	  [(match_operand 2 "cc_register" "") (const_int 0)])
+	 (match_operand:SI 3 "arm_reg_or_m1_or_1_or_zero" "r, r,UM, r,U1,U1, Z,UM, Z")
+	 (match_operand:SI 4 "arm_reg_or_m1_or_1_or_zero" "r,UM, r,U1, r, Z,U1, Z,UM")))]
+  "TARGET_THUMB2 && TARGET_COND_ARITH
+   && (!((operands[3] == const1_rtx && operands[4] == constm1_rtx)
+	 || (operands[3] == constm1_rtx && operands[4] == const1_rtx)))"

RE: [PATCH 2/5][Arm] New pattern for CSINV instructions

2020-08-05 Thread Omar Tahir
Hi Kyrill,

> -/* Only thumb1 can't support conditional execution, so return true if
> -   the target is not thumb1.  */
> static bool
> 
> 
> Functions should have comments in GCC. Can you please write something 
> describing the new logic of the function.
> 
> arm_have_conditional_execution (void)
> {
> -  return !TARGET_THUMB1;
> +  bool has_cond_exec, enable_ifcvt_trans;
> +
> +  /* Only THUMB1 cannot support conditional execution. */
> +  has_cond_exec = !TARGET_THUMB1;
> +
> +  /* When TARGET_COND_ARITH is defined we'd like to turn on some ifcvt
> + transformations before reload. */
> +  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
> +
> +  /* The ifcvt transformations are only turned on if we return false. */
> +  return has_cond_exec && !enable_ifcvt_trans;
> 
> I don't think that comment is very useful. Perhaps "Enable ifcvt 
> transformations only if..."
> 
> }

Fixed; let me know if the new comments are clearer now.

> +(define_constraint "Z"
> +  "@internal
> +   Integer constant zero."
> +  (match_test "op == const0_rtx"))
> 
> 
> We're usually wary of adding more constraints unless necessary as it gets 
> complicated to read patterns quickly (especially once we get into 
> multi-letter constraints).
> I think you can reuse the existing "Pz" constraint for your purposes.

Yes, Pz works; I'll replace Z with Pz in the other patches as well. In patch 5 I
introduce UM (-1) and U1 (1); I don't think there's any existing combination of
constraints that can be used instead.

> 
> Ok with those changes.
> If you'd like to commit it yourself please apply for write access at 
> https://sourceware.org/cgi-bin/pdw/ps_form.cgi listing my email address from 
> MAINTAINERS as the approver.

Excellent, thanks. If the other three patches are okay I'll commit them as well?

Thanks,
Omar

---

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index dac9a6fb5c4..e1bb2db9c8a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29833,12 +29833,23 @@ arm_frame_pointer_required (void)
   return false;
 }
 
-/* Only thumb1 can't support conditional execution, so return true if
-   the target is not thumb1.  */
+/* Implement the TARGET_HAVE_CONDITIONAL_EXECUTION hook.
+   All modes except THUMB1 have conditional execution.
+   If we have conditional arithmetic, return false before reload to
+   enable some ifcvt transformations. */
 static bool
 arm_have_conditional_execution (void)
 {
-  return !TARGET_THUMB1;
+  bool has_cond_exec, enable_ifcvt_trans;
+
+  /* Only THUMB1 cannot support conditional execution. */
+  has_cond_exec = !TARGET_THUMB1;
+
+  /* Enable ifcvt transformations if we have conditional arithmetic, but only
+ before reload. */
+  enable_ifcvt_trans = TARGET_COND_ARITH && !reload_completed;
+
+  return has_cond_exec && !enable_ifcvt_trans;
 }
 
 /* The AAPCS sets the maximum alignment of a vector to 64 bits.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 30e1d6dc994..d67c91796e4 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -177,6 +177,10 @@ emission of floating point pcs attributes.  */
 
 #define TARGET_CRC32   (arm_arch_crc)
 
+/* Thumb-2 but also has some conditional arithmetic instructions like csinc,
+   csinv, etc. */
+#define TARGET_COND_ARITH  (arm_arch8_1m_main)
+
 /* The following two macros concern the ability to execute coprocessor
instructions for VFPv3 or NEON.  TARGET_VFP3/TARGET_VFPD32 are currently
only ever tested when we know we are generating for VFP hardware; we need
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 981eec520ba..2144520829c 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -485,6 +485,18 @@
   (and (match_operand 0 "expandable_comparison_operator")
(match_test "maybe_get_arm_condition_code (op) != ARM_NV")))
 
+(define_special_predicate "arm_comparison_operation"
+  (match_code "eq,ne,le,lt,ge,gt,geu,gtu,leu,ltu,unordered,
+ ordered,unlt,unle,unge,ungt")
+{
+  if (XEXP (op, 1) != const0_rtx)
+return false;
+  rtx op0 = XEXP (op, 0);
+  if (!REG_P (op0) || REGNO (op0) != CC_REGNUM)
+return false;
+  return maybe_get_arm_condition_code (op) != ARM_NV;
+})
+
 (define_special_predicate "lt_ge_comparison_operator"
   (match_code "lt,ge"))
 
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 793f6706868..ecc903970db 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -938,6 +938,20 @@
(set_attr "type" "multiple")]
 )
 
+(define_insn "*thumb2_csinv"
+  [(set (match_operand:SI 0 "arm_general_register_operand" "=r, r")
+  (if_then_else:SI
+(match_operand 1 "arm_comparison_operation" "")
+(not:SI (match_operand:SI 2 "arm_general_register_operand" "r, r"))
+(match_operand:SI 3 "reg_or_zero_operand" "r, Pz")))]
+  "TARGET_COND_ARITH"
+  "@
+   csinv\\t%0, %3, %2, %D1
+   csinv\\t%0, zr, %2, %D1"
+  [(set_attr "type"

RE: [PATCH 3/5][Arm] New pattern for CSINC instructions

2020-08-06 Thread Omar Tahir
> Hi Omar,
> 
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 0b00aef7ef7..79cf684e5cb 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -743,6 +743,9 @@
>  if (GET_CODE (operands[4]) == LT && operands[3] == const0_rtx)
>return \"%i5\\t%0, %1, %2, lsr #31\";
> 
> +if (GET_CODE (operands[5]) == PLUS && TARGET_COND_ARITH)
> +  return \"cinc\\t%0, %1, %d4\";
> +
>  output_asm_insn (\"cmp\\t%2, %3\", operands);
> 
> 
> Hmmm, this looks wrong. The pattern needs to perform the comparison (setting 
> the CC reg) as well as do the conditional increment.
> Emitting a cinc without a cmp won't set the CC flags.
> Also, cinc increments only by 1, whereas the "arm_rhs_operand" predicate 
> accepts a wider variety of immediates, so just checking for GET_CODE 
> (operands[5]) == PLUS isn't enough.
> 
> Thanks,
> Kyrill
> 

My bad, the following line

output_asm_insn (\"cmp\\t%2, %3\", operands);

should be before my change rather than after, that will generate the cmp needed.

As for the predicate accepting other immediates, I don't think that's an issue. 
From what I understand, the pattern represents

r0 = f5 (f4 (r2, r3), r1)

where f5 is a shiftable operator and f4 is a comparison operator. For 
simplicity let's just assume f4 is ==. Then we have

r0 = f5 (r1, r2 == r3)

If f5 is PLUS then we get

r0 = r1 + (r2 == r3)

which is

r0 = (r2 == r3) ? r1 + 1 : r1

i.e. cmp r2, r3; cinc r0, r1, eq. Since all comparisons return either zero
(comparison failed) or 1 (comparison passed) a cinc should always work as long 
as the shiftable operator is PLUS.
Operand 3 being an "arm_rhs_operand" shouldn't matter since it's just being 
compared to operand 2 and returning a 0 or 1.
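
To make that concrete (a sketch, taking f4 to be == and f5 to be PLUS):

int
f (int r1, int r2, int r3)
{
  /* r1 + (r2 == r3) is (r2 == r3) ? r1 + 1 : r1, so after the cmp the
     whole select collapses to one conditional increment:
       cmp  r2, r3
       cinc r0, r1, eq
     (register names symbolic, as in the discussion above).  */
  return r1 + (r2 == r3);
}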

Thanks,
Omar


[PATCH] aarch64: Fix missing BTI instruction in trampolines

2020-06-29 Thread Omar Tahir
Hi,

Got a small bugfix here regarding BTIs and trampolines.

If two functions require trampolines, and the first has BTI enabled while the
second doesn't, the generated template will be missing a BTI instruction.
This patch fixes that by always emitting a BTI instruction, which is safe, as
BTI instructions are ignored on architecture versions that don't support them.
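
For reference, a minimal sketch of the kind of (GNU C) code that needs a
trampoline - a nested function that uses its enclosing frame and whose
address escapes:

int
outer (int x)
{
  int nested (int y) { return x + y; } /* uses x, so needs the static chain */
  int (*fp) (int) = nested;            /* address escapes: trampoline */
  return fp (1);
}

With BTI enabled, the indirect call through fp lands on the trampoline, so
the trampoline template itself must begin with a BTI instruction.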

I don't have write access, so could someone commit for me?

Bootstrapped and tested on aarch64 with no regressions.

gcc/ChangeLog:

2020-06-29  Omar Tahir omar.ta...@arm.com

* config/aarch64/aarch64.c (aarch64_asm_trampoline_template): Always
generate a BTI instruction.


gcc/testsuite/ChangeLog:

2020-06-29  Omar Tahir omar.ta...@arm.com

* gcc.target/aarch64/bti-4.c: New test.




[PATCH] Fix unnecessary register spill that depends on function ordering

2020-06-30 Thread Omar Tahir
Hi,

The variables first_moveable_pseudo and last_moveable_pseudo aren't reset
after compiling a function, which means they leak into the first scheduler
pass of the following function. In some cases, this can cause an extra spill
during register allocation of the second function.

This patch fixes this by setting

first_moveable_pseudo = last_moveable_pseudo

at the beginning of the first scheduler pass.

Because the spill occurs in the middle of the IRA pass it is highly
target-sensitive, so I have provided a test case that works on aarch64. There
doesn't seem to be a target-independent way to trigger this bug, since the
RTL at the IRA stage is already quite target-specific.

Interestingly, the aarch64 test case here doesn't actually perform a complete
register spill - the value is stored on the stack but never retrieved.
Perhaps this points to another, deeper bug?

Bootstrapped and regression tested on aarch64-none-linux-gnu.

I don't have write privileges, so if it's fine could someone push for me?

Thanks,
Omar



gcc/ChangeLog:

2020-06-30: Omar Tahir 

* sched-rgn.c: Include ira-int.h, ira.h, regs.h.
(rest_of_handle_sched): Clear moveable pseudo range.

gcc/testsuite/ChangeLog:

2020-06-30: Omar Tahir 

* gcc.target/aarch64/nospill.c: New test.



diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 7f5dfdb..51329fc 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -65,6 +65,9 @@ along with GCC; see the file COPYING3.  If not see
#include "dbgcnt.h"
#include "pretty-print.h"
#include "print-rtl.h"
+#include "ira.h"
+#include "regs.h"
+#include "ira-int.h"

/* Disable warnings about quoting issues in the pp_xxx calls below
that (intentionally) don't follow GCC diagnostic conventions.  */
@@ -3719,6 +3722,7 @@ static unsigned int
rest_of_handle_sched (void)
{
#ifdef INSN_SCHEDULING
+  first_moveable_pseudo = last_moveable_pseudo;
   if (flag_selective_scheduling
   && ! maybe_skip_selective_scheduling ())
 run_selective_scheduling ();
diff --git a/gcc/testsuite/gcc.target/aarch64/nospill.c 
b/gcc/testsuite/gcc.target/aarch64/nospill.c
new file mode 100644
index 000..60399d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/nospill.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* The pseudo for P is marked as moveable in the IRA pass. */
+float
+func_0 (float a, float b, float c)
+{
+  float p = c / a;
+
+  if (b > 1)
+{
+  b /= p;
+  if (c > 2)
+a /= 3;
+}
+
+  return b / c * a;
+}
+
+/* If first_moveable_pseudo and last_moveable_pseudo are not reset correctly,
+   they will carry over and spill the pseudo for Q. */
+float
+func_1 (float a, float b, float c)
+{
+  float q = a + b;
+
+  c *= a / (b + b);
+  if (a > 0)
+c *= q;
+
+  return a * b * c;
+}
+
+/* We have plenty of spare registers, so check nothing has been spilled. */
+/* { dg-final { scan-assembler-not "str" } } */




RE: [PATCH] Fix unnecessary register spill that depends on function ordering

2020-07-01 Thread Omar Tahir
Hi Richard,

From: Richard Sandiford 
> > @@ -3719,6 +3722,7 @@ static unsigned int
> > rest_of_handle_sched (void)
> > {
> > #ifdef INSN_SCHEDULING
> > +  first_moveable_pseudo = last_moveable_pseudo;
> >if (flag_selective_scheduling
> >&& ! maybe_skip_selective_scheduling ())
> >  run_selective_scheduling ();
> 
> I think instead we should zero out both variables at the end of IRA.
> There are other places besides the scheduler that call into the IRA code, so 
> tackling the problem there seems more general.

If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
they'll be zero for the second scheduler pass, which uses them.

In fact, I've just realised that the GCSE and move_loop_invariants passes
also use them (they both call ira_set_pseudo_classes which calls
find_costs_and_classes which uses these variables).

They need to be zeroed or set equal to each other before the first pass that
uses them (move_loop_invariants), but kept alive until after the last pass
that uses them (GCSE). So if there's a function that sets things up right
before the RTL passes start then I think that's a good location candidate.

> > +/* We have plenty of spare registers, so check nothing has been spilled. */
> > +/* { dg-final { scan-assembler-not "str" } } */
> 
> The testcase looks good, but it's probably better to make that “\tstr\t”.
> The problem with plain “str” is that assembly output can often include 
> pathnames and version strings, and it's possible that one of those could 
> contain “str”.

Good catch, I'll keep that tip in mind for future!

Thanks,
Omar


RE: [PATCH] Fix unnecessary register spill that depends on function ordering

2020-07-02 Thread Omar Tahir
> Omar Tahir  writes:
> > Hi Richard,
> >
> > From: Richard Sandiford 
> >> > @@ -3719,6 +3722,7 @@ static unsigned int
> >> > rest_of_handle_sched (void)
> >> > {
> >> > #ifdef INSN_SCHEDULING
> >> > +  first_moveable_pseudo = last_moveable_pseudo;
> >> >if (flag_selective_scheduling
> >> >&& ! maybe_skip_selective_scheduling ())
> >> >  run_selective_scheduling ();
> >> 
> >> I think instead we should zero out both variables at the end of IRA.
> >> There are other places besides the scheduler that call into the IRA code, 
> >> so tackling the problem there seems more general.
> >
> > If you zero first_moveable_pseudo and last_moveable_pseudo after IRA then
> > they'll be zero for the second scheduler pass, which uses them.
> 
> Are you sure?  It shouldn't be doing that, since there are no pseudos
> left when the second scheduling pass runs.  RA replaces all pseudos
> with hard registers.
> 
> So if the values in the variables has a noticeable effect on sched2,
> I think that's even more reason to clear them after IRA :-)

That's a good point. A few other passes call functions that make use of
the moveable pseudo variables. But if they're before IRA then they should
be zero, and as you say if they're after IRA then there are no pseudos left!

I've moved the reset to the end of move_unallocated_pseudos. Unfortunately I
can't inline the patch as there's a form feed (^L) that's mangling the text,
not sure how to get around that.
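
Since the patch is attached rather than inlined, here is a sketch of the
change as described (placement inferred from the ChangeLog entry below; the
attachment is authoritative):

/* ira.c, at the end of move_unallocated_pseudos:  */
  /* Reset the moveable-pseudo range so it cannot leak into passes run on
     the next function.  */
  first_moveable_pseudo = 0;
  last_moveable_pseudo = 0;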

Thanks,
Omar



gcc/ChangeLog:

2020-07-02: Omar Tahir 

* ira.c (move_unallocated_pseudos): Zero first_moveable_pseudo and
last_moveable_pseudo before returning.

gcc/testsuite/ChangeLog:

2020-07-02: Omar Tahir 

* gcc.target/aarch64/nospill.c: New test.

