On 21/04/15 13:46, Marcus Shawcroft wrote:
On 20 April 2015 at 16:12, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
Thanks,
I could've sworn I had sent this version out a couple hours ago.
My mail client has been playing up.
Here it is with 6 tests. For the tests corresponding to f1/f3 in my
example above I scan that we don't use the 'w1' reg.
I'll give the AArch64 maintainers a day or two to comment on the tests
before committing.
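
For context, the 'w1' check in mult-synth_5.c and mult-synth_6.c rests on the
observation that a multiply by a small constant can be built from shifts and
adds on the input register alone, whereas a real mul would typically first need
the constant loaded into a second register. A minimal sketch of the arithmetic,
assuming one plausible decomposition (the sequence GCC actually emits may
differ; mul10 and mul20 are illustrative names, not part of the patch):

```c
#include <stdint.h>

/* Illustrative only: unsigned arithmetic is used so that the shifts are
   well defined for every input.  */

/* x * 10 == ((x << 2) + x) << 1, i.e. (5 * x) * 2.  */
static uint32_t
mul10 (uint32_t x)
{
  uint32_t t = (x << 2) + x;	/* 5 * x, one shift-and-add.  */
  return t << 1;		/* 10 * x, one shift.  */
}

/* x * 20 == ((x << 2) + x) << 2, i.e. (5 * x) * 4.  */
static uint32_t
mul20 (uint32_t x)
{
  uint32_t t = (x << 2) + x;	/* 5 * x, one shift-and-add.  */
  return t << 2;		/* 20 * x, one shift.  */
}
```

Neither function needs a second value beyond its argument, which is what the
scan-assembler-not directive is meant to detect.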
Using scan-assembler-times is more robust than scan-assembler.
Otherwise, OK by me.
/Marcus
Thanks, I used scan-assembler-times for those tests.
Attached is what I committed with r222268.
Kyrill
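
To illustrate the difference between the two directives: scan-assembler passes
if the pattern matches at least once anywhere in the assembly output, while
scan-assembler-times requires an exact match count, so an unexpected second mul
would make the test fail. A sketch of both forms, using the function body from
mult-synth_2.c (the first directive is shown only for contrast and is not in
the committed tests):

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */

int
foo (int x)
{
  return x * 25;
}

/* Weaker form: passes when the pattern occurs one or more times.  */
/* { dg-final { scan-assembler "mul\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */

/* Stricter form: passes only when the pattern occurs exactly once.  */
/* { dg-final { scan-assembler-times "mul\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
/* { dg-final { cleanup-saved-temps } } */
```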
Thanks,
Kyrill
2015-04-20 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* expmed.c: (synth_mult): Only assume overlapping
shift with previous steps in alg_sub_t_m2 case.
2015-04-20 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
* gcc.target/aarch64/mult-synth_1.c: New test.
* gcc.target/aarch64/mult-synth_2.c: Likewise.
* gcc.target/aarch64/mult-synth_3.c: Likewise.
* gcc.target/aarch64/mult-synth_4.c: Likewise.
* gcc.target/aarch64/mult-synth_5.c: Likewise.
* gcc.target/aarch64/mult-synth_6.c: Likewise.
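
As a reading aid for the expmed.c hunks that follow, here is a simplified,
standalone sketch of how the per-step cost and latency are chosen after the
change. It is not the GCC code itself: struct step_cost and both function names
are hypothetical, and the three integer parameters stand in for the results of
add_cost, shift_cost and the relevant shiftsub/shiftadd cost query.

```c
/* Hypothetical stand-in for the cost/latency pair tracked per step.  */
struct step_cost
{
  int cost;
  int latency;
};

/* alg_sub_t_m2 case: prefer the fused shift-and-subtract when it is no
   more expensive than shift + sub, and treat it as atomic.  Otherwise
   assume the shift overlaps with earlier steps, so only the add/sub
   contributes to latency.  */
static struct step_cost
sub_t_m2_step_cost (int add_cost, int shift_cost, int shiftsub_cost)
{
  struct step_cost c;

  c.cost = add_cost + shift_cost;
  if (shiftsub_cost <= c.cost)
    {
      c.cost = shiftsub_cost;
      c.latency = c.cost;
    }
  else
    c.latency = add_cost;
  return c;
}

/* alg_add_factor / alg_sub_factor cases: the overlap assumption is
   dropped, so latency always equals the chosen cost.  */
static struct step_cost
factor_step_cost (int add_cost, int shift_cost, int shiftop_cost)
{
  struct step_cost c;

  c.cost = add_cost + shift_cost;
  if (shiftop_cost <= c.cost)
    c.cost = shiftop_cost;
  c.latency = c.cost;
  return c;
}
```

With equal add and shift costs of 4 and a fused cost of 12, the first function
yields cost 8 and latency 4, while the second yields cost 8 and latency 8.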
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 222266)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2015-04-21 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
+
+ * expmed.c: (synth_mult): Only assume overlapping
+ shift with previous steps in alg_sub_t_m2 case.
+
2015-04-21 Richard Biener <rguent...@suse.de>
PR tree-optimization/65788
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_2.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_2.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+int
+foo (int x)
+{
+ return x * 25;
+}
+
+/* { dg-final { scan-assembler-times "mul\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_3.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_3.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+int
+foo (int x)
+{
+ return x * 11;
+}
+
+/* { dg-final { scan-assembler-times "mul\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_4.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_4.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_4.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+long
+foo (int x, int y)
+{
+ return (long)x * 6L;
+}
+
+/* { dg-final { scan-assembler-times "smull\tx\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_5.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_5.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_5.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+int
+foo (int x)
+{
+ return x * 10;
+}
+
+/* { dg-final { scan-assembler-not "\tw1" } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_6.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_6.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_6.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+int
+foo (int x)
+{
+ return x * 20;
+}
+
+/* { dg-final { scan-assembler-not "\tw1" } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/gcc.target/aarch64/mult-synth_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/mult-synth_1.c (revision 0)
+++ gcc/testsuite/gcc.target/aarch64/mult-synth_1.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=cortex-a57 -save-temps" } */
+
+int
+foo (int x)
+{
+ return x * 100;
+}
+
+/* { dg-final { scan-assembler-times "mul\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" 1 } } */
+/* { dg-final { cleanup-saved-temps } } */
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog (revision 222266)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,12 @@
+2015-04-21 Kyrylo Tkachov <kyrylo.tkac...@arm.com>
+
+ * gcc.target/aarch64/mult-synth_1.c: New test.
+ * gcc.target/aarch64/mult-synth_2.c: Likewise.
+ * gcc.target/aarch64/mult-synth_3.c: Likewise.
+ * gcc.target/aarch64/mult-synth_4.c: Likewise.
+ * gcc.target/aarch64/mult-synth_5.c: Likewise.
+ * gcc.target/aarch64/mult-synth_6.c: Likewise.
+
2015-04-21 Tom de Vries <t...@codesourcery.com>
PR tree-optimization/65802
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c (revision 222266)
+++ gcc/expmed.c (working copy)
@@ -2664,14 +2664,28 @@
m = exact_log2 (-orig_t + 1);
if (m >= 0 && m < maxm)
{
- op_cost = shiftsub1_cost (speed, mode, m);
+ op_cost = add_cost (speed, mode) + shift_cost (speed, mode, m);
+ /* If the target has a cheap shift-and-subtract insn use
+ that in preference to a shift insn followed by a sub insn.
+ Assume that the shift-and-sub is "atomic" with a latency
+ equal to its cost, otherwise assume that on superscalar
+ hardware the shift may be executed concurrently with the
+ earlier steps in the algorithm. */
+ if (shiftsub1_cost (speed, mode, m) <= op_cost)
+ {
+ op_cost = shiftsub1_cost (speed, mode, m);
+ op_latency = op_cost;
+ }
+ else
+ op_latency = add_cost (speed, mode);
+
new_limit.cost = best_cost.cost - op_cost;
- new_limit.latency = best_cost.latency - op_cost;
+ new_limit.latency = best_cost.latency - op_latency;
synth_mult (alg_in, (unsigned HOST_WIDE_INT) (-orig_t + 1) >> m,
&new_limit, mode);
alg_in->cost.cost += op_cost;
- alg_in->cost.latency += op_cost;
+ alg_in->cost.latency += op_latency;
if (CHEAPER_MULT_COST (&alg_in->cost, &best_cost))
{
best_cost = alg_in->cost;
@@ -2704,21 +2718,13 @@
if (t % d == 0 && t > d && m < maxm
&& (!cache_hit || cache_alg == alg_add_factor))
{
- /* If the target has a cheap shift-and-add instruction use
- that in preference to a shift insn followed by an add insn.
- Assume that the shift-and-add is "atomic" with a latency
- equal to its cost, otherwise assume that on superscalar
- hardware the shift may be executed concurrently with the
- earlier steps in the algorithm. */
op_cost = add_cost (speed, mode) + shift_cost (speed, mode, m);
- if (shiftadd_cost (speed, mode, m) < op_cost)
- {
- op_cost = shiftadd_cost (speed, mode, m);
- op_latency = op_cost;
- }
- else
- op_latency = add_cost (speed, mode);
+ if (shiftadd_cost (speed, mode, m) <= op_cost)
+ op_cost = shiftadd_cost (speed, mode, m);
+ op_latency = op_cost;
+
+
new_limit.cost = best_cost.cost - op_cost;
new_limit.latency = best_cost.latency - op_latency;
synth_mult (alg_in, t / d, &new_limit, mode);
@@ -2742,21 +2748,12 @@
if (t % d == 0 && t > d && m < maxm
&& (!cache_hit || cache_alg == alg_sub_factor))
{
- /* If the target has a cheap shift-and-subtract insn use
- that in preference to a shift insn followed by a sub insn.
- Assume that the shift-and-sub is "atomic" with a latency
- equal to it's cost, otherwise assume that on superscalar
- hardware the shift may be executed concurrently with the
- earlier steps in the algorithm. */
op_cost = add_cost (speed, mode) + shift_cost (speed, mode, m);
- if (shiftsub0_cost (speed, mode, m) < op_cost)
- {
- op_cost = shiftsub0_cost (speed, mode, m);
- op_latency = op_cost;
- }
- else
- op_latency = add_cost (speed, mode);
+ if (shiftsub0_cost (speed, mode, m) <= op_cost)
+ op_cost = shiftsub0_cost (speed, mode, m);
+ op_latency = op_cost;
+
new_limit.cost = best_cost.cost - op_cost;
new_limit.latency = best_cost.latency - op_latency;
synth_mult (alg_in, t / d, &new_limit, mode);