On 2021/8/17 13:17, Xionghu Luo via Gcc-patches wrote:
Hi,
On 2021/8/16 19:46, Richard Biener wrote:
On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. inn_loop is updated to inner loop, so it need be restored
when exiting from innermost loop. With this patch, the store instruction
in outer loop could also be moved out of outer loop by store motion.
Any comments? Thanks.
gcc/ChangeLog:
* tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
inn_loop when exiting from innermost loop.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
---
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
gcc/tree-ssa-loop-im.c | 6 +++++-
2 files changed, 29 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 00000000000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+ for (int j = 0; j < l; j++)
+ {
+ for (int i = 0; i < n; ++i)
+ {
+ int *p = &x->j;
+ int tem = *p;
+ x->j += tem * i;
+ }
+ int *r = &x->k;
+ int tem2 = *r;
+ x->k += tem2 * j;
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2
"lim2" } } */
+
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..5ca4738b20e 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop,
sbitmap contains_call)
if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
last = bb;
+ if (inn_loop != loop
+ && flow_loop_nested_p (bb->loop_father, inn_loop))
+ inn_loop = bb->loop_father;
+
The comment says
/* In a loop that is always entered we may proceed anyway.
But record that we entered it and stop once we leave
it.
*/
inn_loop = bb->loop_father;
and your change would defeat that early return, no?
The issue is the search method exits too early when iterating the outer
loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9
and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3
doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are
ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
they won't be processed by store motion again.
5<----
|\ |
8 \ 9
| \ |
--->3--->4
| | 10---|
Correct the graph display:
5<----
|\ |
8 \ 9
| \ |
--->3--->4
| |
---10
SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
patch, it will continue search when meet bb 3 until bb 4, then last is
updated
to bb 4, it will break until exit edge is found at bb 4 by
"if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop
code will
set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
while (1)
{
SET_ALWAYS_EXECUTED_IN (last, loop);
if (last == loop->header)
break;
last = get_immediate_dominator (CDI_DOMINATORS, last);
}
After further discussion with Kewen, we found that the inn_loop variable is
totally useless and could be removed.
if (bitmap_bit_p (contains_call, bb->index))
break;
@@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop,
sbitmap contains_call)
if (bb->loop_father->header == bb)
{
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
+ if (!dominated_by_p (CDI_DOMINATORS,
bb->loop_father->latch, bb))
break;
That's now a always false condition - a loops latch is always dominated
by its header. The condition as written tries to verify whether the
loop is always entered - mind we visit all blocks, not only those
always executed.
Thanks for the catch! I am afraid the piece of code should be removed
since it stops
search of potential ALWAYS EXECUTED bb after inner loop...
In fact for your testcase the x->j ref is _not_ always executed
since the inner loop is conditional on n > 0.
Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in
store-motion.
Attached the diff file without and with my patch to show the extra
optimization.
x->j is already moved out of loop 2 on master code.
If change n and l to constant numbers like 100, master code could also
do 2 store
motions as expected. The edge from bb 5 to bb 4 doesn't exist now, so bb
4, bb 3
and bb 5 are ALWAYS EXECUTED for loop 1.
struct X { int i; int j; int k;};
void foo(struct X *x, int n, int l)
{
for (int j = 0; j < l; j++) // loop 1
{
for (int i = 0; i < n; ++i) // loop 2
{
int *p = &x->j;
int tem = *p;
x->j += tem * i;
}
int *r = &x->k;
int tem2 = *r;
x->k += tem2 * j;
}
}
Richard.
--
Thanks,
Xionghu