On 2021/8/17 15:12, Richard Biener wrote:
> On Tue, 17 Aug 2021, Xionghu Luo wrote:
>
>> Hi,
>>
>> On 2021/8/16 19:46, Richard Biener wrote:
>>> On Mon, 16 Aug 2021, Xiong Hu Luo wrote:
>>>
>>>> It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
>>>> nested loops. inn_loop is updated to inner loop, so it need be restored
>>>> when exiting from innermost loop. With this patch, the store instruction
>>>> in outer loop could also be moved out of outer loop by store motion.
>>>> Any comments? Thanks.
>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * tree-ssa-loop-im.c (fill_always_executed_in_1): Restore
>>>> inn_loop when exiting from innermost loop.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.dg/tree-ssa/ssa-lim-19.c: New test.
>>>> ---
>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 ++++++++++++++++++++++
>>>> gcc/tree-ssa-loop-im.c | 6 +++++-
>>>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>>> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>> new file mode 100644
>>>> index 00000000000..097a5ee4a4b
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
>>>> @@ -0,0 +1,24 @@
>>>> +/* PR/101293 */
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-lim2-details" } */
>>>> +
>>>> +struct X { int i; int j; int k;};
>>>> +
>>>> +void foo(struct X *x, int n, int l)
>>>> +{
>>>> + for (int j = 0; j < l; j++)
>>>> + {
>>>> + for (int i = 0; i < n; ++i)
>>>> + {
>>>> + int *p = &x->j;
>>>> + int tem = *p;
>>>> + x->j += tem * i;
>>>> + }
>>>> + int *r = &x->k;
>>>> + int tem2 = *r;
>>>> + x->k += tem2 * j;
>>>> + }
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } }
>>>> */
>>>> +
>>>> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
>>>> index b24bc64f2a7..5ca4738b20e 100644
>>>> --- a/gcc/tree-ssa-loop-im.c
>>>> +++ b/gcc/tree-ssa-loop-im.c
>>>> @@ -3211,6 +3211,10 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
>>>> @@ contains_call)
>>>> if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>> last = bb;
>>>> + if (inn_loop != loop
>>>> + && flow_loop_nested_p (bb->loop_father, inn_loop))
>>>> + inn_loop = bb->loop_father;
>>>> +
>>>
>>> The comment says
>>>
>>> /* In a loop that is always entered we may proceed anyway.
>>> But record that we entered it and stop once we leave it.
>>> */
>>> inn_loop = bb->loop_father;
>>>
>>> and your change would defeat that early return, no?
>>
>> The issue is the search method exits too early when iterating the outer
>> loop. For example of a nested loop, loop 1 includes 5,8,3,10,4,9
>> and loop2 includes 3,10. Currently, it breaks when bb is 3 as bb 3
>> doesn't dominate bb 9 of loop 1. But actually, both bb 5 and bb 4 are
>> ALWAYS_EXECUTED for loop 1, so if there are store instructions in bb 4
>> they won't be processed by store motion again.
>>
>>
>> 5<----
>> |\ |
>> 8 \ 9
>> | \ |
>> --->3--->4
>> | |
>> 10---|
>>
>>
>> SET_ALWAYS_EXECUTED_IN is only set to bb 5 on master code now, with this
>> patch, it will continue search when meet bb 3 until bb 4, then last is
>> updated
>> to bb 4, it will break until exit edge is found at bb 4 by
>> "if (!flow_bb_inside_loop_p (loop, e->dest))". Then the followed loop code
>> will
>> set bb 4 as ALWAYS_EXEUCTED and all it's idoms bb 5.
>>
>>
>> while (1)
>> {
>> SET_ALWAYS_EXECUTED_IN (last, loop);
>> if (last == loop->header)
>> break;
>> last = get_immediate_dominator (CDI_DOMINATORS, last);
>> }
>>
>> After further discussion with Kewen, we found that the inn_loop variable is
>> totally useless and could be removed.
>>
>>
>>>
>>>> if (bitmap_bit_p (contains_call, bb->index))
>>>> break;
>>>>
>>>> @@ -3238,7 +3242,7 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
>>>> @@ contains_call)
>>>>
>>>> if (bb->loop_father->header == bb)
>>>> {
>>>> - if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>>>> + if (!dominated_by_p (CDI_DOMINATORS, bb->loop_father->latch,
>>>> bb))
>>>> break;
>>>
>>> That's now a always false condition - a loops latch is always dominated
>>> by its header. The condition as written tries to verify whether the
>>> loop is always entered - mind we visit all blocks, not only those
>>> always executed.
>>
>> Thanks for the catch! I am afraid the piece of code should be removed since
>> it stops
>> search of potential ALWAYS EXECUTED bb after inner loop...
>
> But the code says:
>
> /* In a loop that is always entered we may proceed anyway.
> But record that we entered it and stop once we leave it.
> */
>
> and you do not remove this comment still it doesn't hold anymore
> after your patch. I don't say the current code is optimal - I just
> say it does exactly what is documented.
Removed.
>
>
>>>
>>> In fact for your testcase the x->j ref is _not_ always executed
>>> since the inner loop is conditional on n > 0.
>>
>> Yes. But I want to move x->k (not x->j) out of loop 1 when l > 0 in
>> store-motion.
>
> Sorry, that wasn't clear. Yes, I agree the code fails to see always
> executed blocks after inner loops. But since the code simply walks
> all blocks in a loop instead of greedily following edges it has
> to do that since the inner loop might exit the outer loop as well,
> sth your change wouldn't honor. Consider
This is already handled now, if there is an edge exiting out of outer
loop directly, it will also early break, I've verified it with your case.
FOR_EACH_EDGE (e, ei, bb->succs)
{
/* If there is an exit from this BB. */
if (!flow_bb_inside_loop_p (loop, e->dest))
break;
>
> while (--n)
> {
> do
> {
> if (do_exit)
> goto out;
> }
> while (1);
> p->x += 1;
> }
> out:;
>
> you'll see a CFG where the inner loop exits the outer loop as well.
>
> So I'm saying to improve the code you'll likely have to do more.
> And add a testcase like the following
>
> void __attribute__((noipa))
> foo (int n, int m, int f, int *p, int *q)
> {
> while (--n)
> {
> do
> {
> *q += 1;
> if (f)
> goto out;
> }
> while (--m);
> *p += 1;
> }
> out:;
> }
>
> int main()
> {
> int i = 0;
> foo (10, 10, 1, (void *)0, &i);
> if (i != 1)
> __builtin_abort ();
> return 0;
> }
>
> Richard.
>
Updated the patch:
[PATCH v2] Fix incomplete computation in fill_always_executed_in_1
It seems to me that ALWAYS_EXECUTED_IN is not computed correctly for
nested loops. Current design will exit if an inner loop doesn't
dominate outer loop's latch or exit after exiting from inner loop, which
caused early return from outer loop, then ALWAYS EXECUTED blocks after
inner loops is skipped.
This patch removes the redundant inner loop check, but keep the check of
exiting to out of outer loop from inner loop, then the store instruction
in outer loop could also be moved out of outer loop by store motion.
gcc/ChangeLog:
* tree-ssa-loop-im.c (fill_always_executed_in_1): Remove
inn_loop check.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-lim-19.c: New test.
* gcc.dg/tree-ssa/ssa-lim-20.c: New test.
---
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c | 24 +++++++++++++++++
gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c | 30 ++++++++++++++++++++++
gcc/tree-ssa-loop-im.c | 14 ----------
3 files changed, 54 insertions(+), 14 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
new file mode 100644
index 00000000000..097a5ee4a4b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-19.c
@@ -0,0 +1,24 @@
+/* PR/101293 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+struct X { int i; int j; int k;};
+
+void foo(struct X *x, int n, int l)
+{
+ for (int j = 0; j < l; j++)
+ {
+ for (int i = 0; i < n; ++i)
+ {
+ int *p = &x->j;
+ int tem = *p;
+ x->j += tem * i;
+ }
+ int *r = &x->k;
+ int tem2 = *r;
+ x->k += tem2 * j;
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 2 "lim2" } } */
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
new file mode 100644
index 00000000000..6e180de528e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-20.c
@@ -0,0 +1,30 @@
+/* PR/101293 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+void __attribute__ ((noipa)) foo (int n, int m, int f, int *p, int *q)
+{
+ while (--n)
+ {
+ do
+ {
+ *q += 1;
+ if (f)
+ goto out;
+ }
+ while (--m);
+ *p += 1;
+ }
+out:;
+}
+
+int main()
+{
+ int i = 0;
+ foo (10, 10, 1, (void *) 0, &i);
+ if (i != 1)
+ __builtin_abort ();
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Executing store motion" 1 "lim2" } } */
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index b24bc64f2a7..c44f4390ce2 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3197,7 +3197,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
contains_call)
basic_block bb = NULL, *bbs, last = NULL;
unsigned i;
edge e;
- class loop *inn_loop = loop;
if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
{
@@ -3232,19 +3231,6 @@ fill_always_executed_in_1 (class loop *loop, sbitmap
contains_call)
to disprove this if possible). */
if (bb->flags & BB_IRREDUCIBLE_LOOP)
break;
-
- if (!flow_bb_inside_loop_p (inn_loop, bb))
- break;
-
- if (bb->loop_father->header == bb)
- {
- if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
- break;
-
- /* In a loop that is always entered we may proceed anyway.
- But record that we entered it and stop once we leave it. */
- inn_loop = bb->loop_father;
- }
}
while (1)
--
2.27.0.90.geebb51ba8c