On Thu, 21 Oct 2021, Xionghu Luo wrote:
>
>
> On 2021/10/15 13:51, Xionghu Luo via Gcc-patches wrote:
> >
> >
> > On 2021/9/23 20:17, Richard Biener wrote:
> >> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/8/11 17:16, Richard Biener wrote:
> >>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On 2021/8/10 22:47, Richard Biener wrote:
> >>>>>> On Mon, 9 Aug 2021, Xionghu Luo wrote:
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> On 2021/8/6 19:46, Richard Biener wrote:
> >>>>>>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
> >>>>>>>>
> >>>>>>>>> loop split condition is moved between loop1 and loop2, the split
> >>>>>>>>> bb's
> >>>>>>>>> count and probability should also be duplicated instead of (100% vs
> >>>>>>>>> INV),
> >>>>>>>>> secondly, the original loop1 and loop2 count need be propotional
> >>>>>>>>> from
> >>>>>>>>> the
> >>>>>>>>> original loop.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> diff base/loop-cond-split-1.c.151t.lsplit
> >>>>>>>>> patched/loop-cond-split-1.c.151t.lsplit:
> >>>>>>>>> ...
> >>>>>>>>> int prephitmp_16;
> >>>>>>>>> int prephitmp_25;
> >>>>>>>>>
> >>>>>>>>> <bb 2> [local count: 118111600]:
> >>>>>>>>> if (n_7(D) > 0)
> >>>>>>>>> goto <bb 4>; [89.00%]
> >>>>>>>>> else
> >>>>>>>>> goto <bb 3>; [11.00%]
> >>>>>>>>>
> >>>>>>>>> <bb 3> [local count: 118111600]:
> >>>>>>>>> return;
> >>>>>>>>>
> >>>>>>>>> <bb 4> [local count: 105119324]:
> >>>>>>>>> pretmp_3 = ga;
> >>>>>>>>>
> >>>>>>>>> - <bb 5> [local count: 955630225]:
> >>>>>>>>> + <bb 5> [local count: 315357973]:
> >>>>>>>>> # i_13 = PHI <i_10(20), 0(4)>
> >>>>>>>>> # prephitmp_12 = PHI <prephitmp_5(20), pretmp_3(4)>
> >>>>>>>>> if (prephitmp_12 != 0)
> >>>>>>>>> goto <bb 6>; [33.00%]
> >>>>>>>>> else
> >>>>>>>>> goto <bb 7>; [67.00%]
> >>>>>>>>>
> >>>>>>>>> - <bb 6> [local count: 315357972]:
> >>>>>>>>> + <bb 6> [local count: 104068130]:
> >>>>>>>>> _2 = do_something ();
> >>>>>>>>> ga = _2;
> >>>>>>>>>
> >>>>>>>>> - <bb 7> [local count: 955630225]:
> >>>>>>>>> + <bb 7> [local count: 315357973]:
> >>>>>>>>> # prephitmp_5 = PHI <prephitmp_12(5), _2(6)>
> >>>>>>>>> i_10 = inc (i_13);
> >>>>>>>>> if (n_7(D) > i_10)
> >>>>>>>>> goto <bb 21>; [89.00%]
> >>>>>>>>> else
> >>>>>>>>> goto <bb 11>; [11.00%]
> >>>>>>>>>
> >>>>>>>>> <bb 11> [local count: 105119324]:
> >>>>>>>>> goto <bb 3>; [100.00%]
> >>>>>>>>>
> >>>>>>>>> - <bb 21> [local count: 850510901]:
> >>>>>>>>> + <bb 21> [local count: 280668596]:
> >>>>>>>>> if (prephitmp_12 != 0)
> >>>>>>>>> - goto <bb 20>; [100.00%]
> >>>>>>>>> + goto <bb 20>; [33.00%]
> >>>>>>>>> else
> >>>>>>>>> - goto <bb 19>; [INV]
> >>>>>>>>> + goto <bb 19>; [67.00%]
> >>>>>>>>>
> >>>>>>>>> - <bb 20> [local count: 850510901]:
> >>>>>>>>> + <bb 20> [local count: 280668596]:
> >>>>>>>>> goto <bb 5>; [100.00%]
> >>>>>>>>>
> >>>>>>>>> - <bb 19> [count: 0]:
> >>>>>>>>> + <bb 19> [local count: 70429947]:
> >>>>>>>>> # i_23 = PHI <i_10(21)>
> >>>>>>>>> # prephitmp_25 = PHI <prephitmp_5(21)>
> >>>>>>>>>
> >>>>>>>>> - <bb 12> [local count: 955630225]:
> >>>>>>>>> + <bb 12> [local count: 640272252]:
> >>>>>>>>> # i_15 = PHI <i_23(19), i_22(16)>
> >>>>>>>>> # prephitmp_16 = PHI <prephitmp_25(19), prephitmp_16(16)>
> >>>>>>>>> i_22 = inc (i_15);
> >>>>>>>>> if (n_7(D) > i_22)
> >>>>>>>>> goto <bb 16>; [89.00%]
> >>>>>>>>> else
> >>>>>>>>> goto <bb 11>; [11.00%]
> >>>>>>>>>
> >>>>>>>>> - <bb 16> [local count: 850510901]:
> >>>>>>>>> + <bb 16> [local count: 569842305]:
> >>>>>>>>> goto <bb 12>; [100.00%]
> >>>>>>>>>
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> gcc/ChangeLog:
> >>>>>>>>>
> >>>>>>>>> * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
> >>>>>>>>> (do_split_loop_on_cond): Likewise.
> >>>>>>>>> ---
> >>>>>>>>> gcc/tree-ssa-loop-split.c | 16 ++++++++--------
> >>>>>>>>> 1 file changed, 8 insertions(+), 8 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >>>>>>>>> index 3a09bbc39e5..8e5a7ded0f7 100644
> >>>>>>>>> --- a/gcc/tree-ssa-loop-split.c
> >>>>>>>>> +++ b/gcc/tree-ssa-loop-split.c
> >>>>>>>>> @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
> >>>>>>>>> basic_block cond_bb;
> >>>>>>>
> >>>>>>> if (!initial_true)
> >>>>>>> - cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>> + cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
> >>>>>>> +
> >>>>>>> + edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
> >>>>>>> + ? EDGE_SUCC (bbs[i], 0)
> >>>>>>> + : EDGE_SUCC (bbs[i], 1);
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> >>>>>>>>> - profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> - profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> - profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> - profile_probability::always
> >>>>>>>>> (),
> >>>>>>>>> + true_edge->probability,
> >>>>>>>>> +
> >>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>> + true_edge->probability,
> >>>>>>>>> +
> >>>>>>>>> true_edge->probability.invert (),
> >>>>>>>>> true);
> >>>>>>>>
> >>>>>>>> there is no 'true_edge' variable at this point.
> >>>>>>>
> >>>>>>> Sorry, missed the above hunk when split the patch.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> gcc_assert (loop2);
> >>>>>>>>> @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop
> >>>>>>>>> *loop1,
> >>>>>>>>> edge invar_branch)
> >>>>>>>>> initialize_original_copy_tables ();
> >>>>>>>>>
> >>>>>>>>> struct loop *loop2 = loop_version (loop1, boolean_true_node,
> >>>>>>>>> NULL,
> >>>>>>>>> - profile_probability::always (),
> >>>>>>>>> - profile_probability::never (),
> >>>>>>>>> - profile_probability::always (),
> >>>>>>>>> - profile_probability::always (),
> >>>>>>>>> + invar_branch->probability.invert
> >>>>>>>>> (),
> >>>>>>>>> + invar_branch->probability,
> >>>>>>>>> + invar_branch->probability.invert
> >>>>>>>>> (),
> >>>>>>>>> + invar_branch->probability,
> >>>>>>>>> true);
> >>>>>>>>> if (!loop2)
> >>>>>>>>> {
> >>>>>>>>
> >>>>>>>> The patch introduction seems to talk about do_split_loop_on_cond
> >>>>>>>> only.
> >>>>>>>
> >>>>>>> split_loop faces similar issue though it sets the two branches to
> >>>>>>> 100% vs
> >>>>>>> 100%
> >>>>>>> and no scaling which seems also incorrect.
> >>>>>>>
> >>>>>>>> Since loop versioning inserts a condition with the passed
> >>>>>>>> probabilities
> >>>>>>>> but in this case a 'boolean_true_node' condition the then and else
> >>>>>>>> probabilities passed look correct. It's just the scaling arguments
> >>>>>>>> that look wrong? This loop_version call should get a comment as to
> >>>>>>>> why we are passing probabilities the way we do.
> >>>>>>>
> >>>>>>> This optimization is transforming from:
> >>>>>>>
> >>>>>>> for (i = 0; i < n; i = inc (i))
> >>>>>>> {
> >>>>>>> if (ga)
> >>>>>>> ga = do_something ();
> >>>>>>> }
> >>>>>>>
> >>>>>>> to:
> >>>>>>>
> >>>>>>> for (i = 0; i < x; i = inc (i))
> >>>>>>> {
> >>>>>>> if (true)
> >>>>>>> ga = do_something ();
> >>>>>>> if (!ga)
> >>>>>>> break;
> >>>>>>> }
> >>>>>>> for (; i < n; i = inc (i))
> >>>>>>> {
> >>>>>>> if (false)
> >>>>>>> ga = do_something ();
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> `boolean_true_node` is passed in to make the first loop's condition
> >>>>>>> statement to be 'true', after returning from loop_version, there is a
> >>>>>>> piece of code forcing the condition in second loop to be false,
> >>>>>>> and the original condition is moved from *in loop* to *exit edge*
> >>>>>>> between loop1 and loop2.
> >>>>>>
> >>>>>> Yes, one complication is that we use loop_version but do not retain
> >>>>>> the CFG it creates. Something like the vectorizers
> >>>>>> slpeel_tree_duplicate_loop_to_edge_cfg would be a better "fit"
> >>>>>> but then that code doesn't do any scaling yet. But then
> >>>>>> loop_version uses cfg_hook_duplicate_loop_to_header_edge and I suppose
> >>>>>> we could write a variant that simply doesn't mangle the CFG
> >>>>>> with a new condition switching between both loops but keeps them
> >>>>>> executed after each other ...
> >>>>>>
> >>>>>> As said, this adds to the confusion and some awkwardness.
> >>>>>
> >>>>> Then loop_version in loop split requires two types of variant, one
> >>>>> is to insert condition to loop preheader for 'split_loops' usage,
> >>>>> another is to insert condition to loop exit for 'do_split_loop_on_cond'
> >>>>> usage, it needs one extra function to encapsulate these cfg alterations
> >>>>> from outside to inside.
> >>>>>
> >>>>> unswitching only execute one loop as it only moves the invariant
> >>>>> condition
> >>>>> to first loop's pre-header. While 'split_loops' and
> >>>>> 'do_split_loop_on_cond'
> >>>>> may execute both loops no matter the condition is moved to the first
> >>>>> loop's
> >>>>> preheader or exit.
> >>>>>
> >>>>> The condition stmt in loop unswitching is invariant, but it is variant
> >>>>> in loop splitting, that's why loop unswitching execute only one loop
> >>>>> but loop splitting executes both loops.
> >>>>>
> >>>>> Seems we need two condition arguments for loop_version, one for
> >>>>> connecting
> >>>>> loop1 preheader to loop2 preheader, another one for connecting loop1's
> >>>>> exit
> >>>>> to loop2's header? Then it will be more generic for both unswitching
> >>>>> pass
> >>>>> and splitting pass. Looks a bit complicated and conflicted with
> >>>>> loop_version's
> >>>>> comments:
> >>>>>
> >>>>> /* Main entry point for Loop Versioning transformation.
> >>>>>
> >>>>> This transformation given a condition and a loop, creates
> >>>>> -if (condition) { loop_copy1 } else { loop_copy2 }, ... */
> >>>>>
> >>>>>
> >>>>> And this only works for loop split usage, those many other places
> >>>>> doesn't use loop_version like this?
> >>>>
> >>>> Yes, as said if you don't want the above CFG then you probably
> >>>> shouldn't use loop_version but instead use its building blocks
> >>>> (and some refactoring of loop_version can make that easier).
> >>>>
> >>>> I think splitting wants
> >>>>
> >>>> loop_copy1
> >>>> if (condition)
> >>>> loop_copy2
> >>>>
> >>>> IMHO it would be good to split 'loopify' into the actual loopification
> >>>> and the scaling. Basically make the part of loop_version that
> >>>> copies the loop on the header edge and creates a loop structure for
> >>>> it separate.
> >>>>
> >>>> loop distribution uses slpeel_tree_duplicate_loop_to_edge_cfg
> >>>> (copy_loop_before).
> >>>>
> >>>
> >>> Unfortunately slpeel_tree_duplicate_loop_to_edge_cfg only supports
> >>> copying loops with single exit, it would cause many ICEs in it even
> >>> building GCC stage1 (line 1065, line 1184 due to exit or new_exit
> >>> is NULL returning from single_exit (loop).). Seems loop_version is
> >>> more flexible for loop split.
> >>
> >> Hmm, sure - loop_version does not need to do anything special with
> >> exits since control flow either enters the original or the loop copy.
> >>
> >> But slpeel_tree_duplicate_loop_to_edge_cfg intends to create
> >> control flow that enters _both_ loops, so it needs to have
> >> an edge from the first loop exit to the second loop entry.
> >>
> >> One could extend this to a region copy, copying eventual CFG merges
> >> of exits and specifying which exit of a SEME region is supposed
> >> to be connected to the original region entry.
> >>
> >> After all that's what loop splitting needs in the end - though not
> >> sure what exactly it does with more than one exit.
> >
> > In tree-ssa-loop-split.c, split_loop only accepts single exit loop,
> > the recently added split_loop_on_cond could process multiple exits loop.
> >
> > For example, do some changes to the loop-cond-split-1.c,
> >
> > int ga;
> > extern int a;
> > extern int b;
> > extern int c;
> >
> > void test1 (int n) {
> > int i;
> > for (i = 0; i < n; i = inc (i)). {
> > if (a+3 > 0)
> > break;
> >
> > if (ga)
> > ga = do_something ();
> >
> > for (int j = 0; j < n; j++)
> > {
> > b+=5;
> > if (b > c) break;
> > }
> > }
> > }
> >
> > the "if (ga)" will be a new exit edge from loop_copy1 to loop_copy2.
> > I am not sure whether it is valuable to do semi-invariant loop split to such
> > cases with multiple exits, but obviously the split_loop_on_cond is a special
> > case from split_loop both duplicate loop to
> > if (condition1) {loop_copy1} if (condition2) {loop_copy2}
> > The only difference is condition1 is true for split_loop_on_cond.
> >
> >>
> >> So there's another set of "loop" copying, gimple_duplicate_sese_region,
> >> which doesn't actually require a single exit but a single "important"
> >> exit. That might match how you treat multiple exits.
> >
> > gimple_duplicate_sese_region doesn't handle subloops and latches. Finally,
> > I found that loop_version is still much better
> > than slpeel_tree_duplicate_loop_to_edge_cfg and gimple_duplicate_sese_region
> > since it could handle all cases like multiple exits/subloops, etc. I did
> > some
> > refactor to the code to introduce loop_version2 to create duplicated loops
> > with two input conditions as attached patch, is this reasonable enough?
> >
> > I also tried to copy the code in loop_version out of it to don't call
> > loop_version
> > in loop_version2, but it seems useless with many duplicate code and NOT get
> > rid
> > of creating "if (condition1) {loop_copy1}" at first?
> >
> >
>
> The previous attached patch
> 0001-Add-loop_version2-to-support-loop-transformation-wit.patch
> had issues when connecting two loops, reason is split_loop connect_loops at
> *exit* edge,
> do_split_loop_on_cond connect_loops at latch_edge. So they have different
> CFGs even
> both with two conditions :(. It seems not so that useful to also define
> another new function
> 'connect_loops_at_latch' given the limited usage in loop split only? And the
> loop_version2
> also looks not so general again ...
All I wanted to say is that none of the current high-level APIs we have
are a 1:1 match for loop splitting and that they in fact might do
stuff that's unnecessary or counter-productive. If inventing a new API
doesn't sound appealing then maybe refactoring an existing
(maybe loop_version) to expose re-usable pieces would make more sense.
For example loop_version currently does lv_adjust_loop_entry_edge
before it loopifys the copy inserted on the header. I wonder if
the condition generation can be done later and thus we can
have three pieces - 1) duplicating the loop on the entry edge,
2) adjusting the CFG to insert a condition branching to either loop,
3) from loopify extract the scale_loop_frequencies bits (in fact
loopify is only used from within cfgloopmanip.c)
That said, it shouldn't be a requirement to do all this to fix the
profile for loop splitting it's just that the current situation
does not help understanding of how the adjustment works.
Richard.