On Fri, Jul 28, 2023 at 9:58 AM Jan Hubicka via Gcc-patches
<[email protected]> wrote:
>
> Hi,
> this patch fixes profile update in the first case of loop splitting.
> The pass still gives up on very basic testcases:
>
> __attribute__ ((noinline,noipa))
> void test1 (int n)
> {
> if (n <= 0 || n > 100000)
> return;
> for (int i = 0; i <= n; i++)
> {
> if (i < n)
> do_something ();
> if (a[i])
> do_something2();
> }
> }
> Here I needed to do the conditoinal that enforces sane value range of n.
> The reason is that it gives up on:
> !number_of_iterations_exit (loop1, exit1, &niter, false, true)
> and without the conditonal we get assumption that n>=0 and not INT_MAX.
> I think from overflow we shold derive that INT_MAX test is not needed and
> since
> the loop does nothing for n<0 it is also just an paranoia.
I only get n != 2147483647 (loop header copying does the n >= 0). Indeed
this test looks odd. It's because we turn i <= n into i < n + 1 and analyze
that (our canonical test is LT_EXPR), for this to work n may not be INT_MAX.
> I am not sure how to fix this though :(. In general the pass does not really
> need to compute iteration count. It only needs to know what direction the IVs
> go so it can detect tests that fires in first part of iteration space.
>
> Rich, any idea what the correct test should be?
In principle it could just look at the scalar evolution for the IV in
the exit test.
Aka use simple_iv () and check ->no_overflow?
> In testcase:
> for (int i = 0; i < 200; i++)
> if (i < 150)
> do_something ();
> else
> do_something2 ();
> the old code did wrong update of the exit condition probabilities.
> We know that first loop iterates 150 times and the second loop 50 times
> and we get it by simply scaling loop body by the probability of inner test.
>
> With the patch we now get:
>
> <bb 2> [count: 1000]:
>
> <bb 3> [count: 150000]: <- loop 1 correctly iterates 149 times
> # i_10 = PHI <i_7(8), 0(2)>
> do_something ();
> i_7 = i_10 + 1;
> if (i_7 <= 149)
> goto <bb 8>; [99.33%]
> else
> goto <bb 17>; [0.67%]
>
> <bb 8> [count: 149000]:
> goto <bb 3>; [100.00%]
>
> <bb 16> [count: 1000]:
> # i_15 = PHI <i_18(17)>
>
> <bb 9> [count: 49975]: <- loop 2 should iterate 50 times but
> we are slightly wrong
> # i_3 = PHI <i_15(16), i_14(13)>
> do_something2 ();
> i_14 = i_3 + 1;
> if (i_14 != 200)
> goto <bb 13>; [98.00%]
> else
> goto <bb 7>; [2.00%]
>
> <bb 13> [count: 48975]:
> goto <bb 9>; [100.00%]
>
> <bb 17> [count: 1000]: <- this test is always true becuase it is
> reached form bb 3
> # i_18 = PHI <i_7(3)>
> if (i_18 != 200)
> goto <bb 16>; [99.95%]
> else
> goto <bb 7>; [0.05%]
>
> <bb 7> [count: 1000]:
> return;
>
> The reason why we are slightly wrong is the condtion in bb17 that
> is always true but the pass does not konw it.
>
> Rich any idea how to do that? I think connect_loops should work out
> the cas where the loop exit conditon is never satisfied at the time
> the splitted condition fails for first time.
>
> Also we do not update loop iteration expectancies. If we were able to
> work out if one of the loop has constant iteration count, we could do it
> perfectly.
>
> Before patch on hmmer we get a lot of mismatches:
> Profile report here claims:
> dump id |static mismat|dynamic mismatch |
> |in count |in count |time |
> lsplit | 5 +5| 8151850567 +8151850567| 531506481006 +57.9%|
> ldist | 9 +4| 15345493501 +7193642934| 606848841056 +14.2%|
> ifcvt | 10 +1| 15487514871 +142021370| 689469797790 +13.6%|
> vect | 35 +25| 17558425961 +2070911090| 517375405715 -25.0%|
> cunroll | 42 +7| 16898736178 -659689783| 452445796198 -4.9%|
> loopdone| 33 -9| 2678017188 -14220718990| 330969127663 |
> tracer | 34 +1| 2678018710 +1522| 330613415364 +0.0%|
> fre | 33 -1| 2676980249 -1038461| 330465677073 -0.0%|
> expand | 28 -5| 2497468467 -179511782|--------------------------|
>
> With patch
>
> lsplit | 0 | 0 | 328723360744 -2.3%|
> ldist | 0 | 0 | 396193562452 +20.6%|
> ifcvt | 1 +1| 71010686 +71010686| 478743508522 +20.8%|
> vect | 14 +13| 697518955 +626508269| 299398068323 -37.5%|
> cunroll | 13 -1| 489349408 -208169547| 257777839725 -10.5%|
> loopdone| 11 -2| 402558559 -86790849| 201010712702 |
> tracer | 13 +2| 402977200 +418641| 200651036623 +0.0%|
> fre | 13 | 402622146 -355054| 200344398654 -0.2%|
> expand | 11 -2| 333608636 -69013510|--------------------------|
>
> So no mismatches for lsplit and ldist and also lsplit thinks it improves
> speed by 2.3% rather than regressig it by 57%.
>
> Update is still not perfect since we do not work out that the second loop
> never iterates. Also ldist is still wrong siince time should not go up.
>
> Ifcft wrecks profile by desing since it insert conditonals with both arms 100%
> that will be eliminated later after vect. It is not clear to me what happens
> in vect though.
>
> Bootstrapped/regtested x86_64-linux, comitted.
>
> gcc/ChangeLog:
>
> PR middle-end/106293
> * tree-ssa-loop-split.cc (connect_loops): Change probability
> of the test preconditioning second loop to very_likely.
> (fix_loop_bb_probability): Handle correctly case where
> on of the arms of the conditional is empty.
> (split_loop): Fold the test guarding first condition to
> see if it is constant true; Set correct entry block
> probabilities of the split loops; determine correct loop
> eixt probabilities.
>
> gcc/testsuite/ChangeLog:
>
> PR middle-end/106293
> * gcc.dg/tree-prof/loop-split-1.c: New test.
> * gcc.dg/tree-prof/loop-split-2.c: New test.
> * gcc.dg/tree-prof/loop-split-3.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> new file mode 100644
> index 00000000000..46f7ae66fa3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-1.c
> @@ -0,0 +1,33 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks
> -fdump-tree-optimized-details-blocks" } */
> +
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n, int ga)
> +{
> + for (int i = 0; i < 200; i++)
> + if (i < 150)
> + do_something ();
> + else
> + do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> + for (int i = 0 ; i < 1000; i++)
> + test1(10, 10);
> + return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit"
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> new file mode 100644
> index 00000000000..de2c9f58301
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-2.c
> @@ -0,0 +1,36 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks
> -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> + if (n <= 0 || n > 100000)
> + return;
> + for (int i = 0; i <= n; i++)
> + if (i < n)
> + do_something ();
> + else
> + do_something2 ();
> +}
> +int
> +main(int, char **)
> +{
> + for (int i = 0 ; i < 1000; i++)
> + test1(M);
> + return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit"
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> new file mode 100644
> index 00000000000..a88bc1f8663
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-prof/loop-split-3.c
> @@ -0,0 +1,41 @@
> +/* { dg-options "-O3 -fdump-tree-lsplit-details-blocks
> -fdump-tree-optimized-details-blocks" } */
> +
> +int M = 100;
> +int a[1000];
> +
> +void
> +__attribute__ ((noinline,noipa))
> +do_something()
> +{
> +}
> +void
> +__attribute__ ((noinline,noipa))
> +do_something2()
> +{
> +}
> +
> +__attribute__ ((noinline,noipa))
> +void test1 (int n)
> +{
> + if (n <= 0 || n > 100000)
> + return;
> + for (int i = 0; i <= n; i++)
> + {
> + if (i < n)
> + do_something ();
> + if (a[i])
> + do_something2();
> + }
> +}
> +int
> +main(int, char **)
> +{
> + for (int i = 0 ; i < 1000; i+=3)
> + a[i]=1;
> + for (int i = 0 ; i < 1000; i++)
> + test1(M);
> + return 0;
> +}
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Loop split" 1 "lsplit"
> } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "lsplit" } } */
> +/* { dg-final-use-not-autofdo { scan-tree-dump-times "Invalid sum" 0
> "optimized" } } */
> diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
> index f441f3fe1b5..70cd0aaefa7 100644
> --- a/gcc/tree-ssa-loop-split.cc
> +++ b/gcc/tree-ssa-loop-split.cc
> @@ -355,7 +356,7 @@ connect_loops (class loop *loop1, class loop *loop2)
> new_e->flags |= EDGE_TRUE_VALUE;
> }
>
> - new_e->probability = profile_probability::likely ();
> + new_e->probability = profile_probability::very_likely ();
> skip_e->probability = new_e->probability.invert ();
>
> return new_e;
> @@ -496,6 +497,8 @@ fix_loop_bb_probability (class loop *loop1, class loop
> *loop2, edge true_edge,
> unsigned j;
> for (j = 0; j < loop1->num_nodes; j++)
> if (bbs1[j] == loop1->latch
> + /* Watch for case where the true conditional is empty. */
> + || !single_pred_p (true_edge->dest)
> || !dominated_by_p (CDI_DOMINATORS, bbs1[j], true_edge->dest))
> bbs1[j]->count
> = bbs1[j]->count.apply_probability (true_edge->probability);
> @@ -507,6 +510,8 @@ fix_loop_bb_probability (class loop *loop1, class loop
> *loop2, edge true_edge,
> bbs2 = get_loop_body (loop2);
> for (j = 0; j < loop2->num_nodes; j++)
> if (bbs2[j] == loop2->latch
> + /* Watch for case where the flase conditional is empty. */
> + || !single_pred_p (bbi_copy)
> || !dominated_by_p (CDI_DOMINATORS, bbs2[j], bbi_copy))
> bbs2[j]->count
> = bbs2[j]->count.apply_probability (true_edge->probability.invert ());
> @@ -564,6 +569,7 @@ split_loop (class loop *loop1)
> for (i = 0; i < loop1->num_nodes; i++)
> if ((guard_iv = split_at_bb_p (loop1, bbs[i], &border, &iv)))
> {
> + profile_count entry_count = loop_preheader_edge (loop1)->count ();
> /* Handling opposite steps is not implemented yet. Neither
> is handling different step sizes. */
> if ((tree_int_cst_sign_bit (iv.step)
> @@ -610,7 +616,8 @@ split_loop (class loop *loop1)
> if (stmts2)
> gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop1),
> stmts2);
> - tree cond = build2 (guard_code, boolean_type_node, guard_init,
> border);
> + tree cond = fold_build2 (guard_code, boolean_type_node,
> + guard_init, border);
> if (!initial_true)
> cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>
> @@ -622,13 +629,71 @@ split_loop (class loop *loop1)
> initialize_original_copy_tables ();
> basic_block cond_bb;
>
> + profile_probability loop1_prob
> + = integer_onep (cond) ? profile_probability::always ()
> + : true_edge->probability;
> + /* TODO: It is commonly a case that we know that both loops will be
> + entered. very_likely below is the probability that second loop
> will
> + be entered given by connect_loops. We should work out the common
> + case it is always true. */
> class loop *loop2 = loop_version (loop1, cond, &cond_bb,
> - true_edge->probability,
> - true_edge->probability.invert (),
> + loop1_prob,
> + /* Pass always as we will later
> + redirect first loop to second
> + loop. */
> profile_probability::always (),
> profile_probability::always (),
> + profile_probability::very_likely (),
> true);
> gcc_assert (loop2);
> + /* Correct probability of edge cond_bb->preheader_of_loop2. */
> + single_pred_edge
> + (loop_preheader_edge (loop2)->src)->probability
> + = loop1_prob.invert ();
> +
> + fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> +
> + /* Fix loop's exit probability after scaling. */
> +
> + if (entry_count.initialized_p () && entry_count.nonzero_p ())
> + {
> + edge exit_to_latch1;
> + if (EDGE_SUCC (exit1->src, 0) == exit1)
> + exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> + else
> + exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> + if (exit1->src->count.nonzero_p ())
> + {
> + /* First loop is entered loop1_prob * entry_count times
> + and it needs to exit the same number of times. */
> + exit1->probability
> + = entry_count.apply_probability
> + (loop1_prob).probability_in
> (exit1->src->count);
> + exit_to_latch1->probability = exit1->probability.invert ();
> + scale_dominated_blocks_in_loop (loop1, exit1->src,
> + exit_to_latch1->count (),
> + exit_to_latch1->dest->count);
> + }
> +
> + edge exit_to_latch2, exit2 = single_exit (loop2);
> + if (EDGE_SUCC (exit2->src, 0) == exit2)
> + exit_to_latch2 = EDGE_SUCC (exit2->src, 1);
> + else
> + exit_to_latch2 = EDGE_SUCC (exit2->src, 0);
> + if (exit2->src->count.nonzero_p ())
> + {
> + /* Second loop is entered very_likely * entry_count times
> + and it needs to exit the same number of times. */
> + exit2->probability
> + = entry_count.apply_probability
> + (profile_probability::very_likely ())
> + .probability_in (exit2->src->count);
> + exit_to_latch2->probability = exit2->probability.invert ();
> + scale_dominated_blocks_in_loop (loop2, exit2->src,
> + exit_to_latch2->count (),
> + exit_to_latch2->dest->count);
> + }
> + }
>
> edge new_e = connect_loops (loop1, loop2);
> connect_loop_phis (loop1, loop2, new_e);
> @@ -647,16 +712,7 @@ split_loop (class loop *loop1)
> tree guard_next = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge
> (loop1));
> patch_loop_exit (loop1, guard_stmt, guard_next, newend, initial_true);
>
> - fix_loop_bb_probability (loop1, loop2, true_edge, false_edge);
> -
> - /* Fix first loop's exit probability after scaling. */
> - edge exit_to_latch1;
> - if (EDGE_SUCC (exit1->src, 0) == exit1)
> - exit_to_latch1 = EDGE_SUCC (exit1->src, 1);
> - else
> - exit_to_latch1 = EDGE_SUCC (exit1->src, 0);
> - exit_to_latch1->probability *= true_edge->probability;
> - exit1->probability = exit_to_latch1->probability.invert ();
> + /* TODO: Update any_esitmate and upper bounds. */
>
> /* Finally patch out the two copies of the condition to be always
> true/false (or opposite). */