Hello Richard: On 17/10/23 2:03 pm, Richard Biener wrote: > On Thu, Oct 12, 2023 at 10:42 AM Ajit Agarwal <aagar...@linux.ibm.com> wrote: >> >> This patch improves code sinking pass to sink statements before call to >> reduce >> register pressure. >> Review comments are incorporated. Synced and modified with latest trunk >> sources. >> >> For example : >> >> void bar(); >> int j; >> void foo(int a, int b, int c, int d, int e, int f) >> { >> int l; >> l = a + b + c + d +e + f; >> if (a != 5) >> { >> bar(); >> j = l; >> } >> } >> >> Code Sinking does the following: >> >> void bar(); >> int j; >> void foo(int a, int b, int c, int d, int e, int f) >> { >> int l; >> >> if (a != 5) >> { >> l = a + b + c + d +e + f; >> bar(); >> j = l; >> } >> } >> >> Bootstrapped regtested on powerpc64-linux-gnu. >> >> Thanks & Regards >> Ajit >> >> tree-ssa-sink: Improve code sinking pass >> >> Currently, code sinking will sink code after function calls. This increases >> register pressure for callee-saved registers. The following patch improves >> code sinking by placing the sunk code before calls in the use block or in >> the immediate dominator of the use blocks. > > The patch no longer does what the description above says. Why you think so. Please let me know. > > More comments below. > >> 2023-10-12 Ajit Kumar Agarwal <aagar...@linux.ibm.com> >> >> gcc/ChangeLog: >> >> PR tree-optimization/81953 >> * tree-ssa-sink.cc (statement_sink_location): Move statements before >> calls. >> (select_best_block): Add heuristics to select the best blocks in the >> immediate post dominator. >> >> gcc/testsuite/ChangeLog: >> >> PR tree-optimization/81953 >> * gcc.dg/tree-ssa/ssa-sink-20.c: New test. >> * gcc.dg/tree-ssa/ssa-sink-21.c: New test. >> --- >> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++++++ >> gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 ++++++++++ >> gcc/tree-ssa-sink.cc | 39 ++++++++++++--------- >> 3 files changed, 56 insertions(+), 17 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >> new file mode 100644 >> index 00000000000..d3b79ca5803 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c >> @@ -0,0 +1,15 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ >> +void bar(); >> +int j; >> +void foo(int a, int b, int c, int d, int e, int f) >> +{ >> + int l; >> + l = a + b + c + d +e + f; >> + if (a != 5) >> + { >> + bar(); >> + j = l; >> + } >> +} >> +/* { dg-final { scan-tree-dump >> {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */ >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >> new file mode 100644 >> index 00000000000..84e7938c54f >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c >> @@ -0,0 +1,19 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */ >> +void bar(); >> +int j, x; >> +void foo(int a, int b, int c, int d, int e, int f) >> +{ >> + int l; >> + l = a + b + c + d +e + f; >> + if (a != 5) >> + { >> + bar(); >> + if (b != 3) >> + x = 3; >> + else >> + x = 5; >> + j = l; >> + } >> +} >> +/* { dg-final { scan-tree-dump >> {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */ >> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc >> index a360c5cdd6e..95298bc8402 100644 >> --- a/gcc/tree-ssa-sink.cc >> +++ b/gcc/tree-ssa-sink.cc >> @@ -174,7 +174,8 @@ nearest_common_dominator_of_uses (def_operand_p def_p, >> bool *debug_stmts) >> >> /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator >> tree, return the best basic block between them (inclusive) to place >> - statements. >> + statements. The best basic block should be an immediate dominator of >> + best basic block if the use stmt is after the call. >> >> We want the most control dependent block in the shallowest loop nest. >> >> @@ -196,6 +197,16 @@ select_best_block (basic_block early_bb, >> basic_block best_bb = late_bb; >> basic_block temp_bb = late_bb; >> int threshold; >> + /* Get the sinking threshold. If the statement to be moved has memory >> + operands, then increase the threshold by 7% as those are even more >> + profitable to avoid, clamping at 100%. */ >> + threshold = param_sink_frequency_threshold; >> + if (gimple_vuse (stmt) || gimple_vdef (stmt)) >> + { >> + threshold += 7; >> + if (threshold > 100) >> + threshold = 100; >> + } >> >> while (temp_bb != early_bb) >> { >> @@ -204,6 +215,14 @@ select_best_block (basic_block early_bb, >> if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb)) >> best_bb = temp_bb; >> >> + /* if we have temp_bb post dominated by use block block then immediate >> + * dominator would be our best block. */ >> + if (!gimple_vuse (stmt) >> + && bb_loop_depth (temp_bb) == bb_loop_depth (early_bb) >> + && !(temp_bb->count * 100 >= early_bb->count * threshold) >> + && dominated_by_p (CDI_DOMINATORS, late_bb, temp_bb)) > > this isn't a post-dominance check, in fact this always returns true. This > also overrides the best found loop depth which probably means finding > both inside the same loop doesn't work.
I can remove dominated check. You would like me to do in different loop than doing inside the same loop. Please let me know. > What's the intent of the change? The purpose of this change is to assign best_bb the immediate dominator if both early_bb and temp_bb have same loop depth. Thanks & Regards Ajit > >> + best_bb = temp_bb; >> + >> /* Walk up the dominator tree, hopefully we'll find a shallower >> loop nest. */ >> temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb); >> @@ -233,17 +252,6 @@ select_best_block (basic_block early_bb, >> && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, >> best_bb)) >> return early_bb; >> >> - /* Get the sinking threshold. If the statement to be moved has memory >> - operands, then increase the threshold by 7% as those are even more >> - profitable to avoid, clamping at 100%. */ >> - threshold = param_sink_frequency_threshold; >> - if (gimple_vuse (stmt) || gimple_vdef (stmt)) >> - { >> - threshold += 7; >> - if (threshold > 100) >> - threshold = 100; >> - } >> - >> /* If BEST_BB is at the same nesting level, then require it to have >> significantly lower execution frequency to avoid gratuitous movement. >> */ >> if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb) >> @@ -430,6 +438,7 @@ statement_sink_location (gimple *stmt, basic_block >> frombb, >> continue; >> break; >> } >> + >> use = USE_STMT (one_use); >> >> if (gimple_code (use) != GIMPLE_PHI) >> @@ -439,10 +448,7 @@ statement_sink_location (gimple *stmt, basic_block >> frombb, >> if (sinkbb == frombb) >> return false; >> >> - if (sinkbb == gimple_bb (use)) >> - *togsi = gsi_for_stmt (use); >> - else >> - *togsi = gsi_after_labels (sinkbb); >> + *togsi = gsi_after_labels (sinkbb); >> >> return true; >> } >> @@ -825,7 +831,6 @@ pass_sink_code::execute (function *fun) >> mark_dfs_back_edges (fun); >> memset (&sink_stats, 0, sizeof (sink_stats)); >> calculate_dominance_info (CDI_DOMINATORS); >> - >> virtual_operand_live vop_live; >> >> int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun)); >> -- >> 2.39.3 >>