Dear Teresa, On 19 August 2013 19:47, Teresa Johnson <tejohn...@google.com> wrote: > On Mon, Aug 19, 2013 at 8:09 AM, Jan Hubicka <hubi...@ucw.cz> wrote: >>> Remember it isn't using dominance anymore. The latest patch was >>> instead ensuring the most frequent path between hot blocks and the >>> entry/exit are marked hot. That should be better than the dominance >>> approach used in the earlier version. >> >> Indeed, that looks more resonable approach. >> Can you point me to the last version of patch? Last one I remember still >> walked dominators... > > I've included the latest patch below. I still use dominators in the > post-cfg-optimization fixup (fixup_partitions), but not in the > partition sanitizing done during the partitioning itself > (sanitize_hot_paths). The former is looking for hot bbs newly > dominated by cold bbs after cfg transformations. > >>> >>> > We can commit it and work on better solution incrementally but it will >>> > probably mean replacing it later. If you think it makes things easier >>> > to work on it incrementally, I think the patch is OK. >>> >>> Yes, I think this is a big step forward from what is there now for >>> splitting, which does the splitting purely based on bb count in >>> isolation. I don't have a much better solution in mind yet. >>> >>> >> >>> >> > - I'll try building and profiling gimp myself to see if I can >>> >> > reproduce the issue with code executing out of the cold section. >>> >> >>> >> I have spent some time this week trying to get the latest gimp Martin >>> >> pointed me to configured and built, but it took awhile to track down >>> >> and configure/build all of the required versions of dependent >>> >> packages. I'm still hitting some issues trying to get it compiled, so >>> >> it may not yet be configured properly. I'll take a look again early >>> >> next week. >>> > >>> > I do not think there is anything special about gimp. You can probably >>> > take any other bigger app, like GCC itself. With profiledbootstrap >>> > and linker script to lock unlikely section you should get ICEs where >>> > we jump into cold secton and should not. >>> >>> Ok, please point me to the linker script and I will try gcc >>> profiledbootstrap as well. I wanted to try gimp if possible as I >>> haven't seen this much jumping to the cold section in some of the >>> internal apps I tried. >> >> You can also discuss with Martin the systemtap script to plot disk accesses >> during the startup. It is very handy for analyzing the code layout issues
I send you as an attachment linker script that preserves .text.unlikely, .text.exit, .text.startup and .text.hot sections. I tries to modify memory access flags for .text.unlikely section to read-only, but I am not so familiar with linker scripting. Maybe, you can define with MEMORY command the read-only memory area for .text.unlikely. I am using for my graphing disabled read-ahead function in the Linux kernel with systemtap script and the results are presented with matplotlib library. If you are more interested in this tool-chain, write me and I can send you these scripts. I was also using valgrind to trace functions, so I will try to get locations of called functions to identify which functions are called in corresponding sections. Martin > Ok. I am using linux perf to collect this info (fed through some > scripts that munge and plot the data). > >> >> It may be interesting to get similar script taking traces from valgrind >> and ploting the most frequent calls in the final layout ;) > > I think linux perf -g to get a callgraph should give similar data. > > Teresa > >> >> Honza > > 2013-08-05 Teresa Johnson <tejohn...@google.com> > Steven Bosscher <ste...@gcc.gnu.org> > > * cfgrtl.c (fixup_new_cold_bb): New routine. > (commit_edge_insertions): Invoke fixup_partitions. > (find_partition_fixes): New routine. > (fixup_partitions): Ditto. > (verify_hot_cold_block_grouping): Update comments. > (rtl_verify_edges): Invoke find_partition_fixes. > (rtl_verify_bb_pointers): Update comments. > (rtl_verify_bb_layout): Ditto. > * basic-block.h (fixup_partitions): Declare. > * cfgcleanup.c (try_optimize_cfg): Invoke fixup_partitions. > * bb-reorder.c (sanitize_hot_paths): New function. > (find_rarely_executed_basic_blocks_and_crossing_edges): Invoke > sanitize_hot_paths. > > Index: cfgrtl.c > =================================================================== > --- cfgrtl.c (revision 201461) > +++ cfgrtl.c (working copy) > @@ -1341,6 +1341,43 @@ fixup_partition_crossing (edge e) > } > } > > +/* Called when block BB has been reassigned to the cold partition, > + because it is now dominated by another cold block, > + to ensure that the region crossing attributes are updated. */ > + > +static void > +fixup_new_cold_bb (basic_block bb) > +{ > + edge e; > + edge_iterator ei; > + > + /* This is called when a hot bb is found to now be dominated > + by a cold bb and therefore needs to become cold. Therefore, > + its preds will no longer be region crossing. Any non-dominating > + preds that were previously hot would also have become cold > + in the caller for the same region. Any preds that were previously > + region-crossing will be adjusted in fixup_partition_crossing. */ > + FOR_EACH_EDGE (e, ei, bb->preds) > + { > + fixup_partition_crossing (e); > + } > + > + /* Possibly need to make bb's successor edges region crossing, > + or remove stale region crossing. */ > + FOR_EACH_EDGE (e, ei, bb->succs) > + { > + /* We can't have fall-through edges across partition boundaries. > + Note that force_nonfallthru will do any necessary partition > + boundary fixup by calling fixup_partition_crossing itself. */ > + if ((e->flags & EDGE_FALLTHRU) > + && BB_PARTITION (bb) != BB_PARTITION (e->dest) > + && e->dest != EXIT_BLOCK_PTR) > + force_nonfallthru (e); > + else > + fixup_partition_crossing (e); > + } > +} > + > /* Attempt to change code to redirect edge E to TARGET. Don't do that on > expense of adding new instructions or reordering basic blocks. > > @@ -1979,6 +2016,14 @@ commit_edge_insertions (void) > { > basic_block bb; > > + /* Optimization passes that invoke this routine can cause hot blocks > + previously reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification below to fail, > + and lead to now cold code in the hot section. In some cases this > + may only be visible after newly unreachable blocks are deleted, > + which will be done by fixup_partitions. */ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > verify_flow_info (); > #endif > @@ -2173,6 +2218,101 @@ get_last_bb_insn (basic_block bb) > return end; > } > > +/* Sanity check partition hotness to ensure that basic blocks in > + the cold partition don't dominate basic blocks in the hot partition. > + If FLAG_ONLY is true, report violations as errors. Otherwise > + re-mark the dominated blocks as cold, since this is run after > + cfg optimizations that may make hot blocks previously reached > + by both hot and cold blocks now only reachable along cold paths. */ > + > +static vec<basic_block> > +find_partition_fixes (bool flag_only) > +{ > + basic_block bb; > + vec<basic_block> bbs_in_cold_partition = vNULL; > + vec<basic_block> bbs_to_fix = vNULL; > + > + /* Callers check this. */ > + gcc_checking_assert (crtl->has_bb_partition); > + > + FOR_EACH_BB (bb) > + if ((BB_PARTITION (bb) == BB_COLD_PARTITION)) > + bbs_in_cold_partition.safe_push (bb); > + > + if (bbs_in_cold_partition.is_empty ()) > + return vNULL; > + > + bool dom_calculated_here = !dom_info_available_p (CDI_DOMINATORS); > + > + if (dom_calculated_here) > + calculate_dominance_info (CDI_DOMINATORS); > + > + while (! bbs_in_cold_partition.is_empty ()) > + { > + bb = bbs_in_cold_partition.pop (); > + /* Any blocks dominated by a block in the cold section > + must also be cold. */ > + basic_block son; > + for (son = first_dom_son (CDI_DOMINATORS, bb); > + son; > + son = next_dom_son (CDI_DOMINATORS, son)) > + { > + /* If son is not yet cold, then mark it cold here and > + enqueue it for further processing. */ > + if ((BB_PARTITION (son) != BB_COLD_PARTITION)) > + { > + if (flag_only) > + error ("non-cold basic block %d dominated " > + "by a block in the cold partition (%d)", > son->index, bb->index); > + else > + BB_SET_PARTITION (son, BB_COLD_PARTITION); > + bbs_to_fix.safe_push (son); > + bbs_in_cold_partition.safe_push (son); > + } > + } > + } > + > + if (dom_calculated_here) > + free_dominance_info (CDI_DOMINATORS); > + > + return bbs_to_fix; > +} > + > +/* Perform cleanup on the hot/cold bb partitioning after optimization > + passes that modify the cfg. */ > + > +void > +fixup_partitions (void) > +{ > + basic_block bb; > + > + if (!crtl->has_bb_partition) > + return; > + > + /* Delete any blocks that became unreachable and weren't > + already cleaned up, for example during edge forwarding > + and convert_jumps_to_returns. This will expose more > + opportunities for fixing the partition boundaries here. > + Also, the calculation of the dominance graph during verification > + will assert if there are unreachable nodes. */ > + delete_unreachable_blocks (); > + > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. > + Fixup any that now violate this requirement, as a result of edge > + forwarding and unreachable block deletion. */ > + vec<basic_block> bbs_to_fix = find_partition_fixes (false); > + > + /* Do the partition fixup after all necessary blocks have been converted to > + cold, so that we only update the region crossings the minimum number of > + places, which can require forcing edges to be non fallthru. */ > + while (! bbs_to_fix.is_empty ()) > + { > + bb = bbs_to_fix.pop (); > + fixup_new_cold_bb (bb); > + } > +} > + > /* Verify, in the basic block chain, that there is at most one switch > between hot/cold partitions. This condition will not be true until > after reorder_basic_blocks is called. */ > @@ -2219,7 +2359,8 @@ verify_hot_cold_block_grouping (void) > /* Perform several checks on the edges out of each block, such as > the consistency of the branch probabilities, the correctness > of hot/cold partition crossing edges, and the number of expected > - successor edges. */ > + successor edges. Also verify that the dominance relationship > + between hot/cold blocks is sane. */ > > static int > rtl_verify_edges (void) > @@ -2382,6 +2523,14 @@ rtl_verify_edges (void) > } > } > > + /* If there are partitions, do a sanity check on them: A basic block in > + a cold partition cannot dominate a basic block in a hot partition. */ > + if (crtl->has_bb_partition && !err) > + { > + vec<basic_block> bbs_to_fix = find_partition_fixes (true); > + err = !bbs_to_fix.is_empty (); > + } > + > /* Clean up. */ > return err; > } > @@ -2515,7 +2664,7 @@ rtl_verify_bb_pointers (void) > and NOTE_INSN_BASIC_BLOCK > - verify that no fall_thru edge crosses hot/cold partition boundaries > - verify that there are no pending RTL branch predictions > - - verify that there is a single hot/cold partition boundary after bbro > + - verify that hot blocks are not dominated by cold blocks > > In future it can be extended check a lot of other stuff as well > (reachability of basic blocks, life information, etc. etc.). */ > @@ -2761,7 +2910,8 @@ rtl_verify_bb_layout (void) > - check that all insns are in the basic blocks > (except the switch handling code, barriers and notes) > - check that all returns are followed by barriers > - - check that all fallthru edge points to the adjacent blocks. */ > + - check that all fallthru edge points to the adjacent blocks > + - verify that there is a single hot/cold partition boundary after bbro */ > > static int > rtl_verify_flow_info (void) > Index: basic-block.h > =================================================================== > --- basic-block.h (revision 201461) > +++ basic-block.h (working copy) > @@ -797,6 +797,7 @@ extern bool contains_no_active_insn_p (const_basic > extern bool forwarder_block_p (const_basic_block); > extern bool can_fallthru (basic_block, basic_block); > extern void emit_barrier_after_bb (basic_block bb); > +extern void fixup_partitions (void); > > /* In cfgbuild.c. */ > extern void find_many_sub_basic_blocks (sbitmap); > Index: cfgcleanup.c > =================================================================== > --- cfgcleanup.c (revision 201461) > +++ cfgcleanup.c (working copy) > @@ -2807,10 +2807,21 @@ try_optimize_cfg (int mode) > df_analyze (); > } > > + if (changed) > + { > + /* Edge forwarding in particular can cause hot blocks > previously > + reached by both hot and cold blocks to become dominated only > + by cold blocks. This will cause the verification > below to fail, > + and lead to now cold code in the hot section. This is not > easy > + to detect and fix during edge forwarding, and in some cases > + is only visible after newly unreachable blocks are deleted, > + which will be done in fixup_partitions. */ > + fixup_partitions (); > + > #ifdef ENABLE_CHECKING > - if (changed) > - verify_flow_info (); > + verify_flow_info (); > #endif > + } > > changed_overall |= changed; > first_pass = false; > Index: bb-reorder.c > =================================================================== > --- bb-reorder.c (revision 201461) > +++ bb-reorder.c (working copy) > @@ -1444,27 +1444,134 @@ fix_up_crossing_landing_pad (eh_landing_pad old_lp > ei_next (&ei); > } > > + > +/* Ensure that all hot bbs are included in a hot path through the > + procedure. This is done by calling this function twice, once > + with WALK_UP true (to look for paths from the entry to hot bbs) and > + once with WALK_UP false (to look for paths from hot bbs to the exit). > + Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs > + to BBS_IN_HOT_PARTITION. */ > + > +static unsigned int > +sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count, > + vec<basic_block> *bbs_in_hot_partition) > +{ > + /* Callers check this. */ > + gcc_checking_assert (cold_bb_count); > + > + /* Keep examining hot bbs while we still have some left to check > + and there are remaining cold bbs. */ > + vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy (); > + while (! hot_bbs_to_check.is_empty () > + && cold_bb_count) > + { > + basic_block bb = hot_bbs_to_check.pop (); > + vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs; > + edge e; > + edge_iterator ei; > + int highest_probability = 0; > + bool found = false; > + > + /* Walk the preds/succs and check if there is at least one already > + marked hot. Keep track of the most frequent pred/succ so that we > + can mark it hot if we don't find one. */ > + FOR_EACH_EDGE (e, ei, edges) > + { > + basic_block reach_bb = walk_up ? e->src : e->dest; > + > + if (e->flags & EDGE_DFS_BACK) > + continue; > + > + if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION) > + { > + found = true; > + break; > + } > + if (e->probability > highest_probability) > + highest_probability = e->probability; > + } > + > + /* If bb is reached by (or reaches, in the case of !WALK_UP) another > hot > + block (or unpartitioned, e.g. the entry block) then it is ok. If > not, > + then the most frequent pred (or succ) needs to be adjusted. In the > + case where multiple preds/succs have the same probability (e.g. a > + 50-50 branch), then both will be adjusted. */ > + if (found) > + continue; > + > + FOR_EACH_EDGE (e, ei, edges) > + { > + if (e->flags & EDGE_DFS_BACK) > + continue; > + if (e->probability < highest_probability) > + continue; > + > + basic_block reach_bb = walk_up ? e->src : e->dest; > + > + /* We have a hot bb with an immediate dominator that is cold. > + The dominator needs to be re-marked hot. */ > + BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION); > + cold_bb_count--; > + > + /* Now we need to examine newly-hot reach_bb to see if it is also > + dominated by a cold bb. */ > + bbs_in_hot_partition->safe_push (reach_bb); > + hot_bbs_to_check.safe_push (reach_bb); > + } > + } > + > + return cold_bb_count; > +} > + > + > /* Find the basic blocks that are rarely executed and need to be moved to > a separate section of the .o file (to cut down on paging and improve > cache locality). Return a vector of all edges that cross. */ > > -static vec<edge> > +static vec<edge> > find_rarely_executed_basic_blocks_and_crossing_edges (void) > { > vec<edge> crossing_edges = vNULL; > basic_block bb; > edge e; > edge_iterator ei; > + unsigned int cold_bb_count = 0; > + vec<basic_block> bbs_in_hot_partition = vNULL; > > /* Mark which partition (hot/cold) each basic block belongs in. */ > FOR_EACH_BB (bb) > { > if (probably_never_executed_bb_p (cfun, bb)) > - BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_COLD_PARTITION); > + cold_bb_count++; > + } > else > - BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + { > + BB_SET_PARTITION (bb, BB_HOT_PARTITION); > + bbs_in_hot_partition.safe_push (bb); > + } > } > > + /* Ensure that hot bbs are included along a hot path from the entry to > exit. > + Several different possibilities may include cold bbs along all paths > + to/from a hot bb. One is that there are edge weight insanities > + due to optimization phases that do not properly update basic block > profile > + counts. The second is that the entry of the function may not be > hot, because > + it is entered fewer times than the number of profile training > runs, but there > + is a loop inside the function that causes blocks within the function to > be > + above the threshold for hotness. This is fixed by walking up from hot > bbs > + to the entry block, and then down from hot bbs to the exit, performing > + partitioning fixups as necessary. */ > + if (cold_bb_count) > + { > + mark_dfs_back_edges (); > + cold_bb_count = sanitize_hot_paths (true, cold_bb_count, > + &bbs_in_hot_partition); > + if (cold_bb_count) > + sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition); > + } > + > /* The format of .gcc_except_table does not allow landing pads to > be in a different partition as the throw. Fix this by either > moving or duplicating the landing pads. */ > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
ld.script
Description: Binary data