I'm not entirely sure how gcc's CFG structure all fits together yet, so I'll ask for some input on this one:
While looking through some dumps from a compile using -fprofile-use, I noticed the following in the "jump" dump file: Basic block 164 prev 163, next -2, loop_depth 0, count 1672, freq 1482. Predecessors: 163 [100.0%] count:836 (fallthru) Successors: EXIT [100.0%] count:1672 (fallthru) Invalid sum of incoming frequencies 741, should be 1482 Invalid sum of incoming counts 836, should be 1672 I decided to try and figure out why the sum of incoming frequencies was invalid. I've found where the mistake is introduced, and think I know what needs to be done to fix it, but perhaps someone can confirm or suggest an alternative. Here are the last few blocks dumped just before cfgbuild.find_bb_boundaries. The counts look good at this point (note especially block 162). Basic block 153 prev 152, next 154, loop_depth 0, count 836, freq 741. Predecessors: 0 [27.8%] count:232 12 151 [100.0%] count:604 152 [100.0%] (fallthru) Successors: 162 [100.0%] count:836 Basic block 154 prev 153, next 155, loop_depth 1, count 195, freq 173, probably never executed. Predecessors: 72 [8.1%] count:36 74 [39.0%] count:159 73 159 [100.0%] Successors: 78 [66.7%] count:130 155 [33.3%] count:65 (fallthru) Basic block 155 prev 154, next 162, loop_depth 0, count 65, freq 58, probably never executed. Predecessors: 154 [33.3%] count:65 (fallthru) Successors: 79 [100.0%] count:65 Basic block 162 prev 155, next -2, loop_depth 0, count 836, freq 741. Predecessors: 153 [100.0%] count:836 Successors: EXIT [100.0%] count:836 (fallthru) ------------- After cfgbuild.find_bb_boundaries, there are a few new blocks, and block 162 has been disconnected from the graph. Block 162 has no predecessors, yet it's count remains at 836 and it's frequency at 741 which are invalid. I believe blocks 163 and 164 were created while splitting block 162, and thus inherited the same count and frequency. Basic block 153 prev 152, next 154, loop_depth 0, count 836, freq 741. Predecessors: 0 [27.8%] count:232 12 151 [100.0%] count:604 152 [100.0%] (fallthru) Successors: Basic block 154 prev 153, next 155, loop_depth 1, count 195, freq 173, probably never executed. Predecessors: 72 [8.1%] count:36 74 [39.0%] count:159 73 159 [100.0%] Successors: 78 [66.7%] count:130 155 [33.3%] count:65 (fallthru) Basic block 155 prev 154, next 162, loop_depth 0, count 65, freq 58, probably never executed. Predecessors: 154 [33.3%] count:65 (fallthru) Successors: 79 [100.0%] count:65 Basic block 162 prev 155, next 163, loop_depth 0, count 836, freq 741. Predecessors: Successors: Invalid sum of incoming frequencies 0, should be 741 Invalid sum of incoming counts 0, should be 836 Basic block 163 prev 162, next 164, loop_depth 0, count 836, freq 741. Predecessors: Successors: Invalid sum of incoming frequencies 0, should be 741 Invalid sum of incoming counts 0, should be 836 Basic block 164 prev 163, next -2, loop_depth 0, count 836, freq 741. Predecessors: Successors: EXIT [100.0%] count:836 (fallthru) Invalid sum of incoming frequencies 0, should be 741 Invalid sum of incoming counts 0, should be 836 ------------- Later on, the cfg has apparently been rebuilt. Some of the blocks with bad counts were reattached to the graph and their counts propagated. The result is the invalid block count I initially observed. Note here that the count and frequency for block 164 are exactly double what they should be. Basic block 159 prev 158, next 160, loop_depth 0, count 836, freq 741. Predecessors: 1 [27.8%] count:232 13 157 [100.0%] count:604 158 [100.0%] (fallthru) Successors: 163 [100.0%] count:836 Basic block 160 prev 159, next 161, loop_depth 1, count 195, freq 173, probably never executed. Predecessors: 75 [8.1%] count:36 77 [39.0%] count:159 76 79 [100.0%] Successors: 82 [66.7%] count:130 161 [33.3%] count:65 (fallthru) Basic block 161 prev 160, next 163, loop_depth 0, count 65, freq 58, probably never executed. Predecessors: 160 [33.3%] count:65 (fallthru) Successors: 83 [100.0%] count:65 Basic block 163 prev 161, next 164, loop_depth 0, count 836, freq 741. Predecessors: 159 [100.0%] count:836 Successors: 164 [100.0%] count:836 (fallthru) Basic block 164 prev 163, next -2, loop_depth 0, count 1672, freq 1482. Predecessors: 163 [100.0%] count:836 (fallthru) Successors: EXIT [100.0%] count:1672 (fallthru) Invalid sum of incoming frequencies 741, should be 1482 Invalid sum of incoming counts 836, should be 1672 ------------ The problem appears to be in the function cfgrtl.purge_dead_edges which is called by find_bb_boundaries. There are a couple of cases where "remove_edge" is called to remove an edge from the graph. In the above example, the edge being removed is the only predecessor edge of the destination block(162). The edge is removed, but the count and frequency of the now orphaned destination block(162) are not adjusted. When these orphaned blocks later get reconnected to the graph (the part I'm not clear on :-) .. their bad counts get incrementally added back to the flow of the graph. I experimented with adjusting the count and frequency of the destination block prior to removing the edge in purge_dead_edges and that seems to solve the problem. I keep wondering if there is a more generic way the counts should be adjusted when an edge is removed.. for example in the remove_edge code itself. But doing it there seems to break other things. Any thoughts are appreciated. Pete.