I stumbled across the following and was interested in others thoughts
before I proceed any further.
The algorithm described in the comments of bb-reorder.c (and the paper
cited) talk about two parameters for controlling which blocks will be added
to traces. "Branch threshhold" refers to the branch probabilty of the
successor edges for a given block and "Exec threshhold" refers to the
execution frequency of the blocks. The problem is that the code in
find_traces_1_round() is grabbing the "EDGE_FREQUENCY" of a successor edge
for the frequency instead of the successor block's frequency. Since edge
frequency is based off the source block's frequency (src blk freq * edge
prob), this means the code in better_edge_p() which compares frequencies if
edge probabilities are equivalent is useless (if probabilities are
equivalent then the edge frequencies will also be equivalent since they're
based off the same block frequency).
I made the obvious change to use the succ blk frequency instead of edge
frequency and noticed one situation which gave undesirable results with the
current code. When we have a test block gating whether a loop should be
entered, the new block frequency check causes the code to pick the non-loop
path as the next block to add to the trace since the loop header block has
a higher frequency, and hence the loop gets moved out of line.
A thought I had for fixing this was to use dominance information in
better_edge_p() as follows:
else if (freq < best_freq - diff_freq)
/* The edge and the temporary best edge have almost equivalent
probabilities. The higher frequency of a successor now means
that there is another edge going into that successor.
This successor has lower frequency so it is better. */
// is_better_edge = true;
is_better_edge = ! dominated_by_p (CDI_POST_DOMINATORS, e->dest,
cur_best_edge->dest);
else if (freq > best_freq + diff_freq)
/* This successor has higher frequency so it is worse. */
// is_better_edge = false;
is_better_edge = dominated_by_p (CDI_POST_DOMINATORS,
cur_best_edge->dest, e->dest);
This gives the desired effect for both cases, not picking the higher
frequency succ block when that additional frequency is due to some other
predecessor, and leaving loops in line. An additional benefit of this
approach is that it should render useless the existing special case code in
find_traces_1_round() which looks for triangles in the flow graph to make
sure "then" legs are emitted in line.
Feedback I'm interested in hearing:
1) Flaws in my thinking/approach?
2) Thoughts on cost of computing dominance info for this
Thanks,
Pat