Although my analysis of the i386 mode switching in PR 28764 was incorrect (see PR target/29347), in my analysis in commnent #13 of PR target/28764 still stands with regard to superflous/unnecessarily repeated computation that we do in the mode switching code, and additional optimizations that could be done:
- make_preds_opaque is not needed. - the pre_edge_lcm computation for all modes of an entity can be done simultanously. Since pre_edge_lcm will not insert additional computations into a path, and different modes of the same entity kill each other, it won't result in multiple modes of the same entity being live at the same point. Hence, a single pre_edge_lcm call is sufficient. - The SH fpchg is currently not used. What we are missing is knowledge when we have a transition from one known mode to another. We can get this by adding new array pointer pointer parameters to pre_edge_lcm so that at the end, if these parameters are non-zero, instead of freeing the avin / avout arrays, it will assign the pointers to these arrays to the pointers pointed to be the new parameters. avin is useful for exception handlers that nominally start out in no_mode. If avin is set for the same mode as for the mode required at the start of an exception handler, the mode setting can be elided; this can be done in the machine-dependent code. If avin is set for the opposite mode, fpchg can be used. This can be implemented by making the machine-independent code pass any mode known from the matching avin as a parameter to EMIT_MODE_SET. Likewise, avout can be used to determine any known mode for an edge insertion. - while pre_edge_lcm won't insert calculations in paths where they have not been originally, doing so can be beneficial considering the probability of paths and the different costs of set from a known mode as compared to a set from an unknown mode. For any transparent sub-graph, considering a mode anticipatable in this graph is beneficial if the sum of the frequencies of outgoing edges on which the mode is anticipatable, times the cost of setting from an unknown mode, plus the sum of the frequencies of outgoing edges on which any other mode is anticipatable, times the cost savings when setting from a known mode rather than an unknown mode, is larger than the sum of the products of frequencies of incoming edges on which the expression is not available, time the cost of setting the mode on this edge (considering any other available mode known there). So the main problem for a code motion pass that takes edge frequencies into account appears to be to find heuristics how to identify transparent subgraphs which should be checked - a naiive algorithm to try all subsets of nodes would be exponential. On the other hand, checking only subgraphs which are interconnected as a list and have no connection to other transparent nodes except in head and tail, and could be expanded at their tail (alternatively: at their head) only in the form of a longer list, can be done with a greedy algorithm which checks only O(number of nodes) nodes. (Traverse all nodes to find nodes which are tail / head of maximum lists, drop nodes at the bottom if it increases the tally, for a positive tally, if adding a new node would decrease it, remember current list. When current tally becomes negative or the list can't be extended, check for remembered list.) -- Summary: mode switching is inefficient both at compile time and at run time Product: gcc Version: 4.2.0 Status: UNCONFIRMED Keywords: missed-optimization, compile-time-hog Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: amylaar at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29349