2011/11/7 Richard Guenther <richard.guent...@gmail.com>: > On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <kti...@redhat.com> wrote: >> Hello, >> >> By this patch branch-cost optimization is moved from tree AST to cfgexpand >> from gimple to RTL. By this we are able to do better optimization on >> conditionals simliar for all targets and do the final transition for >> branch-cost that late it shows best effect. >> >> This patch is splitted up into two pieces. First adds feature of BC >> optimization to cfgexpand and scratches out BC-optimization code from >> fold-const. >> >> The second patch adds to tree-ssa-ifcombine pass the feature to merge simple >> if-and/or-if patterns into associative form. >> >> Two tests are failing due this patch in C's testsuite. This are >> unit-pred-6_b and uniit-pred-6_c testcases. Those failures are caused by >> jump-threading optimization in vrp, as vrp-pass. Those failures could be >> fixed by the second patch, if we would move the ifcombine pass before the >> first vrp pass. > > The idea of doing target cost specific conversion of straight-line conditional > code to branchy code and vice versa is a good one. Though technically > the spot you chose isn't very suitable - we already performed SSA name > coalescing so you do not have the freedom that you seem to exercise > with using SSA name definition statements.
Well, not that I noticed that I missed here any freedom. In what cases you mean I would need here freedom to create new ssa-statements? I admit that to move this branching code into a late pass before cfgexpand would be preferable, but due current way cfgexpand already interacts here with dojump, it seems to me for now the better place. So for the upcoming 4.8 I will happily work on a more improved version, which can handle the switch-lowering, too. The current approach I've posted here shows already (first patch only) an overall lowering of instructions executed for first condition, and a pretty good lowering of executed instruction for extra dynamic conditions. As the choosen algorithm here isn't pretty aggressive in finding matches, there should be indeed much chance to improve it more. The real pain is here already in AST, as for doing associative logical bitwise combining we need to know pretty well that condition-elements don't trap and have no side-effects. This is right now only possible by walking full tree, which is (as my tests have shown) even faster then I expected. > Rather than hooking the logic into the place where we expand conditions > this should be a "lowering" pass right before expansion (that would, > hopefully at some point also do switch statement expansion/lowering > according to target costs). Initially such patch should try to exactly > copy what we do during expansion right now. I agree, this should be the place it should go to for upcoming version. > The cond_chain stuff should be as we have discussed quite some time > ago on IRC - modeled after tree-affine.c - a "vector" of predicates > of the form [~] A op B, combined using logical AND or OR > (thus, able to represent a disjunctive or conjunctive normal form). > All suitably abstracted so if-conversion, the loop optimizer and > loop number of iteration analysis can use this code (they all collect > a controlling predicate (chain) and try to simplify against it). Yes, I remember this discussion. And I agree that an affine-tree for this could help at some places to share same facility. Issue here is, as we also were discussing, that a affine tree of not and ops for bitwise-binaries is for truth-op folding not enough. We need to handle here comparsions and have to detect trapping, and side-effects. As I said above for upcoming 4.8 I will work on that. > A patch adding a pre-expand pass should not at the same time > change what we feed it - what we have before such pass in the IL > should be driven by choice what canonical form is more optimal > for optimization passes (and unfortunately we have conflicting > requirements here - think of profile-feedback and the fact that > we do not have SSA names for each predicate we compute). Well, this logic is in the new code for cfgexpand done. It assumes that each instruction has cost of 1. So it collects the amount of required instruction required for the conditional check and sees if it might be profitable to join it with second operand (if there is one). This logic is much more sensible in terms of executed instruction that what we have now in fold-const, where we are blindly joining associative bitwise-operations, even if it is more costy then the separate condition. So by this patch we already have a code-reduction of about 0.3% and for each extra-dynamical branch a much higher reduction of executed instruction as current approach shows. > I note that there are still more patches from you pending related > to predicate optimizations - but I lack an idea where you are > heading to, from a global point of view. Take the time that > stage3 and pre-stage1 offers to lay ground for a more structured > approach to this area. You may, for example, want to pick up > my (suspended) work on separating predicate computation from > predicate use (or at least think about whether that's a good idea > for the purpose of predicate optimizations). Well, the idea behind this is, that we use for general bitwise-binary optimizations (logical and non-logical) operation one pass which already handles it. We might should have a reassoc pass much earlier then now. > Thanks, > Richard. > >> ChangeLog >> >> 2011-11-06 Kai Tietz <kti...@redhat.com> >> >> * cfgexpand.c (is_bool_op_p): New helper. >> (normalize_truth_condition): Likewise. >> (cond_assoc_t): New structure type. >> (collect_cond_chain): New helper. >> (build_cond_expr): Likewise. >> (is_bitwise_binary_simple_combine): Likewise. >> (preeval_cond_integral): Likewise. >> (combine_conds): Likewise. >> (branchcost_optimization_on_conditions): Likewise. >> (expand_gimple_cond): Use branchcost_optimization_on_condition >> function. >> * dojump.c (do_jump): Prevent converting bitwise-and/or >> to real iffs for branch-cost bigger then zero. >> * fold_const.c (simple_operand_p_2): Improve evaluation >> of side-effects and trapping for associative truth-bitwise >> binary operations. >> (fold_range_test): Remove branch-cost specific code. >> (fold_truth_andor_1): Likewise. >> (fold_truth_andor): Likewise. >> >> ChangeLog testsuite >> >> 2011-11-06 Kai Tietz <kti...@redhat.com> >> >> * gcc.dg/pr46909.c: Adjust test. >> * gcc.dg/tree-ssa/vrp33.c: Likewise. >> * gcc.target/i386/branch-cost1.c: Likewise. >> * gcc.target/i386/branch-cost2.c: Likewise. >> * gcc.target/i386/branch-cost3.c: Likewise. >> * gcc.target/i386/branch-cost4.c: Likewise. >> >> Patch was bootstrapped and regression tested for x86_64-unknown-linux-gnu. >> Ok for apply? >> > -- | (\_/) This is Bunny. Copy and paste | (='.'=) Bunny into your signature to help | (")_(") him gain world domination