Re: [patch tree-optimization 1/2]: Branch-cost optimizations

Kai Tietz Mon, 07 Nov 2011 07:54:05 -0800

2011/11/7 Richard Guenther <richard.guent...@gmail.com>:
> On Sun, Nov 6, 2011 at 11:12 PM, Kai Tietz <kti...@redhat.com> wrote:
>> Hello,
>>
>> By this patch branch-cost optimization is moved from tree AST to cfgexpand 
>> from gimple to RTL.  By this we are able to do better optimization on 
>> conditionals simliar for all targets and do the final transition for 
>> branch-cost that late it shows best effect.
>>
>> This patch is splitted up into two pieces.  First adds feature of BC 
>> optimization to cfgexpand and scratches out BC-optimization code from 
>> fold-const.
>>
>> The second patch adds to tree-ssa-ifcombine pass the feature to merge simple 
>> if-and/or-if patterns into associative form.
>>
>> Two tests are failing due this patch in C's testsuite.  This are 
>> unit-pred-6_b and uniit-pred-6_c testcases.  Those failures are caused by 
>> jump-threading optimization in vrp, as vrp-pass.  Those failures could be 
>> fixed by the second patch, if we would move the ifcombine pass before the 
>> first vrp pass.
>
> The idea of doing target cost specific conversion of straight-line conditional
> code to branchy code and vice versa is a good one.  Though technically
> the spot you chose isn't very suitable - we already performed SSA name
> coalescing so you do not have the freedom that you seem to exercise
> with using SSA name definition statements.


Well, not that I noticed that I missed here any freedom.  In what
cases you mean I would need here freedom to create new ssa-statements?
 I admit that to move this branching code into a late pass before
cfgexpand would be preferable, but due current way cfgexpand already
interacts here with dojump, it seems to me for now the better place.
So for the upcoming 4.8 I will happily work on a more improved
version, which can handle the switch-lowering, too.
The current approach I've posted here shows already (first patch only)
an overall lowering of instructions executed for first condition, and
a pretty good lowering of executed instruction for extra dynamic
conditions.  As the choosen algorithm here isn't pretty aggressive in
finding matches, there should be indeed much chance to improve it
more.

The real pain is here already in AST, as for doing associative logical
bitwise combining we need to know pretty well that condition-elements
don't trap and have no side-effects.  This is right now only possible
by walking full tree, which is (as my tests have shown) even faster
then I expected.

> Rather than hooking the logic into the place where we expand conditions
> this should be a "lowering" pass right before expansion (that would,
> hopefully at some point also do switch statement expansion/lowering
> according to target costs).  Initially such patch should try to exactly
> copy what we do during expansion right now.

I agree, this should be the place it should go to for upcoming version.

> The cond_chain stuff should be as we have discussed quite some time
> ago on IRC - modeled after tree-affine.c - a "vector" of predicates
> of the form [~] A op B, combined using logical AND or OR
> (thus, able to represent a disjunctive or conjunctive normal form).
> All suitably abstracted so if-conversion, the loop optimizer and
> loop number of iteration analysis can use this code (they all collect
> a controlling predicate (chain) and try to simplify against it).

Yes, I remember this discussion.  And I agree that an affine-tree for
this could help at some places to share same facility.  Issue here is,
as we also were discussing, that a affine tree of not and ops for
bitwise-binaries is for truth-op folding not enough.  We need to
handle here comparsions and have to detect trapping, and side-effects.
As I said above for upcoming 4.8 I will work on that.

> A patch adding a pre-expand pass should not at the same time
> change what we feed it - what we have before such pass in the IL
> should be driven by choice what canonical form is more optimal
> for optimization passes (and unfortunately we have conflicting
> requirements here - think of profile-feedback and the fact that
> we do not have SSA names for each predicate we compute).

Well, this logic is in the new code for cfgexpand done. It assumes
that each instruction has cost of 1.  So it collects the amount of
required instruction required for the conditional check and sees if it
might be profitable to join it with second operand (if there is one).
This logic is much more sensible in terms of executed instruction that
what we have now in fold-const, where we are blindly joining
associative bitwise-operations, even if it is more costy then the
separate condition.
So by this patch we already have a code-reduction of about 0.3% and
for each extra-dynamical branch a much higher reduction of executed
instruction as current approach shows.

> I note that there are still more patches from you pending related
> to predicate optimizations - but I lack an idea where you are
> heading to, from a global point of view.  Take the time that
> stage3 and pre-stage1 offers to lay ground for a more structured
> approach to this area.  You may, for example, want to pick up
> my (suspended) work on separating predicate computation from
> predicate use (or at least think about whether that's a good idea
> for the purpose of predicate optimizations).

Well, the idea behind this is, that we use for general bitwise-binary
optimizations (logical and non-logical) operation one pass which
already handles it.
We might should have a reassoc pass much earlier then now.

> Thanks,
> Richard.
>
>> ChangeLog
>>
>> 2011-11-06  Kai Tietz  <kti...@redhat.com>
>>
>>        * cfgexpand.c (is_bool_op_p): New helper.
>>        (normalize_truth_condition): Likewise.
>>        (cond_assoc_t): New structure type.
>>        (collect_cond_chain): New helper.
>>        (build_cond_expr): Likewise.
>>        (is_bitwise_binary_simple_combine): Likewise.
>>        (preeval_cond_integral): Likewise.
>>        (combine_conds): Likewise.
>>        (branchcost_optimization_on_conditions): Likewise.
>>        (expand_gimple_cond): Use branchcost_optimization_on_condition
>>        function.
>>        * dojump.c (do_jump): Prevent converting bitwise-and/or
>>        to real iffs for branch-cost bigger then zero.
>>        * fold_const.c (simple_operand_p_2): Improve evaluation
>>        of side-effects and trapping for associative truth-bitwise
>>        binary operations.
>>        (fold_range_test): Remove branch-cost specific code.
>>        (fold_truth_andor_1): Likewise.
>>        (fold_truth_andor): Likewise.
>>
>> ChangeLog testsuite
>>
>> 2011-11-06  Kai Tietz  <kti...@redhat.com>
>>
>>        * gcc.dg/pr46909.c: Adjust test.
>>        * gcc.dg/tree-ssa/vrp33.c: Likewise.
>>        * gcc.target/i386/branch-cost1.c: Likewise.
>>        * gcc.target/i386/branch-cost2.c: Likewise.
>>        * gcc.target/i386/branch-cost3.c: Likewise.
>>        * gcc.target/i386/branch-cost4.c: Likewise.
>>
>> Patch was bootstrapped and regression tested for x86_64-unknown-linux-gnu.  
>> Ok for apply?
>>
>



-- 
|  (\_/) This is Bunny. Copy and paste
| (='.'=) Bunny into your signature to help
| (")_(") him gain world domination

Re: [patch tree-optimization 1/2]: Branch-cost optimizations

Reply via email to