Richard,

In my initial design I did such splitting before starting the real if-conversion, but I decided not to perform it since the code size of the if-converted loop grows (the number of phi nodes increases). It is also worth noting that for a phi with more than 2 arguments we need to get all predicates (except one) to do phi-predication, which means that a block containing such a phi can have only 1 critical edge.
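To illustrate the remark about predicates, here is a rough standalone sketch in plain C. It is not code from the patch: the function name and parameters are made up for this example, and it only assumes that exactly one incoming edge is taken, so the edge predicates are mutually exclusive. For x = PHI <a(e1), b(e2), c(e3)> the value can be selected with the predicates of e1 and e2 alone; the predicate of e3 is never evaluated.

```c
#include <stdbool.h>

/* Hypothetical illustration, not part of tree-if-conv.c.
   Models x = PHI <a(e1), b(e2), c(e3)> after if-conversion.
   p1 and p2 stand for the predicates of edges e1 and e2; the
   predicate of e3 is not needed because e3 is the only edge
   left when both p1 and p2 are false.  */
static int
predicate_three_arg_phi (bool p1, bool p2, int a, int b, int c)
{
  /* Inner select covers edges e2 and e3.  */
  int t = p2 ? b : c;
  /* Outer select covers edge e1 against the rest.  */
  return p1 ? a : t;
}
```

With n incoming edges the same pattern gives a chain of n - 1 selects, so n - 1 predicates must be available at the insertion point.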
Thanks.
Yuri.

2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> I saw the sources of these functions, but I can't understand why I
>>> should use something else? Note that all predicate computations are
>>> located in basic blocks ( by design of if-conv) and there is special
>>> function that put these computations in bb
>>> (insert_gimplified_predicates). Edge contains only predicate not its
>>> computations. New function - find_insertion_point() does very simple
>>> search - it finds out the latest (in current bb) operand def-stmt of
>>> predicates taken from all incoming edges.
>>> In original algorithm the predicate of non-critical edge is taken to
>>> perform phi-node predication since for critical edge it does not work
>>> properly.
>>>
>>> My question is: does your comments mean that I should re-design my
>>> extensions?
>>
>> Well, we have infrastructure for inserting code on edges and you've
>> made critical edges predicated correctly. So why re-invent the wheel?
>> I realize this is very similar to my initial suggestion to simply split
>> critical edges in loops you want to if-convert but delays splitting
>> until it turns out to be necessary (which might be good for the
>> !force_vect case).
>>
>> For edge predicates you simply can emit their computation on the
>> edge, no?
>>
>> Btw, I very originally suggested to rework if-conversion to only
>> record edge predicates - having both block and edge predicates
>> somewhat complicates the code and makes it harder to
>> maintain (thus also the suggestion to simply split critical edges
>> if necessary to make BB predicates work always).
>>
>> Your patches add a lot of code and to me it seems we can avoid
>> doing so much special casing.
>
> For example attacking the critical edge issue by a simple
>
> Index: tree-if-conv.c
> ===================================================================
> --- tree-if-conv.c (revision 216508)
> +++ tree-if-conv.c (working copy)
> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop,
>         if (EDGE_COUNT (e->src->succs) == 1)
>           found = true;
>        if (!found)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "only critical predecessors\n");
> -         return false;
> -       }
> +       split_edge (EDGE_PRED (bb, 0));
>      }
>
>    return true;
>
> it changes the number of blocks in the loop, so
> get_loop_body_in_if_conv_order should probably be re-done with the
> above eventually signalling that it created a new block. Or the above
> should populate a vector of edges to split and do that after the
> loop calling if_convertible_bb_p.
>
> Richard.
>
>> Richard.
>>
>>> Thanks.
>>> Yuri.
>>>
>>> BTW Jeff did initial review of my changes related to predicate
>>> computation for join blocks. I presented him updated patch with
>>> test-case and some minor changes in patch. But still did not get any
>>> feedback on it. Could you please take a look also on it?
>>>
>>>
>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> Yes, This patch does not make sense since phi node predication for bb
>>>>> with critical incoming edges only performs another function which is
>>>>> absent (predicate_extended_scalar_phi).
>>>>>
>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions
>>>>> only but you propose to use it for tree also.
>>>>> Did I miss something?
>>>>
>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert
>>>> if you want easy access to the newly created basic block to push
>>>> the predicate to - see gsi_commit_edge_inserts implementation).
>>>>
>>>> Richard.
>>>>
>>>>> Thanks ahead.
>>>>>
>>>>>
>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> I did some changes in patch and ChangeLog to mark that support for
>>>>>>> if-convert of blocks with only critical incoming edges will be added
>>>>>>> in the future (more precise in patch.4).
>>>>>>
>>>>>> But the same reasoning applies to this version of the patch when
>>>>>> flag_force_vectorize is true!? (insertion point and invalid SSA form)
>>>>>>
>>>>>> Which means the patch doesn't make sense in isolation?
>>>>>>
>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge ()
>>>>>> and commit_edge_insertions () before the call to combine_blocks
>>>>>> (pushing the edge predicate to the newly created block).
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Could you please review it.
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> ChangeLog:
>>>>>>>
>>>>>>> 2014-10-21 Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>
>>>>>>> (flag_force_vectorize): New variable.
>>>>>>> (edge_predicate): New function.
>>>>>>> (set_edge_predicate): New function.
>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>>>>>> if destination block of edge is not always executed. Set-up predicate
>>>>>>> for critical edge.
>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>>> (all_preds_critical_p): New function.
>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p
>>>>>>> to reject temporarily block if-conversion with incoming critical edges
>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted
>>>>>>> after adding support for extended predication.
>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>>>>>> to compute predicate instead of fold_build2_loc.
>>>>>>> Add zeroing of edge 'aux' field.
>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>> is equal 2 and at least one incoming edge is not critical original
>>>>>>> algorithm is used.
>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false.
>>>>>>> Nullify 'aux' field of edges for blocks with two successors.
>>>>>>>
>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>>>> Richard,
>>>>>>>>
>>>>>>>> Thanks for your answer!
>>>>>>>>
>>>>>>>> In current implementation phi node conversion assume that one of
>>>>>>>> incoming edge to bb containing given phi has at least one non-critical
>>>>>>>> edge and choose it to insert predicated code. But if we choose
>>>>>>>> critical edge we need to determine insert point and insertion
>>>>>>>> direction (before/after) since in other case we can get invalid ssa
>>>>>>>> form (use before def). This is done by my new function which is not in
>>>>>>>> current patch ( I will present this patch later). SO I assume that we
>>>>>>>> need to leave this patch as it is to not introduce new bugs.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>> Yuri.
>>>>>>>>
>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Richard,
>>>>>>>>>>
>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what
>>>>>>>>>> did you mean by:
>>>>>>>>>>
>>>>>>>>>>>So please rework the patch so critical edges are always handled
>>>>>>>>>>>correctly.
>>>>>>>>>> >>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes >>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only >>>>>>>>>> critical incoming edges since support for extended predication of phi >>>>>>>>>> nodes will be in next patch. >>>>>>>>> >>>>>>>>> I mean that (2) should not be rejected dependent on >>>>>>>>> flag_force_vectorize. >>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before >>>>>>>>> but with >>>>>>>>> this patch this is fixed. I see no reason to still reject this then >>>>>>>>> even >>>>>>>>> for !flag_force_vectorize. >>>>>>>>> >>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize >>>>>>>>> is ok. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Could you please clarify your statement. >>>>>>>>>> >>>>>>>>>> I attached modified patch. >>>>>>>>>> >>>>>>>>>> ChangeLog: >>>>>>>>>> >>>>>>>>>> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>> (edge_predicate): New function. >>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke >>>>>>>>>> add_to_predicate_list >>>>>>>>>> if destination block of edge is not always executed. Set-up predicate >>>>>>>>>> for critical edge. >>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p >>>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>>>>> to compute predicate instead of fold_build2_loc. 
>>>>>>>>>> Add zeroing of edge 'aux' field. >>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>> algorithm is used. >>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener >>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Richard, >>>>>>>>>>>> >>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have been >>>>>>>>>>>> fixed. >>>>>>>>>>>> Could you please look at it ( I have already sent the patch with >>>>>>>>>>>> changes in add_to_predicate_list for review). >>>>>>>>>>> >>>>>>>>>>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>>>>>>>>>> + fprintf (dump_file, "More than two phi node >>>>>>>>>>> args.\n"); >>>>>>>>>>> + return false; >>>>>>>>>>> + } >>>>>>>>>>> + >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> Excess vertical space. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors. >>>>>>>>>>> >>>>>>>>>>> More than 1 predecessor? >>>>>>>>>>> >>>>>>>>>>> + Returns false if at least one successor is not on critical edge >>>>>>>>>>> + and true otherwise. 
*/ >>>>>>>>>>> + >>>>>>>>>>> +static inline bool >>>>>>>>>>> +all_edges_are_critical (basic_block bb) >>>>>>>>>>> +{ >>>>>>>>>>> >>>>>>>>>>> "all_preds_critical_p" would be a better name >>>>>>>>>>> >>>>>>>>>>> + if (EDGE_COUNT (bb->preds) > 2) >>>>>>>>>>> + { >>>>>>>>>>> + if (!flag_force_vectorize) >>>>>>>>>>> + return false; >>>>>>>>>>> + } >>>>>>>>>>> >>>>>>>>>>> as I said in the last review I don't think we should restrict edge >>>>>>>>>>> predicates to flag_force_vectorize. At least I can't see how >>>>>>>>>>> if-conversion is magically more expensive for that case? >>>>>>>>>>> >>>>>>>>>>> So please rework the patch so critical edges are always handled >>>>>>>>>>> correctly. >>>>>>>>>>> >>>>>>>>>>> Ok with that and the above suggested changes. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Thanks. >>>>>>>>>>>> Yuri. >>>>>>>>>>>> ChangeLog >>>>>>>>>>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>> >>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke >>>>>>>>>>>> add_to_predicate_list >>>>>>>>>>>> if destination block of edge is not always executed. Set-up >>>>>>>>>>>> predicate >>>>>>>>>>>> for critical edge. >>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. 
>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>>>>>>> to compute predicate instead of fold_build2_loc. >>>>>>>>>>>> Add zeroing of edge 'aux' field. >>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>> phi-block >>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>> algorithm is used. >>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to >>>>>>>>>>>> false. >>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener >>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Second part of patch will be sent later. >>>>>>>>>>>>> >>>>>>>>>>>>> Ok, I'm starting to look at this. I'd still like you to split >>>>>>>>>>>>> things up >>>>>>>>>>>>> more. >>>>>>>>>>>>> >>>>>>>>>>>>> static inline void >>>>>>>>>>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree >>>>>>>>>>>>> nc) >>>>>>>>>>>>> { >>>>>>>>>>>>> ... >>>>>>>>>>>>> >>>>>>>>>>>>> + /* We use notion of cd equivalence to get simplier >>>>>>>>>>>>> predicate for >>>>>>>>>>>>> + join block, e.g. if join block has 2 predecessors with >>>>>>>>>>>>> predicates >>>>>>>>>>>>> + p1 & p2 and p1 & !p2, we'd like to get p1 for it instead >>>>>>>>>>>>> of >>>>>>>>>>>>> + p1 & p2 | p1 & !p2. 
*/ >>>>>>>>>>>>> + if (dom_bb != loop->header >>>>>>>>>>>>> + && get_immediate_dominator (CDI_POST_DOMINATORS, >>>>>>>>>>>>> dom_bb) == bb) >>>>>>>>>>>>> + { >>>>>>>>>>>>> + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); >>>>>>>>>>>>> + bc = bb_predicate (dom_bb); >>>>>>>>>>>>> + gcc_assert (!is_true_predicate (bc)); >>>>>>>>>>>>> >>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize. So >>>>>>>>>>>>> please >>>>>>>>>>>>> split the change to add_to_predicate_list out and compute >>>>>>>>>>>>> post-dominators >>>>>>>>>>>>> unconditionally. Note that you should call free_dominance_info >>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion. >>>>>>>>>>>>> >>>>>>>>>>>>> + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest)) >>>>>>>>>>>>> + add_to_predicate_list (loop, e->dest, cond); >>>>>>>>>>>>> + >>>>>>>>>>>>> + /* If edge E is critical save predicate on it. */ >>>>>>>>>>>>> + if (EDGE_COUNT (e->dest->preds) >= 2) >>>>>>>>>>>>> + set_edge_predicate (e, cond); >>>>>>>>>>>>> >>>>>>>>>>>>> how do we know the edge is critical by this simple check? Why not >>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit >>>>>>>>>>>>> the case where e->src dominates e->dest). >>>>>>>>>>>>> >>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the >>>>>>>>>>>>> pass but need to clear it at the end (best use >>>>>>>>>>>>> clear_aux_for_edges () >>>>>>>>>>>>> for that). So stuff like >>>>>>>>>>>>> >>>>>>>>>>>>> + extract_true_false_edges_from_block (bb, &true_edge, >>>>>>>>>>>>> &false_edge); >>>>>>>>>>>>> + if (flag_force_vectorize) >>>>>>>>>>>>> + true_edge->aux = false_edge->aux = NULL; >>>>>>>>>>>>> >>>>>>>>>>>>> shouldn't be necessary. >>>>>>>>>>>>> >>>>>>>>>>>>> I think the edge predicate handling should also be unconditionally >>>>>>>>>>>>> and not depend on flag_force_vectorize. 
>>>>>>>>>>>>> >>>>>>>>>>>>> + /* The loop latch and loop exit block are always executed >>>>>>>>>>>>> and >>>>>>>>>>>>> + have no extra conditions to be processed: skip them. */ >>>>>>>>>>>>> + if (bb == loop->latch >>>>>>>>>>>>> + || bb_with_exit_edge_p (loop, bb)) >>>>>>>>>>>>> >>>>>>>>>>>>> I don't think the edge stuff is true - given you still only reset >>>>>>>>>>>>> the >>>>>>>>>>>>> loop->latch bb predicate the change looks broken. >>>>>>>>>>>>> >>>>>>>>>>>>> + /* Fold_build2 can produce bool conversion which is not >>>>>>>>>>>>> + supported by vectorizer, so re-build it without >>>>>>>>>>>>> folding. >>>>>>>>>>>>> + For example, such conversion is generated for >>>>>>>>>>>>> sequence: >>>>>>>>>>>>> + _Bool _7, _8, _9; >>>>>>>>>>>>> + _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9; >>>>>>>>>>>>> + if (_9 != 0) --> (bool)_9. */ >>>>>>>>>>>>> + >>>>>>>>>>>>> + if (CONVERT_EXPR_P (c) >>>>>>>>>>>>> + && TREE_CODE_CLASS (code) == tcc_comparison) >>>>>>>>>>>>> >>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the >>>>>>>>>>>>> folding result. Or rather _not_ fold at all - we are taking the >>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all. >>>>>>>>>>>>> >>>>>>>>>>>>> - add_to_dst_predicate_list (loop, false_edge, >>>>>>>>>>>>> - unshare_expr (cond), c2); >>>>>>>>>>>>> + add_to_dst_predicate_list (loop, false_edge, >>>>>>>>>>>>> unshare_expr (cond), >>>>>>>>>>>>> + unshare_expr (c2)); >>>>>>>>>>>>> >>>>>>>>>>>>> why is it necessary to unshare c2? >>>>>>>>>>>>> >>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling (I have not >>>>>>>>>>>>> looked at >>>>>>>>>>>>> that in detail). >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Changelog. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>>>>>>> clone. 
>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>>>>>>> for join blocks if it exists. >>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>>>>>>> destination block of edge is not always executed. Set-up >>>>>>>>>>>>>> predicate >>>>>>>>>>>>>> for critical edge. >>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of >>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only >>>>>>>>>>>>>> if >>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>>>> algorithm is used. 
>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>>>>>>> that extended predication must be applied). >>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>> current >>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>>>> versioning >>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field >>>>>>>>>>>>>> of edges >>>>>>>>>>>>>> for blocks with two successors. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice. >>>>>>>>>>>>>>> Let's me also answer on your comments. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical >>>>>>>>>>>>>>> edges. >>>>>>>>>>>>>>> My previous code was not correct and now it looks like: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) >>>>>>>>>>>>>>> == 1) >>>>>>>>>>>>>>> /* Edge E is not critical, use predicate of edge source >>>>>>>>>>>>>>> bb. */ >>>>>>>>>>>>>>> c = bb_predicate (b); >>>>>>>>>>>>>>> else >>>>>>>>>>>>>>> /* Edge E is critical and its aux field contains predicate. 
>>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>> c = edge_predicate (e); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. I completely delete all code related to creation of >>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in >>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate >>>>>>>>>>>>>>> computations >>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will add >>>>>>>>>>>>>>> this >>>>>>>>>>>>>>> local-dce function in next patch. >>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general >>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of >>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>> scalar reduction can be applied also. >>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with >>>>>>>>>>>>>>> pragma >>>>>>>>>>>>>>> omp simd only. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect >>>>>>>>>>>>>>> function clone. >>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>>>>>>>> for join blocks if it exists. >>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up >>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>> for critical edge. >>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. 
>>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of >>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical >>>>>>>>>>>>>>> original >>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>>>>>>>> that extended predication must be applied). >>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. 
>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>> current >>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field >>>>>>>>>>>>>>> of edges >>>>>>>>>>>>>>> for blocks with two successors. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener >>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>> Richard! >>>>>>>>>>>>>>>>> Here is updated patch with the following changes: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for >>>>>>>>>>>>>>>>> extended conversion. >>>>>>>>>>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge, >>>>>>>>>>>>>>>>> i.e. >>>>>>>>>>>>>>>>> negate_predicate was deleted. >>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing >>>>>>>>>>>>>>>>> edges can >>>>>>>>>>>>>>>>> be critical. >>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join >>>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>>> blocks to simplify it. >>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead >>>>>>>>>>>>>>>>> generating >>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas >>>>>>>>>>>>>>>>> for phi >>>>>>>>>>>>>>>>> of kind >>>>>>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>>>>>>> only one cond expression is required and this is considered >>>>>>>>>>>>>>>>> as simple >>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. 
More precise, >>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of >>>>>>>>>>>>>>>>> them has >>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have >>>>>>>>>>>>>>>>> only 2 >>>>>>>>>>>>>>>>> arguments. >>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is >>>>>>>>>>>>>>>>> produced. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Updated patch is attached. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Any comments will be appreciated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once >>>>>>>>>>>>>>>> which makes >>>>>>>>>>>>>>>> it hard to review. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In addition to that it changes function singatures without >>>>>>>>>>>>>>>> updating >>>>>>>>>>>>>>>> the function comments. For example what is the convert_bool >>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list? Why do we need >>>>>>>>>>>>>>>> all this added logic. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but >>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an >>>>>>>>>>>>>>>> obvious >>>>>>>>>>>>>>>> candidate for splitting out into a separate patch. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> + CONVERT_BOOL argument was added to convert bool predicate >>>>>>>>>>>>>>>> computations >>>>>>>>>>>>>>>> + which is not supported by vectorizer to int type through >>>>>>>>>>>>>>>> creating of >>>>>>>>>>>>>>>> + conditional expressions. */ >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Example? The vectorizer has patterns for bool predicate >>>>>>>>>>>>>>>> computations. >>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward >>>>>>>>>>>>>>>> to me. 
>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply >>>>>>>>>>>>>>>> split critical edges (of the respective loop body). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by >>>>>>>>>>>>>>>> introducing >>>>>>>>>>>>>>>> forwarder blocks would be nicer to have. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply >>>>>>>>>>>>>>>> these >>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop >>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing >>>>>>>>>>>>>>>> that). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So - please split up the patch. It's way too big. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect >>>>>>>>>>>>>>>>> function clone. >>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate >>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further >>>>>>>>>>>>>>>>> possible use. 
>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is >>>>>>>>>>>>>>>>> true. >>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>> Add early function exit if edge target block is always >>>>>>>>>>>>>>>>> executed. >>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is >>>>>>>>>>>>>>>>> true. >>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>>>>>>> Set-up predicate for critical edge if convert_bool is true. >>>>>>>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two >>>>>>>>>>>>>>>>> args >>>>>>>>>>>>>>>>> if flag_force_vectorize was set-up. >>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on >>>>>>>>>>>>>>>>> flag_force_vectorize. >>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if >>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. >>>>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two >>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of >>>>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges >>>>>>>>>>>>>>>>> only if >>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up. >>>>>>>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to >>>>>>>>>>>>>>>>> transform >>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional >>>>>>>>>>>>>>>>> expressions >>>>>>>>>>>>>>>>> with integral operands. 
If convert_bool argument was set-up >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> vect bool pattern can be applied, perform the following >>>>>>>>>>>>>>>>> transformation: >>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if >>>>>>>>>>>>>>>>> convert_bool >>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional >>>>>>>>>>>>>>>>> argument >>>>>>>>>>>>>>>>> 'convert_bool' is passed to add_to_dst_predicate_list and >>>>>>>>>>>>>>>>> add_to_predicate_list. >>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent >>>>>>>>>>>>>>>>> bb's. >>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false. >>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical >>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which >>>>>>>>>>>>>>>>> signals that >>>>>>>>>>>>>>>>> phi arguments must be evaluated through >>>>>>>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of convert_name_to_cmp >>>>>>>>>>>>>>>>> if cond >>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function. >>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. 
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple >>>>>>>>>>>>>>>>> statement >>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for >>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. >>>>>>>>>>>>>>>>> Insert >>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion. >>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for >>>>>>>>>>>>>>>>> extended >>>>>>>>>>>>>>>>> predication to build mask. >>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>>>> current >>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Do loop >>>>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener >>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in >>>>>>>>>>>>>>>>>>> part of >>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These >>>>>>>>>>>>>>>>>>> extensions >>>>>>>>>>>>>>>>>>> include: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or >>>>>>>>>>>>>>>>>>> its outer >>>>>>>>>>>>>>>>>>> loop was marked with pragma omp simd (force_vectorize); >>>>>>>>>>>>>>>>>>> For ordinary >>>>>>>>>>>>>>>>>>> loops behavior was not changed. >>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have >>>>>>>>>>>>>>>>>>> more than 2 >>>>>>>>>>>>>>>>>>> predecessors. >>>>>>>>>>>>>>>>>>> 3. 
Put additional restriction on phi nodes which was missed >>>>>>>>>>>>>>>>>>> in current design: >>>>>>>>>>>>>>>>>>> all phi nodes must be in non-predicated basic block to >>>>>>>>>>>>>>>>>>> conform to the >>>>>>>>>>>>>>>>>>> semantics of COND_EXPR which is used for transformation. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> How is that so? If the PHI is predicated then its result >>>>>>>>>>>>>>>>>> will be used >>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of >>>>>>>>>>>>>>>>>> COND_EXPRs. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> No? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than >>>>>>>>>>>>>>>>>>> 2 arguments >>>>>>>>>>>>>>>>>>> with some limitations: >>>>>>>>>>>>>>>>>>> - for phi nodes which have more than 2 arguments, but >>>>>>>>>>>>>>>>>>> only two >>>>>>>>>>>>>>>>>>> arguments are different and one of them has the only >>>>>>>>>>>>>>>>>>> occurrence, >>>>>>>>>>>>>>>>>>> transformation to single COND_EXPR can be done. >>>>>>>>>>>>>>>>>>> - if phi node has more different arguments and all edge >>>>>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>>>>> corresponding to phi-arguments are disjoint, a chain of >>>>>>>>>>>>>>>>>>> COND_EXPR >>>>>>>>>>>>>>>>>>> will be generated for it. In current design very simple >>>>>>>>>>>>>>>>>>> check is used: >>>>>>>>>>>>>>>>>>> check starting from end that two edges corresponding to >>>>>>>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for further >>>>>>>>>>>>>>>>>>> check >>>>>>>>>>>>>>>>>>> with next edge. >>>>>>>>>>>>>>>>>>> These guarantee that phi predication will produce the >>>>>>>>>>>>>>>>>>> correct result. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI >>>>>>>>>>>>>>>>>> node by >>>>>>>>>>>>>>>>>> inserting forwarder blocks. 
Thus >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> becomes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> bb 5: <forwarder-from(2)-and(3)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> x = PHI <1(5), 2(4)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> x = PHI <1(2), 2(3), 3(4)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> becomes >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> bb 5: >>>>>>>>>>>>>>>>>> x' = PHI <1(2), 2(3)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> b = PHI<x'(5), 3(4)> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> which means that 3) has to work. Note that we want this >>>>>>>>>>>>>>>>>> kind of >>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of >>>>>>>>>>>>>>>>>> copies we need to insert on edges. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a >>>>>>>>>>>>>>>>>> pre-pass >>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG >>>>>>>>>>>>>>>>>> transform. >>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work >>>>>>>>>>>>>>>>>> around the >>>>>>>>>>>>>>>>>> critical edge limitation? Please instead change >>>>>>>>>>>>>>>>>> if-conversion to >>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Richard. 
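The two PHI shapes discussed above can be sketched in plain C. This is only an illustrative model and not code from tree-if-conv.c: struct phi_arg, phi_two_values and phi_as_select_chain are hypothetical names, and the sketch assumes the edge predicates are disjoint with exactly one of them true, which is the condition the thread relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PHI argument: the value flowing in on an
   incoming edge, and whether that edge's predicate holds.  */
struct phi_arg
{
  int value;
  bool pred;
};

/* Case "x = PHI <1(2), 1(3), 2(4)>": only two distinct values and one
   of them occurs once.  A single select on the predicate of the unique
   argument is enough.  UNIQUE is the index of that argument and COMMON
   is the value shared by all other arguments.  */
int
phi_two_values (const struct phi_arg *args, size_t unique, int common)
{
  return args[unique].pred ? args[unique].value : common;
}

/* Case "x = PHI <1(2), 2(3), 3(4)>": lower the PHI to a chain of
   selects.  The last argument is the default and its predicate is
   never tested, so N arguments need N - 1 predicates.  */
int
phi_as_select_chain (const struct phi_arg *args, size_t n)
{
  assert (n > 0);
  int result = args[n - 1].value;
  for (size_t i = n - 1; i-- > 0;)
    result = args[i].pred ? args[i].value : result;
  return result;
}
```

Each select in the chain plays the role of one COND_EXPR. Whether the real pass builds the chain in this order is not stated in the thread, so the loop direction here is an assumption.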
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Here is an example of such extended predication (compile with >>>>>>>>>>>>>>>>>>> -march=core-avx2): >>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>>>>>> if (t > 0 & t < 1.0e+17f) >>>>>>>>>>>>>>>>>>> if (c[i] != 0) >>>>>>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>> <bb 4>: >>>>>>>>>>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>>>>>>>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>>>>>>>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>>>>>>>>>>>>>>>> t_5 = a[i_16]; >>>>>>>>>>>>>>>>>>> _6 = t_5 > 0.0; >>>>>>>>>>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>>>>>>>>>>>>>>>> _8 = _7 & _6; >>>>>>>>>>>>>>>>>>> _ifc__28 = (unsigned int) _8; >>>>>>>>>>>>>>>>>>> _10 = &c[i_16]; >>>>>>>>>>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>>>>>>>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>>>>>>>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>> _ifc__30 = (int) _ifc__29; >>>>>>>>>>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>>>>>>>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>> _ifc__33 = (int) _ifc__32; >>>>>>>>>>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>>>>>>>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>> res_1 = res_15 + _ifc__35; >>>>>>>>>>>>>>>>>>> i_11 = i_16 + 1; >>>>>>>>>>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1; >>>>>>>>>>>>>>>>>>> if (ivtmp_14 != 0) >>>>>>>>>>>>>>>>>>> goto <bb 4>; >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new >>>>>>>>>>>>>>>>>>> failures. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> gcc/ChangeLog >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field. 
>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function. >>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function. >>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function. >>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate >>>>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument >>>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always >>>>>>>>>>>>>>>>>>> executed. >>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument >>>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function. >>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two >>>>>>>>>>>>>>>>>>> args >>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi >>>>>>>>>>>>>>>>>>> nodes are >>>>>>>>>>>>>>>>>>> in non-predicated basic blocks. >>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function. 
>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb to have more than two >>>>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of >>>>>>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges >>>>>>>>>>>>>>>>>>> only if >>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup. >>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to >>>>>>>>>>>>>>>>>>> transform >>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional >>>>>>>>>>>>>>>>>>> expressions >>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or >>>>>>>>>>>>>>>>>>> both >>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate >>>>>>>>>>>>>>>>>>> assignments >>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on >>>>>>>>>>>>>>>>>>> applicability >>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and transformation of >>>>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is >>>>>>>>>>>>>>>>>>> critical >>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false >>>>>>>>>>>>>>>>>>> predicates using >>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored >>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct >>>>>>>>>>>>>>>>>>> bb_predicate_s and >>>>>>>>>>>>>>>>>>> negate_predicate of normal edge contains predicate of >>>>>>>>>>>>>>>>>>> critical edge, >>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their >>>>>>>>>>>>>>>>>>> destination >>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool' is passed >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list. 
>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with >>>>>>>>>>>>>>>>>>> additional argument >>>>>>>>>>>>>>>>>>> equal to false. >>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means >>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>>>>>> is equal to 2 and at least one incoming edge is not critical >>>>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which >>>>>>>>>>>>>>>>>>> signals that >>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through >>>>>>>>>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invocation of >>>>>>>>>>>>>>>>>>> convert_name_to_cmp if cond >>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function. >>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple >>>>>>>>>>>>>>>>>>> statement >>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for >>>>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. >>>>>>>>>>>>>>>>>>> Insert >>>>>>>>>>>>>>>>>>> predicates at the block beginning for extended if-conversion. >>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for >>>>>>>>>>>>>>>>>>> extended >>>>>>>>>>>>>>>>>>> predication to build mask. 
>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to >>>>>>>>>>>>>>>>>>> predicate_bbs. >>>>>>>>>>>>>>>>>>> (split_crit_edge): New function. >>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>>>>>> current >>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop >>>>>>>>>>>>>>>>>>> versioning for >>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.
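For readers who prefer C to a GIMPLE dump, the omp simd example quoted above can be approximated by the scalar sketch below. It is a hand-written model, not compiler output: the function names are hypothetical, c[] is assumed to be an int array, and the conditional read of c[i] only stands in for the MASK_LOAD seen in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: nested conditions guard both the load of c[i] and
   the update of res.  */
int
count_branchy (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      if ((t > 0) & (t < 1.0e+17f))
        if (c[i] != 0)
          res += 1;
    }
  return res;
}

/* If-converted shape: one straight-line body per iteration.  The
   outer condition becomes a mask, the guarded load becomes a masked
   load, and the guarded increment becomes an unconditional add of a
   selected value (0 or 1).  */
int
count_if_converted (const float *a, const int *c, size_t n)
{
  int res = 0;
  for (size_t i = 0; i < n; i++)
    {
      float t = a[i];
      int mask = (t > 0) & (t < 1.0e+17f);
      /* Stand-in for MASK_LOAD: c[i] is read only when mask is set.  */
      int ci = mask ? c[i] : 0;
      int inc = (ci != 0) ? mask : 0;
      res += inc;
    }
  return res;
}

/* Self-check: both forms must count the same elements.  */
void
check_same (const float *a, const int *c, size_t n)
{
  assert (count_branchy (a, c, n) == count_if_converted (a, c, n));
}
```

The point of the second form is that every statement in the loop body executes on every iteration, which is the shape the vectorizer needs.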