On Tue, Oct 21, 2014 at 4:34 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > In my initial design I did such splitting but before start real > if-conversion but I decided to not perform it since code size for > if-converted loop is growing (number of phi nodes is increased). It is > worth noting also that for phi with #nodes > 2 we need to get all > predicates (except one) to do phi-predication and it means that block > containing such phi can have only 1 critical edge.
Can you point me to the patch with the special insertion code then? I definitely want to avoid the mess we ran into with the reassoc code "clever" insertion code. Richard. > Thanks. > Yuri. > > 2014-10-21 18:19 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Oct 21, 2014 at 4:09 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Tue, Oct 21, 2014 at 3:58 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> I saw the sources of these functions, but I can't understand why I >>>> should use something else? Note that all predicate computations are >>>> located in basic blocks ( by design of if-conv) and there is special >>>> function that put these computations in bb >>>> (insert_gimplified_predicates). Edge contains only predicate not its >>>> computations. New function - find_insertion_point() does very simple >>>> search - it finds out the latest (in current bb) operand def-stmt of >>>> predicates taken from all incoming edges. >>>> In original algorithm the predicate of non-critical edge is taken to >>>> perform phi-node predication since for critical edge it does not work >>>> properly. >>>> >>>> My question is: does your comments mean that I should re-design my >>>> extensions? >>> >>> Well, we have infrastructure for inserting code on edges and you've >>> made critical edges predicated correctly. So why re-invent the wheel? >>> I realize this is very similar to my initial suggestion to simply split >>> critical edges in loops you want to if-convert but delays splitting >>> until it turns out to be necessary (which might be good for the >>> !force_vect case). >>> >>> For edge predicates you simply can emit their computation on the >>> edge, no? >>> >>> Btw, I very originally suggested to rework if-conversion to only >>> record edge predicates - having both block and edge predicates >>> somewhat complicates the code and makes it harder to >>> maintain (thus also the suggestion to simply split critical edges >>> if necessary to make BB predicates work always). >>> >>> Your patches add a lot of code and to me it seems we can avoid >>> doing so much special casing. >> >> For example attacking the critical edge issue by a simple >> >> Index: tree-if-conv.c >> =================================================================== >> --- tree-if-conv.c (revision 216508) >> +++ tree-if-conv.c (working copy) >> @@ -980,11 +980,7 @@ if_convertible_bb_p (struct loop *loop, >> if (EDGE_COUNT (e->src->succs) == 1) >> found = true; >> if (!found) >> - { >> - if (dump_file && (dump_flags & TDF_DETAILS)) >> - fprintf (dump_file, "only critical predecessors\n"); >> - return false; >> - } >> + split_edge (EDGE_PRED (bb, 0)); >> } >> >> return true; >> >> it changes the number of blocks in the loop, so >> get_loop_body_in_if_conv_order should probably be re-done with the >> above eventually signalling that it created a new block. Or the above >> should populate a vector of edges to split and do that after the >> loop calling if_convertible_bb_p. >> >> Richard. >> >>> Richard. >>> >>>> Thanks. >>>> Yuri. >>>> >>>> BTW Jeff did initial review of my changes related to predicate >>>> computation for join blocks. I presented him updated patch with >>>> test-case and some minor changes in patch. But still did not get any >>>> feedback on it. Could you please take a look also on it? >>>> >>>> >>>> 2014-10-21 17:38 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Tue, Oct 21, 2014 at 3:20 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> Yes, This patch does not make sense since phi node predication for bb >>>>>> with critical incoming edges only performs another function which is >>>>>> absent (predicate_extended_scalar_phi). >>>>>> >>>>>> BTW I see that commit_edge_insertions() is used for rtx instructions >>>>>> only but you propose to use it for tree also. >>>>>> Did I miss something? >>>>> >>>>> Ah, it's gsi_commit_edge_inserts () (or gsi_commit_one_edge_insert >>>>> if you want easy access to the newly created basic block to push >>>>> the predicate to - see gsi_commit_edge_inserts implementation). >>>>> >>>>> Richard. >>>>> >>>>>> Thanks ahead. >>>>>> >>>>>> >>>>>> 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> I did some changes in patch and ChangeLog to mark that support for >>>>>>>> if-convert of blocks with only critical incoming edges will be added >>>>>>>> in the future (more precise in patch.4). >>>>>>> >>>>>>> But the same reasoning applies to this version of the patch when >>>>>>> flag_force_vectorize is true!? (insertion point and invalid SSA form) >>>>>>> >>>>>>> Which means the patch doesn't make sense in isolation? >>>>>>> >>>>>>> Btw, I think for the case you should simply do gsi_insert_on_edge () >>>>>>> and commit_edge_insertions () before the call to combine_blocks >>>>>>> (pushing the edge predicate to the newly created block). >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Could you please review it. >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>>> ChangeLog: >>>>>>>> >>>>>>>> 2014-10-21 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> (flag_force_vectorize): New variable. >>>>>>>> (edge_predicate): New function. >>>>>>>> (set_edge_predicate): New function. >>>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>>>>>>> if destination block of edge is not always executed. Set-up predicate >>>>>>>> for critical edge. >>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>> (all_preds_critical_p): New function. >>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p >>>>>>>> to reject temporarily block if-conversion with incoming critical edges >>>>>>>> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be >>>>>>>> deleted >>>>>>>> after adding support for extended predication. >>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>>> to compute predicate instead of fold_build2_loc. >>>>>>>> Add zeroing of edge 'aux' field. >>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>> is equal 2 and at least one incoming edge is not critical original >>>>>>>> algorithm is used. >>>>>>>> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. >>>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>>> >>>>>>>> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Thanks for your answer! >>>>>>>>> >>>>>>>>> In current implementation phi node conversion assume that one of >>>>>>>>> incoming edge to bb containing given phi has at least one non-critical >>>>>>>>> edge and choose it to insert predicated code. But if we choose >>>>>>>>> critical edge we need to determine insert point and insertion >>>>>>>>> direction (before/after) since in other case we can get invalid ssa >>>>>>>>> form (use before def). This is done by my new function which is not in >>>>>>>>> current patch ( I will present this patch later). SO I assume that we >>>>>>>>> need to leave this patch as it is to not introduce new bugs. >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> 2014-10-20 12:00 GMT+04:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> I reworked the patch as you proposed, but I didn't understand what >>>>>>>>>>> did you mean by: >>>>>>>>>>> >>>>>>>>>>>>So please rework the patch so critical edges are always handled >>>>>>>>>>>>correctly. >>>>>>>>>>> >>>>>>>>>>> In current patch flag_force_vectorize is used (1) to reject phi >>>>>>>>>>> nodes >>>>>>>>>>> with more than 2 arguments; (2) to reject basic blocks with only >>>>>>>>>>> critical incoming edges since support for extended predication of >>>>>>>>>>> phi >>>>>>>>>>> nodes will be in next patch. >>>>>>>>>> >>>>>>>>>> I mean that (2) should not be rejected dependent on >>>>>>>>>> flag_force_vectorize. >>>>>>>>>> It was rejected because if-cvt couldn't handle it correctly before >>>>>>>>>> but with >>>>>>>>>> this patch this is fixed. I see no reason to still reject this then >>>>>>>>>> even >>>>>>>>>> for !flag_force_vectorize. >>>>>>>>>> >>>>>>>>>> Rejecting PHIs with more than two arguments with flag_force_vectorize >>>>>>>>>> is ok. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> Could you please clarify your statement. >>>>>>>>>>> >>>>>>>>>>> I attached modified patch. >>>>>>>>>>> >>>>>>>>>>> ChangeLog: >>>>>>>>>>> >>>>>>>>>>> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke >>>>>>>>>>> add_to_predicate_list >>>>>>>>>>> if destination block of edge is not always executed. Set-up >>>>>>>>>>> predicate >>>>>>>>>>> for critical edge. >>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>> (if_convertible_bb_p): Use call of all_preds_critical_p >>>>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>>>>>> to compute predicate instead of fold_build2_loc. >>>>>>>>>>> Add zeroing of edge 'aux' field. >>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>> phi-block >>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>> algorithm is used. >>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to >>>>>>>>>>> false. >>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Richard, >>>>>>>>>>>>> >>>>>>>>>>>>> Here is reduced patch as you requested. All your remarks have >>>>>>>>>>>>> been fixed. >>>>>>>>>>>>> Could you please look at it ( I have already sent the patch with >>>>>>>>>>>>> changes in add_to_predicate_list for review). >>>>>>>>>>>> >>>>>>>>>>>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>>>>>>>>>>> + fprintf (dump_file, "More than two phi node >>>>>>>>>>>> args.\n"); >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> + >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> Excess vertical space. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> +/* Assumes that BB has more than 2 predecessors. >>>>>>>>>>>> >>>>>>>>>>>> More than 1 predecessor? >>>>>>>>>>>> >>>>>>>>>>>> + Returns false if at least one successor is not on critical edge >>>>>>>>>>>> + and true otherwise. */ >>>>>>>>>>>> + >>>>>>>>>>>> +static inline bool >>>>>>>>>>>> +all_edges_are_critical (basic_block bb) >>>>>>>>>>>> +{ >>>>>>>>>>>> >>>>>>>>>>>> "all_preds_critical_p" would be a better name >>>>>>>>>>>> >>>>>>>>>>>> + if (EDGE_COUNT (bb->preds) > 2) >>>>>>>>>>>> + { >>>>>>>>>>>> + if (!flag_force_vectorize) >>>>>>>>>>>> + return false; >>>>>>>>>>>> + } >>>>>>>>>>>> >>>>>>>>>>>> as I said in the last review I don't think we should restrict edge >>>>>>>>>>>> predicates to flag_force_vectorize. At least I can't see how >>>>>>>>>>>> if-conversion is magically more expensive for that case? >>>>>>>>>>>> >>>>>>>>>>>> So please rework the patch so critical edges are always handled >>>>>>>>>>>> correctly. >>>>>>>>>>>> >>>>>>>>>>>> Ok with that and the above suggested changes. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> Yuri. >>>>>>>>>>>>> ChangeLog >>>>>>>>>>>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>> >>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>> (add_to_dst_predicate_list): Conditionally invoke >>>>>>>>>>>>> add_to_predicate_list >>>>>>>>>>>>> if destination block of edge is not always executed. Set-up >>>>>>>>>>>>> predicate >>>>>>>>>>>>> for critical edge. >>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of >>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>>>>>>>> to compute predicate instead of fold_build2_loc. >>>>>>>>>>>>> Add zeroing of edge 'aux' field. >>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>> phi-block >>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to >>>>>>>>>>>>> false. >>>>>>>>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Here is updated patch (part1) for extended if conversion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Second part of patch will be sent later. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ok, I'm starting to look at this. I'd still like you to split >>>>>>>>>>>>>> things up >>>>>>>>>>>>>> more. >>>>>>>>>>>>>> >>>>>>>>>>>>>> static inline void >>>>>>>>>>>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree >>>>>>>>>>>>>> nc) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> ... >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* We use notion of cd equivalence to get simplier >>>>>>>>>>>>>> predicate for >>>>>>>>>>>>>> + join block, e.g. if join block has 2 predecessors with >>>>>>>>>>>>>> predicates >>>>>>>>>>>>>> + p1 & p2 and p1 & !p2, we'd like to get p1 for it >>>>>>>>>>>>>> instead of >>>>>>>>>>>>>> + p1 & p2 | p1 & !p2. */ >>>>>>>>>>>>>> + if (dom_bb != loop->header >>>>>>>>>>>>>> + && get_immediate_dominator (CDI_POST_DOMINATORS, >>>>>>>>>>>>>> dom_bb) == bb) >>>>>>>>>>>>>> + { >>>>>>>>>>>>>> + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); >>>>>>>>>>>>>> + bc = bb_predicate (dom_bb); >>>>>>>>>>>>>> + gcc_assert (!is_true_predicate (bc)); >>>>>>>>>>>>>> >>>>>>>>>>>>>> these changes look worthwhile even for !flag_force_vectorize. >>>>>>>>>>>>>> So please >>>>>>>>>>>>>> split the change to add_to_predicate_list out and compute >>>>>>>>>>>>>> post-dominators >>>>>>>>>>>>>> unconditionally. Note that you should call free_dominance_info >>>>>>>>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest)) >>>>>>>>>>>>>> + add_to_predicate_list (loop, e->dest, cond); >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + /* If edge E is critical save predicate on it. */ >>>>>>>>>>>>>> + if (EDGE_COUNT (e->dest->preds) >= 2) >>>>>>>>>>>>>> + set_edge_predicate (e, cond); >>>>>>>>>>>>>> >>>>>>>>>>>>>> how do we know the edge is critical by this simple check? Why >>>>>>>>>>>>>> not >>>>>>>>>>>>>> simply always save edge predicates (well, you kind of do but omit >>>>>>>>>>>>>> the case where e->src dominates e->dest). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the >>>>>>>>>>>>>> pass but need to clear it at the end (best use >>>>>>>>>>>>>> clear_aux_for_edges () >>>>>>>>>>>>>> for that). So stuff like >>>>>>>>>>>>>> >>>>>>>>>>>>>> + extract_true_false_edges_from_block (bb, &true_edge, >>>>>>>>>>>>>> &false_edge); >>>>>>>>>>>>>> + if (flag_force_vectorize) >>>>>>>>>>>>>> + true_edge->aux = false_edge->aux = NULL; >>>>>>>>>>>>>> >>>>>>>>>>>>>> shouldn't be necessary. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think the edge predicate handling should also be >>>>>>>>>>>>>> unconditionally >>>>>>>>>>>>>> and not depend on flag_force_vectorize. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* The loop latch and loop exit block are always executed >>>>>>>>>>>>>> and >>>>>>>>>>>>>> + have no extra conditions to be processed: skip them. */ >>>>>>>>>>>>>> + if (bb == loop->latch >>>>>>>>>>>>>> + || bb_with_exit_edge_p (loop, bb)) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I don't think the edge stuff is true - given you still only >>>>>>>>>>>>>> reset the >>>>>>>>>>>>>> loop->latch bb predicate the change looks broken. >>>>>>>>>>>>>> >>>>>>>>>>>>>> + /* Fold_build2 can produce bool conversion which is not >>>>>>>>>>>>>> + supported by vectorizer, so re-build it without >>>>>>>>>>>>>> folding. >>>>>>>>>>>>>> + For example, such conversion is generated for >>>>>>>>>>>>>> sequence: >>>>>>>>>>>>>> + _Bool _7, _8, _9; >>>>>>>>>>>>>> + _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9; >>>>>>>>>>>>>> + if (_9 != 0) --> (bool)_9. */ >>>>>>>>>>>>>> + >>>>>>>>>>>>>> + if (CONVERT_EXPR_P (c) >>>>>>>>>>>>>> + && TREE_CODE_CLASS (code) == tcc_comparison) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the >>>>>>>>>>>>>> folding result. Or rather _not_ fold at all - we are taking the >>>>>>>>>>>>>> operands from the GIMPLE condition unmodified after all. >>>>>>>>>>>>>> >>>>>>>>>>>>>> - add_to_dst_predicate_list (loop, false_edge, >>>>>>>>>>>>>> - unshare_expr (cond), c2); >>>>>>>>>>>>>> + add_to_dst_predicate_list (loop, false_edge, >>>>>>>>>>>>>> unshare_expr (cond), >>>>>>>>>>>>>> + unshare_expr (c2)); >>>>>>>>>>>>>> >>>>>>>>>>>>>> why is it necessary to unshare c2? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please split out the PHI-with-multi-arg handling (I have not >>>>>>>>>>>>>> looked at >>>>>>>>>>>>>> that in detail). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Changelog. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect >>>>>>>>>>>>>>> function clone. >>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>>>>>>>> for join blocks if it exists. >>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up >>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>> for critical edge. >>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of >>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges only >>>>>>>>>>>>>>> if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical >>>>>>>>>>>>>>> original >>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>>>>>>>> that extended predication must be applied). >>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>> current >>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field >>>>>>>>>>>>>>> of edges >>>>>>>>>>>>>>> for blocks with two successors. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>>>>>>>>> Richard, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice. >>>>>>>>>>>>>>>> Let's me also answer on your comments. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for >>>>>>>>>>>>>>>> critical edges. >>>>>>>>>>>>>>>> My previous code was not correct and now it looks like: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT >>>>>>>>>>>>>>>> (e->dest->preds) == 1) >>>>>>>>>>>>>>>> /* Edge E is not critical, use predicate of edge source >>>>>>>>>>>>>>>> bb. */ >>>>>>>>>>>>>>>> c = bb_predicate (b); >>>>>>>>>>>>>>>> else >>>>>>>>>>>>>>>> /* Edge E is critical and its aux field contains >>>>>>>>>>>>>>>> predicate. */ >>>>>>>>>>>>>>>> c = edge_predicate (e); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2. I completely delete all code related to creation of >>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>> expressions and completely rely on bool pattern recognition in >>>>>>>>>>>>>>>> vectorizer. But we need to delete all dead predicate >>>>>>>>>>>>>>>> computations >>>>>>>>>>>>>>>> which are not used since they prevent vectorization. I will >>>>>>>>>>>>>>>> add this >>>>>>>>>>>>>>>> local-dce function in next patch. >>>>>>>>>>>>>>>> 3. I also did not include in this patch recognition of general >>>>>>>>>>>>>>>> phi-nodes with two arguments only for which conversion of >>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>> scalar reduction can be applied also. >>>>>>>>>>>>>>>> Note that all these changes are applied for loop marked with >>>>>>>>>>>>>>>> pragma >>>>>>>>>>>>>>>> omp simd only. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect >>>>>>>>>>>>>>>> function clone. >>>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is >>>>>>>>>>>>>>>> always >>>>>>>>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>>>>>>>>> for join blocks if it exists. >>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>>>>>>>>> destination block of edge is not always executed. Set-up >>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>> for critical edge. >>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two >>>>>>>>>>>>>>>> args >>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors >>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of >>>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>>> to reject block if-conversion with incoming critical edges >>>>>>>>>>>>>>>> only if >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent >>>>>>>>>>>>>>>> bb's. >>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical >>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>>>>>>>>> that extended predication must be applied). >>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. >>>>>>>>>>>>>>>> Insert >>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>>> current >>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field >>>>>>>>>>>>>>>> of edges >>>>>>>>>>>>>>>> for blocks with two successors. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener >>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> Richard! >>>>>>>>>>>>>>>>>> Here is updated patch with the following changes: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for >>>>>>>>>>>>>>>>>> extended conversion. >>>>>>>>>>>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge, >>>>>>>>>>>>>>>>>> i.e. >>>>>>>>>>>>>>>>>> negate_predicate was deleted. >>>>>>>>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing >>>>>>>>>>>>>>>>>> edges can >>>>>>>>>>>>>>>>>> be critical. >>>>>>>>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join >>>>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>>>> blocks to simplify it. >>>>>>>>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead >>>>>>>>>>>>>>>>>> generating >>>>>>>>>>>>>>>>>> chain of cond expressions for phi-node if conversion, >>>>>>>>>>>>>>>>>> whereas for phi >>>>>>>>>>>>>>>>>> of kind >>>>>>>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>>>>>>>> only one cond expression is required and this is considered >>>>>>>>>>>>>>>>>> as simple >>>>>>>>>>>>>>>>>> optimization for arbitrary phi-function. More precise, >>>>>>>>>>>>>>>>>> if phi-function have only two different arguments and one of >>>>>>>>>>>>>>>>>> them has >>>>>>>>>>>>>>>>>> single occurrence, if- conversion is performed as if phi >>>>>>>>>>>>>>>>>> have only 2 >>>>>>>>>>>>>>>>>> arguments. >>>>>>>>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is >>>>>>>>>>>>>>>>>> produced. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Updated patch is attached. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Any comments will be appreciated. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The patch is still very big and does multiple things at once >>>>>>>>>>>>>>>>> which makes >>>>>>>>>>>>>>>>> it hard to review. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In addition to that it changes function singatures without >>>>>>>>>>>>>>>>> updating >>>>>>>>>>>>>>>>> the function comments. For example what is the convert_bool >>>>>>>>>>>>>>>>> argument doing to add_to_dst_predicate_list? Why do we need >>>>>>>>>>>>>>>>> all this added logic. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think the code handling PHIs with more than two operands but >>>>>>>>>>>>>>>>> only two unequal operands is useful generally, so that's an >>>>>>>>>>>>>>>>> obvious >>>>>>>>>>>>>>>>> candidate for splitting out into a separate patch. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> + CONVERT_BOOL argument was added to convert bool predicate >>>>>>>>>>>>>>>>> computations >>>>>>>>>>>>>>>>> + which is not supported by vectorizer to int type through >>>>>>>>>>>>>>>>> creating of >>>>>>>>>>>>>>>>> + conditional expressions. */ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Example? The vectorizer has patterns for bool predicate >>>>>>>>>>>>>>>>> computations. >>>>>>>>>>>>>>>>> This seems to be another feature that needs splitting out. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The way you get around the critical edge parts looks awkward >>>>>>>>>>>>>>>>> to me. >>>>>>>>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply >>>>>>>>>>>>>>>>> split critical edges (of the respective loop body). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I still think that an utility doing same PHI arg merging by >>>>>>>>>>>>>>>>> introducing >>>>>>>>>>>>>>>>> forwarder blocks would be nicer to have. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply >>>>>>>>>>>>>>>>> these >>>>>>>>>>>>>>>>> CFG pre-transforms when we are going to version the loop >>>>>>>>>>>>>>>>> for if conversion (eventually transitioning to always doing >>>>>>>>>>>>>>>>> that). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So - please split up the patch. It's way too big. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect >>>>>>>>>>>>>>>>>> function clone. >>>>>>>>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate >>>>>>>>>>>>>>>>>> field. >>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> such bb exists; save it in static variable for further >>>>>>>>>>>>>>>>>> possible use. >>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument >>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always >>>>>>>>>>>>>>>>>> executed. >>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument >>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true. >>>>>>>>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two >>>>>>>>>>>>>>>>>> args >>>>>>>>>>>>>>>>>> if flag_force_vectorize wa set-up. >>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on >>>>>>>>>>>>>>>>>> flag_force_vectorize. >>>>>>>>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if >>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. >>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two >>>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up. Use call of >>>>>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges >>>>>>>>>>>>>>>>>> only if >>>>>>>>>>>>>>>>>> flag_force_vectorize was not set-up. >>>>>>>>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to >>>>>>>>>>>>>>>>>> transform >>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional >>>>>>>>>>>>>>>>>> expressions >>>>>>>>>>>>>>>>>> with integral operands. If convert_bool argument was set-up >>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> vect bool pattern can be appied perform the following >>>>>>>>>>>>>>>>>> transformation: >>>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if >>>>>>>>>>>>>>>>>> convert_bool >>>>>>>>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional >>>>>>>>>>>>>>>>>> argument >>>>>>>>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and >>>>>>>>>>>>>>>>>> add_to_predicate_list. >>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent >>>>>>>>>>>>>>>>>> bb's. >>>>>>>>>>>>>>>>>> Call predicate_bbs with additional argument equal to false. >>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical >>>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which >>>>>>>>>>>>>>>>>> signals that >>>>>>>>>>>>>>>>>> phi arguments must be evaluated through >>>>>>>>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of >>>>>>>>>>>>>>>>>> convert_name_to_cmp if cond >>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>> (predicate_arbitrary_phi): New function. >>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple >>>>>>>>>>>>>>>>>> statement >>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for >>>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated >>>>>>>>>>>>>>>>>> basic >>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. >>>>>>>>>>>>>>>>>> Insert >>>>>>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for >>>>>>>>>>>>>>>>>> extended >>>>>>>>>>>>>>>>>> predication to build mask. >>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>>>>> current >>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>>>>>>>> versioning >>>>>>>>>>>>>>>>>> for innermost loop marked with pragma omp simd. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener >>>>>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> We implemented additional support for pragma omp simd in >>>>>>>>>>>>>>>>>>>> part of >>>>>>>>>>>>>>>>>>>> extended if-conversion loops with such pragma. These >>>>>>>>>>>>>>>>>>>> extensions >>>>>>>>>>>>>>>>>>>> include: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or >>>>>>>>>>>>>>>>>>>> its outer >>>>>>>>>>>>>>>>>>>> loop was marked with pragma omp simd (force_vectorize); >>>>>>>>>>>>>>>>>>>> For ordinary >>>>>>>>>>>>>>>>>>>> loops behavior was not changed. >>>>>>>>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have >>>>>>>>>>>>>>>>>>>> more than 2 >>>>>>>>>>>>>>>>>>>> predecessors. >>>>>>>>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was >>>>>>>>>>>>>>>>>>>> missed in current design: >>>>>>>>>>>>>>>>>>>> all phi nodes must be in non-predicated basic block to >>>>>>>>>>>>>>>>>>>> conform >>>>>>>>>>>>>>>>>>>> semantic of COND_EXPR which is used for transformation. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> How is that so? If the PHI is predicated then its result >>>>>>>>>>>>>>>>>>> will be used >>>>>>>>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of >>>>>>>>>>>>>>>>>>> COND_EXPRs. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> No? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than >>>>>>>>>>>>>>>>>>>> 2 arguments >>>>>>>>>>>>>>>>>>>> with some limitations: >>>>>>>>>>>>>>>>>>>> - for phi nodes which have more than 2 arguments, but >>>>>>>>>>>>>>>>>>>> only two >>>>>>>>>>>>>>>>>>>> arguments are different and one of them has the only >>>>>>>>>>>>>>>>>>>> occurence, >>>>>>>>>>>>>>>>>>>> transformation to single COND_EXPR can be done. >>>>>>>>>>>>>>>>>>>> - if phi node has more different arguments and all edge >>>>>>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>>>>>> correspondent to phi-arguments are disjoint, a chain of >>>>>>>>>>>>>>>>>>>> COND_EXPR >>>>>>>>>>>>>>>>>>>> will be generated for it. In current design very simple >>>>>>>>>>>>>>>>>>>> check is used: >>>>>>>>>>>>>>>>>>>> check starting from end that two edges correspondent to >>>>>>>>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>>>>>>>> arguments have common predecessor which is used for >>>>>>>>>>>>>>>>>>>> further check >>>>>>>>>>>>>>>>>>>> with next edge. >>>>>>>>>>>>>>>>>>>> These guarantee that phi predication will produce the >>>>>>>>>>>>>>>>>>>> correct result. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI >>>>>>>>>>>>>>>>>>> node by >>>>>>>>>>>>>>>>>>> inserting forwarder blocks. Thus >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> becomes >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> bb 5: <forwarder-from(2)-and(3)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> x = PHI <1(5), 2(4)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> x = PHI <1(2), 2(3), 3(4)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> becomes >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> bb 5: >>>>>>>>>>>>>>>>>>> x' = PHI <1(2), 2(3)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> b = PHI<x'(5), 3(4)> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> which means that 3) has to work. Note that we want this >>>>>>>>>>>>>>>>>>> kind of >>>>>>>>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number >>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>> copies we need to insert on edges. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a >>>>>>>>>>>>>>>>>>> pre-pass >>>>>>>>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG >>>>>>>>>>>>>>>>>>> transform. >>>>>>>>>>>>>>>>>>> And make 3) work properly if it doesn't already. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work >>>>>>>>>>>>>>>>>>> around the >>>>>>>>>>>>>>>>>>> critical edge limitation? Please instead change >>>>>>>>>>>>>>>>>>> if-conversion to >>>>>>>>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates). >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Here is example of such extended predication (compile with >>>>>>>>>>>>>>>>>>>> -march=core-avx2): >>>>>>>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>>>>>>> if (t > 0 & t < 1.0e+17f) >>>>>>>>>>>>>>>>>>>> if (c[i] != 0) >>>>>>>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>>>> <bb 4>: >>>>>>>>>>>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>>>>>>>>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>>>>>>>>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>>>>>>>>>>>>>>>>> t_5 = a[i_16]; >>>>>>>>>>>>>>>>>>>> _6 = t_5 > 0.0; >>>>>>>>>>>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>>>>>>>>>>>>>>>>> _8 = _7 & _6; >>>>>>>>>>>>>>>>>>>> _ifc__28 = (unsigned int) _8; >>>>>>>>>>>>>>>>>>>> _10 = &c[i_16]; >>>>>>>>>>>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>>>>>>>>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>>>>>>>>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>>> _ifc__30 = (int) _ifc__29; >>>>>>>>>>>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>>>>>>>>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>>> _ifc__33 = (int) _ifc__32; >>>>>>>>>>>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>>>>>>>>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>>>>>>>>>>>>>>>>> res_1 = res_15 + _ifc__35; >>>>>>>>>>>>>>>>>>>> i_11 = i_16 + 1; >>>>>>>>>>>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1; >>>>>>>>>>>>>>>>>>>> if (ivtmp_14 != 0) >>>>>>>>>>>>>>>>>>>> goto <bb 4>; >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Bootstrap and regression testing did not show any new >>>>>>>>>>>>>>>>>>>> failures. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> gcc/ChageLog >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>>>>>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field. >>>>>>>>>>>>>>>>>>>> (bb_negate_predicate): New function. >>>>>>>>>>>>>>>>>>>> (set_bb_negate_predicate): New function. >>>>>>>>>>>>>>>>>>>> (bb_copy_predicate): New function. >>>>>>>>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>>>>>>>>>> (init_bb_predicate): Add initialization of >>>>>>>>>>>>>>>>>>>> negate_predicate field. >>>>>>>>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument >>>>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>>>>>>>> Add early function exit if edge target block is always >>>>>>>>>>>>>>>>>>>> executed. >>>>>>>>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument >>>>>>>>>>>>>>>>>>>> is true. >>>>>>>>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>>>>> (phi_args_disjoint): New function. >>>>>>>>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than >>>>>>>>>>>>>>>>>>>> two args >>>>>>>>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi >>>>>>>>>>>>>>>>>>>> nodes are >>>>>>>>>>>>>>>>>>>> in non-predicated basic blocks. >>>>>>>>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>>>>>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two >>>>>>>>>>>>>>>>>>>> predecessors if >>>>>>>>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of >>>>>>>>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges >>>>>>>>>>>>>>>>>>>> only if >>>>>>>>>>>>>>>>>>>> flag_force_vectorize was not setup. >>>>>>>>>>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to >>>>>>>>>>>>>>>>>>>> transform >>>>>>>>>>>>>>>>>>>> comparison expressions of boolean type into conditional >>>>>>>>>>>>>>>>>>>> expressions >>>>>>>>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or >>>>>>>>>>>>>>>>>>>> both >>>>>>>>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate >>>>>>>>>>>>>>>>>>>> assignments >>>>>>>>>>>>>>>>>>>> is used, otherwise the following code was added: check on >>>>>>>>>>>>>>>>>>>> applicable >>>>>>>>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of >>>>>>>>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is >>>>>>>>>>>>>>>>>>>> critical >>>>>>>>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false >>>>>>>>>>>>>>>>>>>> predicates using >>>>>>>>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored >>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> predicate and negate_predicate fields of struct >>>>>>>>>>>>>>>>>>>> bb_predicate_s and >>>>>>>>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of >>>>>>>>>>>>>>>>>>>> critical edge, >>>>>>>>>>>>>>>>>>>> but generated gimplified statements are stored in their >>>>>>>>>>>>>>>>>>>> destination >>>>>>>>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed >>>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list. >>>>>>>>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with >>>>>>>>>>>>>>>>>>>> additional argument >>>>>>>>>>>>>>>>>>>> equal to false. >>>>>>>>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function >>>>>>>>>>>>>>>>>>>> interface: >>>>>>>>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>> extended phi node predication. If number of predecessors >>>>>>>>>>>>>>>>>>>> of phi-block >>>>>>>>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical >>>>>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which >>>>>>>>>>>>>>>>>>>> signals that >>>>>>>>>>>>>>>>>>>> both phi arguments must be evaluated through >>>>>>>>>>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of >>>>>>>>>>>>>>>>>>>> convert_name_to_cmp if cond >>>>>>>>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function. >>>>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple >>>>>>>>>>>>>>>>>>>> statement >>>>>>>>>>>>>>>>>>>> iterator for predication of extended scalar phi's for >>>>>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for >>>>>>>>>>>>>>>>>>>> non-predicated basic >>>>>>>>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. >>>>>>>>>>>>>>>>>>>> Insert >>>>>>>>>>>>>>>>>>>> predicates at the block begining for extended >>>>>>>>>>>>>>>>>>>> if-conversion. >>>>>>>>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for >>>>>>>>>>>>>>>>>>>> extended >>>>>>>>>>>>>>>>>>>> predication to build mask. >>>>>>>>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to >>>>>>>>>>>>>>>>>>>> predicate_bbs. >>>>>>>>>>>>>>>>>>>> (split_crit_edge): New function. >>>>>>>>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>>>>>>>> current >>>>>>>>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>>>>>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop >>>>>>>>>>>>>>>>>>>> versioning for >>>>>>>>>>>>>>>>>>>> innermost loop marked with pragma omp simd.