Richard, Yes, This patch does not make sense since phi node predication for bb with critical incoming edges only performs another function which is absent (predicate_extended_scalar_phi).
BTW I see that commit_edge_insertions() is used for rtx instructions only but you propose to use it for tree also. Did I miss something? Thanks ahead. 2014-10-21 16:44 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: > On Tue, Oct 21, 2014 at 2:25 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> I did some changes in patch and ChangeLog to mark that support for >> if-convert of blocks with only critical incoming edges will be added >> in the future (more precise in patch.4). > > But the same reasoning applies to this version of the patch when > flag_force_vectorize is true!? (insertion point and invalid SSA form) > > Which means the patch doesn't make sense in isolation? > > Btw, I think for the case you should simply do gsi_insert_on_edge () > and commit_edge_insertions () before the call to combine_blocks > (pushing the edge predicate to the newly created block). > > Richard. > >> Could you please review it. >> >> Thanks. >> >> ChangeLog: >> >> 2014-10-21 Yuri Rumyantsev <ysrum...@gmail.com> >> >> (flag_force_vectorize): New variable. >> (edge_predicate): New function. >> (set_edge_predicate): New function. >> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >> if destination block of edge is not always executed. Set-up predicate >> for critical edge. >> (if_convertible_phi_p): Accept phi nodes with more than two args >> if FLAG_FORCE_VECTORIZE was set-up. >> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >> (if_convertible_stmt_p): Fix up pre-function comments. >> (all_preds_critical_p): New function. >> (if_convertible_bb_p): Use call of all_preds_critical_p >> to reject temporarily block if-conversion with incoming critical edges >> if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted >> after adding support for extended predication. >> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >> to compute predicate instead of fold_build2_loc. >> Add zeroing of edge 'aux' field. >> (find_phi_replacement_condition): Extend function interface: >> it returns NULL if given phi node must be handled by means of >> extended phi node predication. If number of predecessors of phi-block >> is equal 2 and at least one incoming edge is not critical original >> algorithm is used. >> (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. >> Nullify 'aux' field of edges for blocks with two successors. >> >> 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>> Richard, >>> >>> Thanks for your answer! >>> >>> In current implementation phi node conversion assume that one of >>> incoming edge to bb containing given phi has at least one non-critical >>> edge and choose it to insert predicated code. But if we choose >>> critical edge we need to determine insert point and insertion >>> direction (before/after) since in other case we can get invalid ssa >>> form (use before def). This is done by my new function which is not in >>> current patch ( I will present this patch later). SO I assume that we >>> need to leave this patch as it is to not introduce new bugs. >>> >>> Thanks. >>> Yuri. >>> >>> 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> I reworked the patch as you proposed, but I didn't understand what >>>>> did you mean by: >>>>> >>>>>>So please rework the patch so critical edges are always handled >>>>>>correctly. >>>>> >>>>> In current patch flag_force_vectorize is used (1) to reject phi nodes >>>>> with more than 2 arguments; (2) to reject basic blocks with only >>>>> critical incoming edges since support for extended predication of phi >>>>> nodes will be in next patch. >>>> >>>> I mean that (2) should not be rejected dependent on flag_force_vectorize. >>>> It was rejected because if-cvt couldn't handle it correctly before but with >>>> this patch this is fixed. I see no reason to still reject this then even >>>> for !flag_force_vectorize. >>>> >>>> Rejecting PHIs with more than two arguments with flag_force_vectorize >>>> is ok. >>>> >>>> Richard. >>>> >>>>> Could you please clarify your statement. >>>>> >>>>> I attached modified patch. >>>>> >>>>> ChangeLog: >>>>> >>>>> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com> >>>>> >>>>> (flag_force_vectorize): New variable. >>>>> (edge_predicate): New function. >>>>> (set_edge_predicate): New function. >>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>>>> if destination block of edge is not always executed. Set-up predicate >>>>> for critical edge. >>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>> (all_edges_are_critical): New function. >>>>> (if_convertible_bb_p): Use call of all_preds_critical_p >>>>> to reject block if-conversion with incoming critical edges only if >>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>> to compute predicate instead of fold_build2_loc. >>>>> Add zeroing of edge 'aux' field. >>>>> (find_phi_replacement_condition): Extend function interface: >>>>> it returns NULL if given phi node must be handled by means of >>>>> extended phi node predication. If number of predecessors of phi-block >>>>> is equal 2 and atleast one incoming edge is not critical original >>>>> algorithm is used. >>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>> >>>>> >>>>> >>>>> >>>>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> Here is reduced patch as you requested. All your remarks have been >>>>>>> fixed. >>>>>>> Could you please look at it ( I have already sent the patch with >>>>>>> changes in add_to_predicate_list for review). >>>>>> >>>>>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>>>>> + fprintf (dump_file, "More than two phi node args.\n"); >>>>>> + return false; >>>>>> + } >>>>>> + >>>>>> + } >>>>>> >>>>>> Excess vertical space. >>>>>> >>>>>> >>>>>> +/* Assumes that BB has more than 2 predecessors. >>>>>> >>>>>> More than 1 predecessor? >>>>>> >>>>>> + Returns false if at least one successor is not on critical edge >>>>>> + and true otherwise. */ >>>>>> + >>>>>> +static inline bool >>>>>> +all_edges_are_critical (basic_block bb) >>>>>> +{ >>>>>> >>>>>> "all_preds_critical_p" would be a better name >>>>>> >>>>>> + if (EDGE_COUNT (bb->preds) > 2) >>>>>> + { >>>>>> + if (!flag_force_vectorize) >>>>>> + return false; >>>>>> + } >>>>>> >>>>>> as I said in the last review I don't think we should restrict edge >>>>>> predicates to flag_force_vectorize. At least I can't see how >>>>>> if-conversion is magically more expensive for that case? >>>>>> >>>>>> So please rework the patch so critical edges are always handled >>>>>> correctly. >>>>>> >>>>>> Ok with that and the above suggested changes. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> >>>>>>> Thanks. >>>>>>> Yuri. >>>>>>> ChangeLog >>>>>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> >>>>>>> (flag_force_vectorize): New variable. >>>>>>> (edge_predicate): New function. >>>>>>> (set_edge_predicate): New function. >>>>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>>>>>> if destination block of edge is not always executed. Set-up predicate >>>>>>> for critical edge. >>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>> (all_edges_are_critical): New function. >>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>>>> to compute predicate instead of fold_build2_loc. >>>>>>> Add zeroing of edge 'aux' field. >>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>> algorithm is used. >>>>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>>>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Here is updated patch (part1) for extended if conversion. >>>>>>>>> >>>>>>>>> Second part of patch will be sent later. >>>>>>>> >>>>>>>> Ok, I'm starting to look at this. I'd still like you to split things >>>>>>>> up >>>>>>>> more. >>>>>>>> >>>>>>>> static inline void >>>>>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) >>>>>>>> { >>>>>>>> ... >>>>>>>> >>>>>>>> + /* We use notion of cd equivalence to get simplier predicate for >>>>>>>> + join block, e.g. if join block has 2 predecessors with >>>>>>>> predicates >>>>>>>> + p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of >>>>>>>> + p1 & p2 | p1 & !p2. */ >>>>>>>> + if (dom_bb != loop->header >>>>>>>> + && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == >>>>>>>> bb) >>>>>>>> + { >>>>>>>> + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); >>>>>>>> + bc = bb_predicate (dom_bb); >>>>>>>> + gcc_assert (!is_true_predicate (bc)); >>>>>>>> >>>>>>>> these changes look worthwhile even for !flag_force_vectorize. So >>>>>>>> please >>>>>>>> split the change to add_to_predicate_list out and compute >>>>>>>> post-dominators >>>>>>>> unconditionally. Note that you should call free_dominance_info >>>>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion. >>>>>>>> >>>>>>>> + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest)) >>>>>>>> + add_to_predicate_list (loop, e->dest, cond); >>>>>>>> + >>>>>>>> + /* If edge E is critical save predicate on it. */ >>>>>>>> + if (EDGE_COUNT (e->dest->preds) >= 2) >>>>>>>> + set_edge_predicate (e, cond); >>>>>>>> >>>>>>>> how do we know the edge is critical by this simple check? Why not >>>>>>>> simply always save edge predicates (well, you kind of do but omit >>>>>>>> the case where e->src dominates e->dest). >>>>>>>> >>>>>>>> Btw, you can rely on edge->aux being NULL at the start of the >>>>>>>> pass but need to clear it at the end (best use clear_aux_for_edges () >>>>>>>> for that). So stuff like >>>>>>>> >>>>>>>> + extract_true_false_edges_from_block (bb, &true_edge, >>>>>>>> &false_edge); >>>>>>>> + if (flag_force_vectorize) >>>>>>>> + true_edge->aux = false_edge->aux = NULL; >>>>>>>> >>>>>>>> shouldn't be necessary. >>>>>>>> >>>>>>>> I think the edge predicate handling should also be unconditionally >>>>>>>> and not depend on flag_force_vectorize. >>>>>>>> >>>>>>>> + /* The loop latch and loop exit block are always executed and >>>>>>>> + have no extra conditions to be processed: skip them. */ >>>>>>>> + if (bb == loop->latch >>>>>>>> + || bb_with_exit_edge_p (loop, bb)) >>>>>>>> >>>>>>>> I don't think the edge stuff is true - given you still only reset the >>>>>>>> loop->latch bb predicate the change looks broken. >>>>>>>> >>>>>>>> + /* Fold_build2 can produce bool conversion which is not >>>>>>>> + supported by vectorizer, so re-build it without folding. >>>>>>>> + For example, such conversion is generated for sequence: >>>>>>>> + _Bool _7, _8, _9; >>>>>>>> + _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9; >>>>>>>> + if (_9 != 0) --> (bool)_9. */ >>>>>>>> + >>>>>>>> + if (CONVERT_EXPR_P (c) >>>>>>>> + && TREE_CODE_CLASS (code) == tcc_comparison) >>>>>>>> >>>>>>>> I think you should simply use canonicalize_cond_expr_cond on the >>>>>>>> folding result. Or rather _not_ fold at all - we are taking the >>>>>>>> operands from the GIMPLE condition unmodified after all. >>>>>>>> >>>>>>>> - add_to_dst_predicate_list (loop, false_edge, >>>>>>>> - unshare_expr (cond), c2); >>>>>>>> + add_to_dst_predicate_list (loop, false_edge, unshare_expr >>>>>>>> (cond), >>>>>>>> + unshare_expr (c2)); >>>>>>>> >>>>>>>> why is it necessary to unshare c2? >>>>>>>> >>>>>>>> Please split out the PHI-with-multi-arg handling (I have not looked at >>>>>>>> that in detail). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>> >>>>>>>>> Changelog. >>>>>>>>> >>>>>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>> clone. >>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>> (edge_predicate): New function. >>>>>>>>> (set_edge_predicate): New function. >>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>> for join blocks if it exists. >>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>> destination block of edge is not always executed. Set-up predicate >>>>>>>>> for critical edge. >>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>> algorithm is used. >>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>> (find_insertion_point): New function. >>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>> that extended predication must be applied). >>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of >>>>>>>>> edges >>>>>>>>> for blocks with two successors. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> here is reduced patch (part.1) which was reduced almost twice. >>>>>>>>>> Let's me also answer on your comments. >>>>>>>>>> >>>>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical >>>>>>>>>> edges. >>>>>>>>>> My previous code was not correct and now it looks like: >>>>>>>>>> >>>>>>>>>> if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1) >>>>>>>>>> /* Edge E is not critical, use predicate of edge source bb. */ >>>>>>>>>> c = bb_predicate (b); >>>>>>>>>> else >>>>>>>>>> /* Edge E is critical and its aux field contains predicate. */ >>>>>>>>>> c = edge_predicate (e); >>>>>>>>>> >>>>>>>>>> 2. I completely delete all code related to creation of conditional >>>>>>>>>> expressions and completely rely on bool pattern recognition in >>>>>>>>>> vectorizer. But we need to delete all dead predicate computations >>>>>>>>>> which are not used since they prevent vectorization. I will add this >>>>>>>>>> local-dce function in next patch. >>>>>>>>>> 3. I also did not include in this patch recognition of general >>>>>>>>>> phi-nodes with two arguments only for which conversion of conditional >>>>>>>>>> scalar reduction can be applied also. >>>>>>>>>> Note that all these changes are applied for loop marked with pragma >>>>>>>>>> omp simd only. >>>>>>>>>> >>>>>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>>> clone. >>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>> (edge_predicate): New function. >>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>>>> for join blocks if it exists. >>>>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>>>> destination block of edge is not always executed. Set-up predicate >>>>>>>>>> for critical edge. >>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>>>> build2_loc. Add zeroing of edge 'aux' field under >>>>>>>>>> FLAG_FORCE_VECTORIZE. >>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>> algorithm is used. >>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>>>> that extended predication must be applied). >>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of >>>>>>>>>> edges >>>>>>>>>> for blocks with two successors. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener >>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Richard! >>>>>>>>>>>> Here is updated patch with the following changes: >>>>>>>>>>>> >>>>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended >>>>>>>>>>>> conversion. >>>>>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge, i.e. >>>>>>>>>>>> negate_predicate was deleted. >>>>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges >>>>>>>>>>>> can >>>>>>>>>>>> be critical. >>>>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic >>>>>>>>>>>> blocks to simplify it. >>>>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating >>>>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for >>>>>>>>>>>> phi >>>>>>>>>>>> of kind >>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>> only one cond expression is required and this is considered as >>>>>>>>>>>> simple >>>>>>>>>>>> optimization for arbitrary phi-function. More precise, >>>>>>>>>>>> if phi-function have only two different arguments and one of them >>>>>>>>>>>> has >>>>>>>>>>>> single occurrence, if- conversion is performed as if phi have only >>>>>>>>>>>> 2 >>>>>>>>>>>> arguments. >>>>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced. >>>>>>>>>>>> >>>>>>>>>>>> Updated patch is attached. >>>>>>>>>>>> >>>>>>>>>>>> Any comments will be appreciated. >>>>>>>>>>> >>>>>>>>>>> The patch is still very big and does multiple things at once which >>>>>>>>>>> makes >>>>>>>>>>> it hard to review. >>>>>>>>>>> >>>>>>>>>>> In addition to that it changes function singatures without updating >>>>>>>>>>> the function comments. For example what is the convert_bool >>>>>>>>>>> argument doing to add_to_dst_predicate_list? Why do we need >>>>>>>>>>> all this added logic. >>>>>>>>>>> >>>>>>>>>>> You duplicate operand_equal_for_phi_arg_p. >>>>>>>>>>> >>>>>>>>>>> I think the code handling PHIs with more than two operands but >>>>>>>>>>> only two unequal operands is useful generally, so that's an obvious >>>>>>>>>>> candidate for splitting out into a separate patch. >>>>>>>>>>> >>>>>>>>>>> + CONVERT_BOOL argument was added to convert bool predicate >>>>>>>>>>> computations >>>>>>>>>>> + which is not supported by vectorizer to int type through >>>>>>>>>>> creating of >>>>>>>>>>> + conditional expressions. */ >>>>>>>>>>> >>>>>>>>>>> Example? The vectorizer has patterns for bool predicate >>>>>>>>>>> computations. >>>>>>>>>>> This seems to be another feature that needs splitting out. >>>>>>>>>>> >>>>>>>>>>> The way you get around the critical edge parts looks awkward to me. >>>>>>>>>>> Please either do _all_ predicates as edge predicates or simply >>>>>>>>>>> split critical edges (of the respective loop body). >>>>>>>>>>> >>>>>>>>>>> I still think that an utility doing same PHI arg merging by >>>>>>>>>>> introducing >>>>>>>>>>> forwarder blocks would be nicer to have. >>>>>>>>>>> >>>>>>>>>>> I'd restructure the main tree_if_conversion function to apply these >>>>>>>>>>> CFG pre-transforms when we are going to version the loop >>>>>>>>>>> for if conversion (eventually transitioning to always doing that). >>>>>>>>>>> >>>>>>>>>>> So - please split up the patch. It's way too big. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>> >>>>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>>>>> clone. >>>>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>>>> (edge_predicate): New function. >>>>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and >>>>>>>>>>>> such bb exists; save it in static variable for further possible >>>>>>>>>>>> use. >>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true. >>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>> if flag_force_vectorize wa set-up. >>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize. >>>>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if >>>>>>>>>>>> flag_force_vectorize was set-up. >>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical >>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if >>>>>>>>>>>> flag_force_vectorize was not set-up. >>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to >>>>>>>>>>>> transform >>>>>>>>>>>> comparison expressions of boolean type into conditional expressions >>>>>>>>>>>> with integral operands. If convert_bool argument was set-up and >>>>>>>>>>>> vect bool pattern can be appied perform the following >>>>>>>>>>>> transformation: >>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>> Add check that if fold_build2 produces bool conversion if >>>>>>>>>>>> convert_bool >>>>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional >>>>>>>>>>>> argument >>>>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and >>>>>>>>>>>> add_to_predicate_list. >>>>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's. >>>>>>>>>>>> Call predicate_bbs with additional argument equal to false. >>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>> phi-block >>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>> algorithm is used. >>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals >>>>>>>>>>>> that >>>>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args. >>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if >>>>>>>>>>>> cond >>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>> (predicate_arbitrary_phi): New function. >>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>>>>> predication to build mask. >>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop >>>>>>>>>>>> versioning >>>>>>>>>>>> for innermost loop marked with pragma omp simd. >>>>>>>>>>>> >>>>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener >>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev >>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>> >>>>>>>>>>>>>> We implemented additional support for pragma omp simd in part of >>>>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions >>>>>>>>>>>>>> include: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. All extensions are performed only if considered loop or its >>>>>>>>>>>>>> outer >>>>>>>>>>>>>> loop was marked with pragma omp simd (force_vectorize); For >>>>>>>>>>>>>> ordinary >>>>>>>>>>>>>> loops behavior was not changed. >>>>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more >>>>>>>>>>>>>> than 2 >>>>>>>>>>>>>> predecessors. >>>>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in >>>>>>>>>>>>>> current design: >>>>>>>>>>>>>> all phi nodes must be in non-predicated basic block to conform >>>>>>>>>>>>>> semantic of COND_EXPR which is used for transformation. >>>>>>>>>>>>> >>>>>>>>>>>>> How is that so? If the PHI is predicated then its result will be >>>>>>>>>>>>> used >>>>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs. >>>>>>>>>>>>> >>>>>>>>>>>>> No? >>>>>>>>>>>>> >>>>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 >>>>>>>>>>>>>> arguments >>>>>>>>>>>>>> with some limitations: >>>>>>>>>>>>>> - for phi nodes which have more than 2 arguments, but only two >>>>>>>>>>>>>> arguments are different and one of them has the only >>>>>>>>>>>>>> occurence, >>>>>>>>>>>>>> transformation to single COND_EXPR can be done. >>>>>>>>>>>>>> - if phi node has more different arguments and all edge >>>>>>>>>>>>>> predicates >>>>>>>>>>>>>> correspondent to phi-arguments are disjoint, a chain of >>>>>>>>>>>>>> COND_EXPR >>>>>>>>>>>>>> will be generated for it. In current design very simple check >>>>>>>>>>>>>> is used: >>>>>>>>>>>>>> check starting from end that two edges correspondent to >>>>>>>>>>>>>> neighbor >>>>>>>>>>>>>> arguments have common predecessor which is used for further check >>>>>>>>>>>>>> with next edge. >>>>>>>>>>>>>> These guarantee that phi predication will produce the correct >>>>>>>>>>>>>> result. >>>>>>>>>>>>> >>>>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node >>>>>>>>>>>>> by >>>>>>>>>>>>> inserting forwarder blocks. Thus >>>>>>>>>>>>> >>>>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>>>> >>>>>>>>>>>>> becomes >>>>>>>>>>>>> >>>>>>>>>>>>> bb 5: <forwarder-from(2)-and(3)> >>>>>>>>>>>>> >>>>>>>>>>>>> x = PHI <1(5), 2(4)> >>>>>>>>>>>>> >>>>>>>>>>>>> and >>>>>>>>>>>>> >>>>>>>>>>>>> x = PHI <1(2), 2(3), 3(4)> >>>>>>>>>>>>> >>>>>>>>>>>>> becomes >>>>>>>>>>>>> >>>>>>>>>>>>> bb 5: >>>>>>>>>>>>> x' = PHI <1(2), 2(3)> >>>>>>>>>>>>> >>>>>>>>>>>>> b = PHI<x'(5), 3(4)> >>>>>>>>>>>>> >>>>>>>>>>>>> which means that 3) has to work. Note that we want this kind of >>>>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of >>>>>>>>>>>>> copies we need to insert on edges. >>>>>>>>>>>>> >>>>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass >>>>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform. >>>>>>>>>>>>> And make 3) work properly if it doesn't already. >>>>>>>>>>>>> >>>>>>>>>>>>> It looks like you introduce a "negate predicate" to work around >>>>>>>>>>>>> the >>>>>>>>>>>>> critical edge limitation? Please instead change if-conversion to >>>>>>>>>>>>> work with edge predicates (as opposed to BB predicates). >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Here is example of such extended predication (compile with >>>>>>>>>>>>>> -march=core-avx2): >>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>> { >>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>> if (t > 0 & t < 1.0e+17f) >>>>>>>>>>>>>> if (c[i] != 0) >>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>> } >>>>>>>>>>>>>> <bb 4>: >>>>>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>>>>>>>>>>> t_5 = a[i_16]; >>>>>>>>>>>>>> _6 = t_5 > 0.0; >>>>>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>>>>>>>>>>> _8 = _7 & _6; >>>>>>>>>>>>>> _ifc__28 = (unsigned int) _8; >>>>>>>>>>>>>> _10 = &c[i_16]; >>>>>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>> _ifc__30 = (int) _ifc__29; >>>>>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>>>> _ifc__33 = (int) _ifc__32; >>>>>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>>>>>>>>>>> res_1 = res_15 + _ifc__35; >>>>>>>>>>>>>> i_11 = i_16 + 1; >>>>>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1; >>>>>>>>>>>>>> if (ivtmp_14 != 0) >>>>>>>>>>>>>> goto <bb 4>; >>>>>>>>>>>>>> >>>>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>>>> >>>>>>>>>>>>>> gcc/ChageLog >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>> >>>>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field. >>>>>>>>>>>>>> (bb_negate_predicate): New function. >>>>>>>>>>>>>> (set_bb_negate_predicate): New function. >>>>>>>>>>>>>> (bb_copy_predicate): New function. >>>>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate >>>>>>>>>>>>>> field. >>>>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is >>>>>>>>>>>>>> true. >>>>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is >>>>>>>>>>>>>> true. >>>>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>> (phi_args_disjoint): New function. >>>>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes >>>>>>>>>>>>>> are >>>>>>>>>>>>>> in non-predicated basic blocks. >>>>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>>>> flag_force_vectorize was setup. Use call of >>>>>>>>>>>>>> all_edges_are_critical >>>>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only >>>>>>>>>>>>>> if >>>>>>>>>>>>>> flag_force_vectorize was not setup. >>>>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to >>>>>>>>>>>>>> transform >>>>>>>>>>>>>> comparison expressions of boolean type into conditional >>>>>>>>>>>>>> expressions >>>>>>>>>>>>>> with integral operands. If bool_conv argument is false or both >>>>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate >>>>>>>>>>>>>> assignments >>>>>>>>>>>>>> is used, otherwise the following code was added: check on >>>>>>>>>>>>>> applicable >>>>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of >>>>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>>>> compute predicates for both outgoing edges one of which is >>>>>>>>>>>>>> critical >>>>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates >>>>>>>>>>>>>> using >>>>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in >>>>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s >>>>>>>>>>>>>> and >>>>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical >>>>>>>>>>>>>> edge, >>>>>>>>>>>>>> but generated gimplified statements are stored in their >>>>>>>>>>>>>> destination >>>>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to >>>>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list. >>>>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional >>>>>>>>>>>>>> argument >>>>>>>>>>>>>> equal to false. >>>>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>>>> phi-block >>>>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>>>> algorithm is used. >>>>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which >>>>>>>>>>>>>> signals that >>>>>>>>>>>>>> both phi arguments must be evaluated through >>>>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if >>>>>>>>>>>>>> cond >>>>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>> (predicate_phi_disjoint_args): New function. >>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>>>>>>> predication to build mask. >>>>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>>>>> (split_crit_edge): New function. >>>>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from >>>>>>>>>>>>>> current >>>>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for >>>>>>>>>>>>>> innermost loop marked with pragma omp simd.