Richard, Thanks for your answer!
In current implementation phi node conversion assume that one of incoming edge to bb containing given phi has at least one non-critical edge and choose it to insert predicated code. But if we choose critical edge we need to determine insert point and insertion direction (before/after) since in other case we can get invalid ssa form (use before def). This is done by my new function which is not in current patch ( I will present this patch later). SO I assume that we need to leave this patch as it is to not introduce new bugs. Thanks. Yuri. 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: > On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> I reworked the patch as you proposed, but I didn't understand what >> did you mean by: >> >>>So please rework the patch so critical edges are always handled >>>correctly. >> >> In current patch flag_force_vectorize is used (1) to reject phi nodes >> with more than 2 arguments; (2) to reject basic blocks with only >> critical incoming edges since support for extended predication of phi >> nodes will be in next patch. > > I mean that (2) should not be rejected dependent on flag_force_vectorize. > It was rejected because if-cvt couldn't handle it correctly before but with > this patch this is fixed. I see no reason to still reject this then even > for !flag_force_vectorize. > > Rejecting PHIs with more than two arguments with flag_force_vectorize > is ok. > > Richard. > >> Could you please clarify your statement. >> >> I attached modified patch. >> >> ChangeLog: >> >> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com> >> >> (flag_force_vectorize): New variable. >> (edge_predicate): New function. >> (set_edge_predicate): New function. >> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >> if destination block of edge is not always executed. Set-up predicate >> for critical edge. >> (if_convertible_phi_p): Accept phi nodes with more than two args >> if FLAG_FORCE_VECTORIZE was set-up. >> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >> (if_convertible_stmt_p): Fix up pre-function comments. >> (all_edges_are_critical): New function. >> (if_convertible_bb_p): Use call of all_preds_critical_p >> to reject block if-conversion with incoming critical edges only if >> FLAG_FORCE_VECTORIZE was not set-up. >> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >> to compute predicate instead of fold_build2_loc. >> Add zeroing of edge 'aux' field. >> (find_phi_replacement_condition): Extend function interface: >> it returns NULL if given phi node must be handled by means of >> extended phi node predication. If number of predecessors of phi-block >> is equal 2 and atleast one incoming edge is not critical original >> algorithm is used. >> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >> Nullify 'aux' field of edges for blocks with two successors. >> >> >> >> >> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> Here is reduced patch as you requested. All your remarks have been fixed. >>>> Could you please look at it ( I have already sent the patch with >>>> changes in add_to_predicate_list for review). >>> >>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>> + fprintf (dump_file, "More than two phi node args.\n"); >>> + return false; >>> + } >>> + >>> + } >>> >>> Excess vertical space. >>> >>> >>> +/* Assumes that BB has more than 2 predecessors. >>> >>> More than 1 predecessor? >>> >>> + Returns false if at least one successor is not on critical edge >>> + and true otherwise. */ >>> + >>> +static inline bool >>> +all_edges_are_critical (basic_block bb) >>> +{ >>> >>> "all_preds_critical_p" would be a better name >>> >>> + if (EDGE_COUNT (bb->preds) > 2) >>> + { >>> + if (!flag_force_vectorize) >>> + return false; >>> + } >>> >>> as I said in the last review I don't think we should restrict edge >>> predicates to flag_force_vectorize. At least I can't see how >>> if-conversion is magically more expensive for that case? >>> >>> So please rework the patch so critical edges are always handled >>> correctly. >>> >>> Ok with that and the above suggested changes. >>> >>> Thanks, >>> Richard. >>> >>> >>>> Thanks. >>>> Yuri. >>>> ChangeLog >>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com> >>>> >>>> (flag_force_vectorize): New variable. >>>> (edge_predicate): New function. >>>> (set_edge_predicate): New function. >>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>>> if destination block of edge is not always executed. Set-up predicate >>>> for critical edge. >>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>> if FLAG_FORCE_VECTORIZE was set-up. >>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>> (all_edges_are_critical): New function. >>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>> to reject block if-conversion with incoming critical edges only if >>>> FLAG_FORCE_VECTORIZE was not set-up. >>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>> to compute predicate instead of fold_build2_loc. >>>> Add zeroing of edge 'aux' field. >>>> (find_phi_replacement_condition): Extend function interface: >>>> it returns NULL if given phi node must be handled by means of >>>> extended phi node predication. If number of predecessors of phi-block >>>> is equal 2 and atleast one incoming edge is not critical original >>>> algorithm is used. >>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>>> Nullify 'aux' field of edges for blocks with two successors. >>>> >>>> >>>> >>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> Here is updated patch (part1) for extended if conversion. >>>>>> >>>>>> Second part of patch will be sent later. >>>>> >>>>> Ok, I'm starting to look at this. I'd still like you to split things up >>>>> more. >>>>> >>>>> static inline void >>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) >>>>> { >>>>> ... >>>>> >>>>> + /* We use notion of cd equivalence to get simplier predicate for >>>>> + join block, e.g. if join block has 2 predecessors with predicates >>>>> + p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of >>>>> + p1 & p2 | p1 & !p2. */ >>>>> + if (dom_bb != loop->header >>>>> + && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb) >>>>> + { >>>>> + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); >>>>> + bc = bb_predicate (dom_bb); >>>>> + gcc_assert (!is_true_predicate (bc)); >>>>> >>>>> these changes look worthwhile even for !flag_force_vectorize. So please >>>>> split the change to add_to_predicate_list out and compute post-dominators >>>>> unconditionally. Note that you should call free_dominance_info >>>>> (CDI_POST_DOMINATORS) at the end of if-conversion. >>>>> >>>>> + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest)) >>>>> + add_to_predicate_list (loop, e->dest, cond); >>>>> + >>>>> + /* If edge E is critical save predicate on it. */ >>>>> + if (EDGE_COUNT (e->dest->preds) >= 2) >>>>> + set_edge_predicate (e, cond); >>>>> >>>>> how do we know the edge is critical by this simple check? Why not >>>>> simply always save edge predicates (well, you kind of do but omit >>>>> the case where e->src dominates e->dest). >>>>> >>>>> Btw, you can rely on edge->aux being NULL at the start of the >>>>> pass but need to clear it at the end (best use clear_aux_for_edges () >>>>> for that). So stuff like >>>>> >>>>> + extract_true_false_edges_from_block (bb, &true_edge, >>>>> &false_edge); >>>>> + if (flag_force_vectorize) >>>>> + true_edge->aux = false_edge->aux = NULL; >>>>> >>>>> shouldn't be necessary. >>>>> >>>>> I think the edge predicate handling should also be unconditionally >>>>> and not depend on flag_force_vectorize. >>>>> >>>>> + /* The loop latch and loop exit block are always executed and >>>>> + have no extra conditions to be processed: skip them. */ >>>>> + if (bb == loop->latch >>>>> + || bb_with_exit_edge_p (loop, bb)) >>>>> >>>>> I don't think the edge stuff is true - given you still only reset the >>>>> loop->latch bb predicate the change looks broken. >>>>> >>>>> + /* Fold_build2 can produce bool conversion which is not >>>>> + supported by vectorizer, so re-build it without folding. >>>>> + For example, such conversion is generated for sequence: >>>>> + _Bool _7, _8, _9; >>>>> + _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9; >>>>> + if (_9 != 0) --> (bool)_9. */ >>>>> + >>>>> + if (CONVERT_EXPR_P (c) >>>>> + && TREE_CODE_CLASS (code) == tcc_comparison) >>>>> >>>>> I think you should simply use canonicalize_cond_expr_cond on the >>>>> folding result. Or rather _not_ fold at all - we are taking the >>>>> operands from the GIMPLE condition unmodified after all. >>>>> >>>>> - add_to_dst_predicate_list (loop, false_edge, >>>>> - unshare_expr (cond), c2); >>>>> + add_to_dst_predicate_list (loop, false_edge, unshare_expr >>>>> (cond), >>>>> + unshare_expr (c2)); >>>>> >>>>> why is it necessary to unshare c2? >>>>> >>>>> Please split out the PHI-with-multi-arg handling (I have not looked at >>>>> that in detail). >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>> >>>>>> Changelog. >>>>>> >>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> >>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. >>>>>> (flag_force_vectorize): New variable. >>>>>> (edge_predicate): New function. >>>>>> (set_edge_predicate): New function. >>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>> for join blocks if it exists. >>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>> destination block of edge is not always executed. Set-up predicate >>>>>> for critical edge. >>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>> (all_edges_are_critical): New function. >>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>> to reject block if-conversion with incoming critical edges only if >>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. >>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>> it returns NULL if given phi node must be handled by means of >>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>> algorithm is used. >>>>>> (get_predicate_for_edge): New function. >>>>>> (find_insertion_point): New function. >>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>> Invoke find_insertion_point to initialize gsi and >>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>> that extended predication must be applied). >>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>> predicates at the block begining for extended if-conversion. >>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>> for innermost loop marked with pragma omp simd and >>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges >>>>>> for blocks with two successors. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>> Richard, >>>>>>> >>>>>>> here is reduced patch (part.1) which was reduced almost twice. >>>>>>> Let's me also answer on your comments. >>>>>>> >>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges. >>>>>>> My previous code was not correct and now it looks like: >>>>>>> >>>>>>> if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1) >>>>>>> /* Edge E is not critical, use predicate of edge source bb. */ >>>>>>> c = bb_predicate (b); >>>>>>> else >>>>>>> /* Edge E is critical and its aux field contains predicate. */ >>>>>>> c = edge_predicate (e); >>>>>>> >>>>>>> 2. I completely delete all code related to creation of conditional >>>>>>> expressions and completely rely on bool pattern recognition in >>>>>>> vectorizer. But we need to delete all dead predicate computations >>>>>>> which are not used since they prevent vectorization. I will add this >>>>>>> local-dce function in next patch. >>>>>>> 3. I also did not include in this patch recognition of general >>>>>>> phi-nodes with two arguments only for which conversion of conditional >>>>>>> scalar reduction can be applied also. >>>>>>> Note that all these changes are applied for loop marked with pragma >>>>>>> omp simd only. >>>>>>> >>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> >>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. >>>>>>> (flag_force_vectorize): New variable. >>>>>>> (edge_predicate): New function. >>>>>>> (set_edge_predicate): New function. >>>>>>> (convert_name_to_cmp): New function. >>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>> for join blocks if it exists. >>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>> destination block of edge is not always executed. Set-up predicate >>>>>>> for critical edge. >>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>> (all_edges_are_critical): New function. >>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. >>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>> algorithm is used. >>>>>>> (get_predicate_for_edge): New function. >>>>>>> (find_insertion_point): New function. >>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>> that extended predication must be applied). >>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>> predicates at the block begining for extended if-conversion. >>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>> for innermost loop marked with pragma omp simd and >>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges >>>>>>> for blocks with two successors. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard! >>>>>>>>> Here is updated patch with the following changes: >>>>>>>>> >>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended >>>>>>>>> conversion. >>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge, i.e. >>>>>>>>> negate_predicate was deleted. >>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can >>>>>>>>> be critical. >>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic >>>>>>>>> blocks to simplify it. >>>>>>>>> 5. I decided to not design pre-pass since it will lead generating >>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi >>>>>>>>> of kind >>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>> only one cond expression is required and this is considered as simple >>>>>>>>> optimization for arbitrary phi-function. More precise, >>>>>>>>> if phi-function have only two different arguments and one of them has >>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2 >>>>>>>>> arguments. >>>>>>>>> For arbitrary phi function a chain of cond expressions is produced. >>>>>>>>> >>>>>>>>> Updated patch is attached. >>>>>>>>> >>>>>>>>> Any comments will be appreciated. >>>>>>>> >>>>>>>> The patch is still very big and does multiple things at once which >>>>>>>> makes >>>>>>>> it hard to review. >>>>>>>> >>>>>>>> In addition to that it changes function singatures without updating >>>>>>>> the function comments. For example what is the convert_bool >>>>>>>> argument doing to add_to_dst_predicate_list? Why do we need >>>>>>>> all this added logic. >>>>>>>> >>>>>>>> You duplicate operand_equal_for_phi_arg_p. >>>>>>>> >>>>>>>> I think the code handling PHIs with more than two operands but >>>>>>>> only two unequal operands is useful generally, so that's an obvious >>>>>>>> candidate for splitting out into a separate patch. >>>>>>>> >>>>>>>> + CONVERT_BOOL argument was added to convert bool predicate >>>>>>>> computations >>>>>>>> + which is not supported by vectorizer to int type through creating >>>>>>>> of >>>>>>>> + conditional expressions. */ >>>>>>>> >>>>>>>> Example? The vectorizer has patterns for bool predicate computations. >>>>>>>> This seems to be another feature that needs splitting out. >>>>>>>> >>>>>>>> The way you get around the critical edge parts looks awkward to me. >>>>>>>> Please either do _all_ predicates as edge predicates or simply >>>>>>>> split critical edges (of the respective loop body). >>>>>>>> >>>>>>>> I still think that an utility doing same PHI arg merging by introducing >>>>>>>> forwarder blocks would be nicer to have. >>>>>>>> >>>>>>>> I'd restructure the main tree_if_conversion function to apply these >>>>>>>> CFG pre-transforms when we are going to version the loop >>>>>>>> for if conversion (eventually transitioning to always doing that). >>>>>>>> >>>>>>>> So - please split up the patch. It's way too big. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>> clone. >>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>> (edge_predicate): New function. >>>>>>>>> (set_edge_predicate): New function. >>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>> (get_type_for_cond): New function. >>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>> (predicate_disjunction): New function. >>>>>>>>> (predicate_conjunction): New function. >>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and >>>>>>>>> such bb exists; save it in static variable for further possible use. >>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>> Set-up predicate for crritical edge if convert_bool is true. >>>>>>>>> (equal_phi_args): New function. >>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>> if flag_force_vectorize wa set-up. >>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize. >>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if >>>>>>>>> flag_force_vectorize was set-up. >>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical >>>>>>>>> to reject block if-conversion with imcoming critical edges only if >>>>>>>>> flag_force_vectorize was not set-up. >>>>>>>>> (walk_cond_tree): New function. >>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform >>>>>>>>> comparison expressions of boolean type into conditional expressions >>>>>>>>> with integral operands. If convert_bool argument was set-up and >>>>>>>>> vect bool pattern can be appied perform the following transformation: >>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool >>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument >>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and >>>>>>>>> add_to_predicate_list. >>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's. >>>>>>>>> Call predicate_bbs with additional argument equal to false. >>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>> algorithm is used. >>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that >>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args. >>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond >>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. >>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>> (find_insertion_point): New function. >>>>>>>>> (predicate_arbitrary_phi): New function. >>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>> predication to build mask. >>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>>>> for innermost loop marked with pragma omp simd. >>>>>>>>> >>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> We implemented additional support for pragma omp simd in part of >>>>>>>>>>> extended if-conversion loops with such pragma. These extensions >>>>>>>>>>> include: >>>>>>>>>>> >>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer >>>>>>>>>>> loop was marked with pragma omp simd (force_vectorize); For >>>>>>>>>>> ordinary >>>>>>>>>>> loops behavior was not changed. >>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than >>>>>>>>>>> 2 >>>>>>>>>>> predecessors. >>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in >>>>>>>>>>> current design: >>>>>>>>>>> all phi nodes must be in non-predicated basic block to conform >>>>>>>>>>> semantic of COND_EXPR which is used for transformation. >>>>>>>>>> >>>>>>>>>> How is that so? If the PHI is predicated then its result will be >>>>>>>>>> used >>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs. >>>>>>>>>> >>>>>>>>>> No? >>>>>>>>>> >>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 >>>>>>>>>>> arguments >>>>>>>>>>> with some limitations: >>>>>>>>>>> - for phi nodes which have more than 2 arguments, but only two >>>>>>>>>>> arguments are different and one of them has the only occurence, >>>>>>>>>>> transformation to single COND_EXPR can be done. >>>>>>>>>>> - if phi node has more different arguments and all edge >>>>>>>>>>> predicates >>>>>>>>>>> correspondent to phi-arguments are disjoint, a chain of COND_EXPR >>>>>>>>>>> will be generated for it. In current design very simple check is >>>>>>>>>>> used: >>>>>>>>>>> check starting from end that two edges correspondent to neighbor >>>>>>>>>>> arguments have common predecessor which is used for further check >>>>>>>>>>> with next edge. >>>>>>>>>>> These guarantee that phi predication will produce the correct >>>>>>>>>>> result. >>>>>>>>>> >>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by >>>>>>>>>> inserting forwarder blocks. Thus >>>>>>>>>> >>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>> >>>>>>>>>> becomes >>>>>>>>>> >>>>>>>>>> bb 5: <forwarder-from(2)-and(3)> >>>>>>>>>> >>>>>>>>>> x = PHI <1(5), 2(4)> >>>>>>>>>> >>>>>>>>>> and >>>>>>>>>> >>>>>>>>>> x = PHI <1(2), 2(3), 3(4)> >>>>>>>>>> >>>>>>>>>> becomes >>>>>>>>>> >>>>>>>>>> bb 5: >>>>>>>>>> x' = PHI <1(2), 2(3)> >>>>>>>>>> >>>>>>>>>> b = PHI<x'(5), 3(4)> >>>>>>>>>> >>>>>>>>>> which means that 3) has to work. Note that we want this kind of >>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of >>>>>>>>>> copies we need to insert on edges. >>>>>>>>>> >>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass >>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform. >>>>>>>>>> And make 3) work properly if it doesn't already. >>>>>>>>>> >>>>>>>>>> It looks like you introduce a "negate predicate" to work around the >>>>>>>>>> critical edge limitation? Please instead change if-conversion to >>>>>>>>>> work with edge predicates (as opposed to BB predicates). >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Here is example of such extended predication (compile with >>>>>>>>>>> -march=core-avx2): >>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>> { >>>>>>>>>>> float t = a[i]; >>>>>>>>>>> if (t > 0 & t < 1.0e+17f) >>>>>>>>>>> if (c[i] != 0) >>>>>>>>>>> res += 1; >>>>>>>>>>> } >>>>>>>>>>> <bb 4>: >>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>>>>>>>> t_5 = a[i_16]; >>>>>>>>>>> _6 = t_5 > 0.0; >>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>>>>>>>> _8 = _7 & _6; >>>>>>>>>>> _ifc__28 = (unsigned int) _8; >>>>>>>>>>> _10 = &c[i_16]; >>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>> _ifc__30 = (int) _ifc__29; >>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>> _ifc__33 = (int) _ifc__32; >>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>>>>>>>> res_1 = res_15 + _ifc__35; >>>>>>>>>>> i_11 = i_16 + 1; >>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1; >>>>>>>>>>> if (ivtmp_14 != 0) >>>>>>>>>>> goto <bb 4>; >>>>>>>>>>> >>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>> >>>>>>>>>>> gcc/ChageLog >>>>>>>>>>> >>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field. >>>>>>>>>>> (bb_negate_predicate): New function. >>>>>>>>>>> (set_bb_negate_predicate): New function. >>>>>>>>>>> (bb_copy_predicate): New function. >>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>> (phi_args_disjoint): New function. >>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are >>>>>>>>>>> in non-predicated basic blocks. >>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical >>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if >>>>>>>>>>> flag_force_vectorize was not setup. >>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform >>>>>>>>>>> comparison expressions of boolean type into conditional expressions >>>>>>>>>>> with integral operands. If bool_conv argument is false or both >>>>>>>>>>> outgoing edges are not critical old algorithm of predicate >>>>>>>>>>> assignments >>>>>>>>>>> is used, otherwise the following code was added: check on applicable >>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of >>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>> compute predicates for both outgoing edges one of which is critical >>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates >>>>>>>>>>> using >>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in >>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and >>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge, >>>>>>>>>>> but generated gimplified statements are stored in their destination >>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to >>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list. >>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional >>>>>>>>>>> argument >>>>>>>>>>> equal to false. >>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>> phi-block >>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>> algorithm is used. >>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals >>>>>>>>>>> that >>>>>>>>>>> both phi arguments must be evaluated through >>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if >>>>>>>>>>> cond >>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>> (predicate_phi_disjoint_args): New function. >>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>>>> predication to build mask. >>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>> (split_crit_edge): New function. >>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for >>>>>>>>>>> innermost loop marked with pragma omp simd.