Richard, Sorry that I forgot to delete debug dump from my fix. I have few questions about your comments.
1. You wrote : > You also still have two functions for PHI predication. And the > new extended variant doesn't commonize the 2-args and general > path Did you mean that I must combine predicate_scalar_phi and predicate_extended scalar phi to one function? Please note that if additional flag was not set up (i.e. aggressive_if_conv is false) extended predication is required more compile time since it builds hash_map. 2. About critical edge splitting. Did you mean that we should perform it (1) under aggressive_if_conv option only; (2) should we split all critical edges. Note that this leads to recomputing of topological order. It is worth noting that in current implementation bb's with 2 predecessors and both are on critical edges are accepted without additional option. Thanks ahead. Yuri. 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> Here is updated patch2 with the following changes: >> 1. Delete functions phi_has_two_different_args and find_insertion_point. >> 2. Use only one function for extended predication - >> predicate_extended_scalar_phi. >> 3. Save gsi before insertion of predicate computations for basic >> blocks if it has 2 predecessors and >> both incoming edges are critical or it gas more than 2 predecessors >> and at least one incoming edge >> is critical. This saved iterator can be used by extended phi predication. >> >> Here is motivated test-case which explains this point. >> Test-case is attached (t5.c) and it must be compiled with -O2 >> -ftree-loop-vectorize -fopenmp options. >> The problem phi is in bb-7: >> >> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >> { >> <bb 5>: >> xmax_edge_18 = xmax_edge_36 + 1; >> if (xmax_17 == xmax_27) >> goto <bb 7>; >> else >> goto <bb 9>; >> >> } >> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >> { >> <bb 6>: >> if (xmax_17 == xmax_27) >> goto <bb 7>; >> else >> goto <bb 8>; >> >> } >> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >> { >> <bb 7>: >> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >> xmax_edge_19 = xmax_edge_39 + 1; >> goto <bb 11>; >> >> } >> >> Note that both incoming edges to bb_7 are critical. If we comment out >> restoring gsi in predicate_all_scalar_phi: >> #if 0 >> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >> gsi = bb_insert_point (bb); >> else >> #endif >> gsi = gsi_after_labels (bb); >> >> we will get ICE: >> t5.c: In function 'foo': >> t5.c:9:6: error: definition in block 4 follows the use >> void foo (int n) >> ^ >> for SSA_NAME: _1 in statement: >> _52 = _1 & _3; >> t5.c:9:6: internal compiler error: verify_ssa failed >> >> smce predicate computations were inserted in bb_7. > > The issue is obviously that the predicates have already been emitted > in the target BB - that's of course the wrong place. This is done > by insert_gimplified_predicates. > > This just shows how edge predicate handling is broken - we don't > seem to have a sequence of gimplified stmts for edge predicates > but push those to e->dest which makes this really messy. > > Rather than having a separate phase where we insert all > gimplified bb predicates we should do that on-demand when > predicating a PHI. > > Your patch writes to stderr - that's bad - use dump_file and guard > the printfs properly. > > You also still have two functions for PHI predication. And the > new extended variant doesn't commonize the 2-args and general > paths. > > I'm not at all happy with this code. It may be existing if-conv codes > fault but making it even worse is not an option. > > Again - what's wrong with simply splitting critical edges if > aggressive_if_conv? I think that would very much simplify > things here. Or alternatively use gsi_insert_on_edge and > commit edge insertions before merging the blocks. > > Thanks, > Richard. > >> ChangeLog is >> >> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >> >> * tree-if-conv.c : Include hash-map.h. >> (struct bb_predicate_s): Add new field to save copy of gimple >> statement iterator. >> (bb_insert_point): New function. >> (set_bb_insert_point): New function. >> (has_pred_critical_p): New function. >> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >> AGGRESSIVE_IF_CONV is true. >> (if_convertible_bb_p): Delete check that bb has at least one >> non-critical incoming edge. >> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >> Allow interchange PHI arguments if EXTENDED is false. >> Change check that block containing reduction statement candidate >> is predecessor of phi-block since phi may have more than two arguments. >> (predicate_scalar_phi): Add new arguments for call of >> is_cond_scalar_reduction. >> (get_predicate_for_edge): New function. >> (struct phi_args_hash_traits): New type. >> (phi_args_hash_traits::hash): New function. >> (phi_args_hash_traits::equal_keys): New function. >> (gen_phi_arg_condition): New function. >> (predicate_extended_scalar_phi): New function. >> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >> to true if BB containing phi has more than 2 predecessors or both >> incoming edges are critical. Invoke find_phi_replacement_condition and >> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >> has 2 predecessors and both incoming edges are critical or it has more >> than 2 predecessors and atleast one incoming edge is critical. >> Use standard gsi_after_labels otherwise. >> Invoke predicate_extended_scalar_phi if EXTENDED is true. >> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >> to save gsi before insertion of predicate computations. SEt-up it to >> true for BB with 2 predecessors and critical incoming edges either >> number of predecessors is geater 2 and at least one incoming edge is >> critical. >> Add check that non-predicated block may have statements to insert. >> Insert predicate computation of BB just after label if >> EXTENDED_PREDICATION is true. >> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >> is copy of inner or outer loop force_vectorize field. >> >> >> >> >> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> I did simple change by saving gsi iterator for each bb that has >>>> critical edges by adding additional field to bb_predicate_s: >>>> >>>> typedef struct bb_predicate_s { >>>> >>>> /* The condition under which this basic block is executed. */ >>>> tree predicate; >>>> >>>> /* PREDICATE is gimplified, and the sequence of statements is >>>> recorded here, in order to avoid the duplication of computations >>>> that occur in previous conditions. See PR44483. */ >>>> gimple_seq predicate_gimplified_stmts; >>>> >>>> /* Insertion point for blocks having incoming critical edges. */ >>>> gimple_stmt_iterator gsi; >>>> } *bb_predicate_p; >>>> >>>> and this iterator is saved in insert_gimplified_predicates before >>>> insertion code for predicate computation. I checked that this fix >>>> works. >>> >>> Huh? I still wonder what the issue is with inserting everything >>> after the PHI we predicate. >>> >>> Well, your updated patch will come with testcases for the testsuite >>> that will hopefully fail if doing that. >>> >>> Richard. >>> >>>> >>>> Now I am implementing merging of predicate_extended.. and >>>> predicate_arbitrary.. functions as you proposed. >>>> >>>> Best regards. >>>> Yuri. >>>> >>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Thanks Richard for your quick reply! >>>>>> >>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>> 2. What is your opinion about using more simple decision about >>>>>> insertion point - if bb has use of phi result insert phi predication >>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>> splitting is not a good decision. >>>>> >>>>> Why not always insert before the use? Which would be after labels, >>>>> what we do for two-arg PHIs. That is, how can it be that you predicate >>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>> can only happen for backedges but those you can't remove in any case. >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> Best regards. >>>>>> Yuri. >>>>>> >>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Hi Richard, >>>>>>>> >>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>> >>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>> >>>>>>>> Let's consider the following simple test-case: >>>>>>>> >>>>>>>> #pragma omp simd safelen(8) >>>>>>>> for (i=0; i<512; i++) >>>>>>>> { >>>>>>>> float t = a[i]; >>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>> res += 1; >>>>>>>> } >>>>>>>> >>>>>>>> we can see the following phi node correspondent to res: >>>>>>>> >>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>> >>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>>>> and only one check can be used for phi predication (for reduction in >>>>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>>>> even if we sort all phi argument values since we still have to produce >>>>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>> >>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>> treat it as an 'else' case. That even works for >>>>>>> >>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>> >>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>> ! (condition for 3 || condition for 4). >>>>>>> >>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>> used for edges 3 and 4 combined. >>>>>>> >>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>> predicates, e.g. >>>>>>>> >>>>>>>> <bb 7>: >>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>> _48 = _46 & _47; >>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>> _54 = ~_53; >>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>> _56 = _54 & _55; >>>>>>>> _57 = _48 | _56; >>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>> goto <bb 11>; >>>>>>>> >>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>> beginning but need to put it after predicate computations. >>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>> insertion point will be "after labels" Note also that phi result can >>>>>>>> have use in this block too, so we can't put predication code to the >>>>>>>> block end. >>>>>>> >>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>> for critical edges unless you split them). >>>>>>> >>>>>>> I think I've told you before that I prefer simple solutions to such >>>>>>> issues, >>>>>>> like splitting the edge! Certainly not involving a function walking >>>>>>> GENERIC expressions. >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> Let me know if you still have any questions. >>>>>>>> >>>>>>>> Best regards. >>>>>>>> Yuri. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Hi All, >>>>>>>>>> >>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>> >>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>>>> lead to less efficient binaries. >>>>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>>>> scalar reduction is applied. >>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>> New function phi_has_two_different_args was introduced to detect >>>>>>>>>> such phi. >>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - >>>>>>>>>> it >>>>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>>>> are already inserted above this block and >>>>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>>>> computations are in the block where code for phi predication must be >>>>>>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>>>> after it (with some minor exceptions). >>>>>>>>> >>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>> >>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>> >>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>> >>>>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>>>> inserted code if you insert after labels just like for the other case. >>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course >>>>>>>>> needs >>>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>>> >>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>> >>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>> >>>>>>>>>> ChangeLog: >>>>>>>>>> >>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>> non-critical incoming edge. >>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>> statement before/after gsi point. >>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>>>> EXTENDED value. >>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>>>> may have statements to insert. Insert predicate of BB just after >>>>>>>>>> label >>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE >>>>>>>>>> which >>>>>>>>>> is copy of inner or outer loop field force_vectorize.