Thanks Richard for your quick reply! 1. I agree that we can combine predicate_extended_ and predicate_arbitrary_ to one function as you proposed. 2. What is your opinion about using more simple decision about insertion point - if bb has use of phi result insert phi predication before it and at the bb end otherwise. I assume that critical edge splitting is not a good decision.
Best regards. Yuri. 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Hi Richard, >> >> I resend you patch1 and patch2 with minor changes: >> 1. I renamed flag_force_vectorize to aggressive_if_conv. >> 2. Use static cast for the first argument of gimple_phi_arg_edge. >> I also very sorry that I sent you bad patch. >> >> Now let me answer on your questions related to second patch. >> 1. Why we need both predicate_extended_scalar_phi and >> predicate_arbitrary_scalar_phi? >> >> Let's consider the following simple test-case: >> >> #pragma omp simd safelen(8) >> for (i=0; i<512; i++) >> { >> float t = a[i]; >> if (t > 0.0f & t < 1.0e+17f) >> if (c[i] != 0) /* c is integer array. */ >> res += 1; >> } >> >> we can see the following phi node correspondent to res: >> >> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >> >> It is clear that we can optimize it to phi node with 2 arguments only >> and only one check can be used for phi predication (for reduction in >> our case), namely predicate of bb_5. In general case we can't do it >> even if we sort all phi argument values since we still have to produce >> a chain of cond expressions to perform phi predication (see comments >> for predicate_arbitrary_scalar_phi). > > How so? We can always use !(condition) for the "last" value, thus > treat it as an 'else' case. That even works for > > # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> > > where the condition for edges 5 and 7 can be computed as > ! (condition for 3 || condition for 4). > > Of course it is worthwhile to also sort single-occurances first > so your case gets just the condiiton for edge 5 and its inversion > used for edges 3 and 4 combined. > >> 2. Why we need to introduce find_insertion_point? >> Let's consider another test-case extracted from 175.vpr ( t5.c is >> attached) and we can see that bb_7 and bb_9 containig phi nodes has >> only critical incoming edges and both contain code computing edge >> predicates, e.g. >> >> <bb 7>: >> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >> _46 = xmax_17 == xmax_37; >> _47 = xmax_17 == xmax_27; >> _48 = _46 & _47; >> _53 = xmax_17 == xmax_37; >> _54 = ~_53; >> _55 = xmax_17 == xmax_27; >> _56 = _54 & _55; >> _57 = _48 | _56; >> xmax_edge_19 = xmax_edge_39 + 1; >> goto <bb 11>; >> >> It is evident that we can not put phi predication at the block >> beginning but need to put it after predicate computations. >> Note also that if there are no critical edges for phi arguments >> insertion point will be "after labels" Note also that phi result can >> have use in this block too, so we can't put predication code to the >> block end. > > So the issue is that predicate insertion for edge predicates does > not happen on the edge but somewhere else (generally impossible > for critical edges unless you split them). > > I think I've told you before that I prefer simple solutions to such issues, > like splitting the edge! Certainly not involving a function walking > GENERIC expressions. > > Thanks, > Richard. > >> Let me know if you still have any questions. >> >> Best regards. >> Yuri. >> >> >> >> >> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Hi All, >>>> >>>> Here is the second patch related to extended predication. >>>> Few comments which explain a main goal of design. >>>> >>>> 1. I don't want to insert any critical edge splitting since it may >>>> lead to less efficient binaries. >>>> 2. One special case of extended PHI node predication was introduced >>>> when #arguments is more than 2 but only two arguments are different >>>> and one argument has the only occurrence. For such PHI conditional >>>> scalar reduction is applied. >>>> This is correspondent to the following statement: >>>> if (q1 && q2 && q3) var++ >>>> New function phi_has_two_different_args was introduced to detect such phi. >>>> 3. Original algorithm for PHI predication used assumption that at >>>> least one incoming edge for blocks containing PHI is not critical - it >>>> guarantees that all computations related to predicate of normal edge >>>> are already inserted above this block and >>>> code related to PHI predication can be inserted at the beginning of >>>> block. But this is not true for critical edges for which predicate >>>> computations are in the block where code for phi predication must be >>>> inserted. So new function find_insertion_point is introduced which is >>>> simply found out the last statement in block defining predicates >>>> correspondent to all incoming edges and insert phi predication code >>>> after it (with some minor exceptions). >>> >>> Unfortunately the patch doesn't apply for me - I get >>> >>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>> predicate_all_scalar_phis (struct loop *loop) >>> >>> a few remarks nevertheless. I don't see how we need both >>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>> Couldn't we simply sort an array of (edge, value) pairs after value >>> and handle equal values specially in predicate_extended_scalar_phi? >>> That would even make PHI <a, a, b, c, c> more optimal. >>> >>> I don't understand the need for find_insertion_point. All SSA names >>> required for the predicates are defined upward - and the complex CFG >>> is squashed to a single basic-block, thus the defs will dominate the >>> inserted code if you insert after labels just like for the other case. >>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>> to happen in dominator order - but I guess that happens already?) >>> >>> I'd like the extended PHI handling to be enablable by a flag even >>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>> times that would have been nice to vectorize. I suggest to >>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>> followup, but please rename the local flag_force_vectorize flag >>> to something less looking like a flag, like simply 'aggressive'. >>> >>> Otherwise patch 2 looks ok to me. >>> >>> Richard. >>> >>> >>>> ChangeLog: >>>> >>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>> >>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>> FLAG_FORCE_VECTORIZE is true. >>>> (if_convertible_bb_p): Delete check that bb has at least one >>>> non-critical incoming edge. >>>> (phi_has_two_different_args): New function. >>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>> arguments if EXTENDED is true. Change check that block >>>> containing reduction statement candidate is predecessor >>>> of phi-block since phi may have more than two arguments. >>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>> statement before/after gsi point. >>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>> predication) to call of is_cond_scalar_reduction. Add argument >>>> true (which correspondent to argument BEFORE) to call of >>>> convert_scalar_cond_reduction. >>>> (get_predicate_for_edge): New function. >>>> (predicate_arbitrary_scalar_phi): New function. >>>> (predicate_extended_scalar_phi): New function. >>>> (find_insertion_point): New function. >>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>> than 2 predecessors or both incoming edges are critical. Invoke >>>> find_phi_replacement_condition and predicate_scalar_phi or >>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>> EXTENDED value. >>>> (insert_gimplified_predicates): Add check that non-predicated block >>>> may have statements to insert. Insert predicate of BB just after label >>>> if FLAG_FORCE_VECTORIZE is true. >>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>> is copy of inner or outer loop field force_vectorize.