On Mon, Oct 19, 2015 at 10:17 AM, Alan Hayward <alan.hayw...@arm.com> wrote: > > >>On 30/09/2015 13:45, "Richard Biener" <richard.guent...@gmail.com> wrote: >> >>>On Wed, Sep 23, 2015 at 5:51 PM, Alan Hayward <alan.hayw...@arm.com> >>>wrote: >>>> >>>> >>>> On 18/09/2015 14:53, "Alan Hayward" <alan.hayw...@arm.com> wrote: >>>> >>>>> >>>>> >>>>>On 18/09/2015 14:26, "Alan Lawrence" <alan.lawre...@arm.com> wrote: >>>>> >>>>>>On 18/09/15 13:17, Richard Biener wrote: >>>>>>> >>>>>>> Ok, I see. >>>>>>> >>>>>>> That this case is already vectorized is because it implements >>>>>>>MAX_EXPR, >>>>>>> modifying it slightly to >>>>>>> >>>>>>> int foo (int *a) >>>>>>> { >>>>>>> int val = 0; >>>>>>> for (int i = 0; i < 1024; ++i) >>>>>>> if (a[i] > val) >>>>>>> val = a[i] + 1; >>>>>>> return val; >>>>>>> } >>>>>>> >>>>>>> makes it no longer handled by current code. >>>>>>> >>>>>> >>>>>>Yes. I believe the idea for the patch is to handle arbitrary >>>>>>expressions >>>>>>like >>>>>> >>>>>>int foo (int *a) >>>>>>{ >>>>>> int val = 0; >>>>>> for (int i = 0; i < 1024; ++i) >>>>>> if (some_expression (i)) >>>>>> val = another_expression (i); >>>>>> return val; >>>>>>} >>>>> >>>>>Yes, thatâs correct. Hopefully my new test cases should cover >>>>>everything. >>>>> >>>> >>>> Attached is a new version of the patch containing all the changes >>>> requested by Richard. >>> >>>+ /* Compare the max index vector to the vector of found indexes to >>>find >>>+ the postion of the max value. This will result in either a >>>single >>>+ match or all of the values. */ >>>+ tree vec_compare = make_ssa_name (index_vec_type_signed); >>>+ gimple vec_compare_stmt = gimple_build_assign (vec_compare, >>>EQ_EXPR, >>>+ induction_index, >>>+ max_index_vec); >>> >>>I'm not sure all targets can handle this. If I deciper the code >>>correctly then we do >>> >>> mask = induction_index == max_index_vec; >>> vec_and = mask & vec_data; >>> >>>plus some casts. So this is basically >>> >>> vec_and = induction_index == max_index_vec ? vec_data : {0, 0, ... }; >>> >>>without the need to relate the induction index vector type to the data >>>vector type. >>>I believe this is also the form all targets support. >> >> >>Ok, Iâll replace this. >> >>> >>>I am missing a comment before all this code-generation that shows the >>>transform >>>result with the variable names used in the code-gen. I have a hard >>>time connecting >>>things here. >> >>Ok, Iâll add some comments. >> >>> >>>+ tree matched_data_reduc_cast = build1 (VIEW_CONVERT_EXPR, >>>scalar_type, >>>+ matched_data_reduc); >>>+ epilog_stmt = gimple_build_assign (new_scalar_dest, >>>+ matched_data_reduc_cast); >>>+ new_temp = make_ssa_name (new_scalar_dest, epilog_stmt); >>>+ gimple_assign_set_lhs (epilog_stmt, new_temp); >>> >>>this will leave the stmt unsimplified. scalar sign-changes should use >>>NOP_EXPR, >>>not VIEW_CONVERT_EXPR. The easiest fix is to use fold_convert instead. >>>Also just do like before - first make_ssa_name and then directly use it >>>in the >>>gimple_build_assign. >> >>We need the VIEW_CONVERT_EXPR for the cases where we have float data >>values. The index is always integer. >> >> >>> >>>The patch is somewhat hard to parse with all the indentation changes. A >>>context >>>diff would be much easier to read in those contexts. >> >>Ok, Iâll make the next patch like that >> >>> >>>+ if (v_reduc_type == COND_REDUCTION) >>>+ { >>>+ widest_int ni; >>>+ >>>+ if (! max_loop_iterations (loop, &ni)) >>>+ { >>>+ if (dump_enabled_p ()) >>>+ dump_printf_loc (MSG_NOTE, vect_location, >>>+ "loop count not known, cannot create cond " >>>+ "reduction.\n"); >>> >>>ugh. That's bad. >>> >>>+ /* The additional index will be the same type as the condition. >>>Check >>>+ that the loop can fit into this less one (because we'll use up >>>the >>>+ zero slot for when there are no matches). */ >>>+ tree max_index = TYPE_MAX_VALUE (cr_index_scalar_type); >>>+ if (wi::geu_p (ni, wi::to_widest (max_index))) >>>+ { >>>+ if (dump_enabled_p ()) >>>+ dump_printf_loc (MSG_NOTE, vect_location, >>>+ "loop size is greater than data size.\n"); >>>+ return false; >>> >>>Likewise. >> >>We could do better if we made the index type larger. >>But as a first implementation of this optimisation, I didnât want to >>overcomplicate things more. >> >>> >>>@@ -5327,6 +5540,8 @@ vectorizable_reduction (gimple stmt, >>>gimple_stmt_iterator *gsi, >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n"); >>> >>>+ STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type; >>>+ >>> /* FORNOW: Multiple types are not supported for condition. */ >>> if (code == COND_EXPR) >>> >>>this change looks odd (or wrong). The type should be _only_ set/changed >>>during >>>analysis. >> >> >>The problem is, for COND_EXPRs, this function calls >>vectorizable_condition(), which sets STMT_VINFO_TYPE to >>condition_vec_info_type.
Ah, the pre-existing issue of the transform phase re-doing the analysis... a fix would be to condition that call on vec_stmt == NULL, thus analysis phase. >>Therefore we need something to restore it back to reduc_vec_info_type on >>the non-analysis call. >> >>I considered setting STMT_VINFO_TYPE to reduc_vec_info_type directly after >>the call to vectorizable_condition(), but that looked worse. >>I could back up the value of STMT_VINFO_TYPE before calling >>vectorizable_condition() and then restore it after? I think thatâll look a >>lot better. >> >> >>> >>>+ >>>+ /* For cond reductions we need to add an additional conditional >>>based on >>>+ the loop index. */ >>>+ if (v_reduc_type == COND_REDUCTION) >>>+ { >>>+ int nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out); >>>+ int k; >>>... >>>+ STMT_VINFO_VECTYPE (index_vec_info) = cr_index_vector_type; >>>+ set_vinfo_for_stmt (index_condition, index_vec_info); >>>+ >>>+ /* Update the phi with the vec cond. */ >>>+ add_phi_arg (new_phi, cond_name, loop_latch_edge (loop), >>>+ UNKNOWN_LOCATION); >>> >>>same as before - I am missing a comment that shows the generated code >>>and connects >>>the local vars used. >> >>Ok, Iâll add something >> >>> >>> >>>+ tree ccompare_name = make_ssa_name (TREE_TYPE (ccompare)); >>>+ gimple ccompare_stmt = gimple_build_assign (ccompare_name, >>>ccompare); >>>+ gsi_insert_before (&vec_stmt_gsi, ccompare_stmt, >>>GSI_SAME_STMT); >>>+ gimple_assign_set_rhs1 (*vec_stmt, ccompare_name); >>> >>>hum - are you sure this works with ncopies > 1? Will it use the >>>correct vec_stmt? >> >>We donât support this when ncopies >1. >> >>In vectorizable_reduction(): >> >>if ((double_reduc || v_reduc_type == COND_REDUCTION) && ncopies > 1) >> { >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> "multiple types in double reduction or condition " >> "reduction.\n"); >> return false; >> } >> >> >> >>> >>>I still dislike the v_reduc_type plastered and passed everywhere. Can >>>you explore >>>adding the reduction kind to stmt_info? >> >>Ok, I can do that. >> >> >>Thanks for the comments. >>Iâll put together a patch with the above changes. >> >>Thanks, >>Alan. >> >> > > Richard, as requested I've updated with the follow changes: > > * AND and EQ replaced with a COND_EXPR > * Better comments for the code gen, included references to variable names > * Kept the VIEW_CONVERT_EXPR - we need this for when the data is float type > * Backed up STMT_VINFO_TYPE before the call to vectorizable_condition() > and restored after the call. I considered extracting the relvant parts of > vectorizable_condition() into a sub function, but it had too many > dependencies with the rest of vectorizable_condition(). > * v_reduc_type is now part of stmt_info > * Created a diff using 50 lines of context Ok with the vectorizable_condition call guarded as suggested instead. Thanks, Richard. > > Thanks, > Alan. > >