On Mon, 18 Nov 2019, Tamar Christina wrote: > Hi Richi, > > Thanks for the feedback, if you wouldn't mind indulging me quickly (for the > version for next stage-1) > > The original approach we had was indeed using a vec-pattern which turned > > best_i_11 = _4 > best_26 ? i_27 : best_i_25; > best_13 = MAX_EXPR <_4, best_26>; > > into > > best_13 = MAX_EXPR <_4, best_26>; > best_i_11 = _4 == best_13 ? best_i_25 : i_27;
Ah, interesting way to rewrite this indeed (didn't think of it). I think for FP compares this might be a bit iffy though. > which was: > > static gimple * > vect_recog_minmax_index_pattern (stmt_vec_info stmt_vinfo, tree *type_out) > { > tree oprnd0, oprnd1; > gimple *last_stmt = stmt_vinfo->stmt; > vec_info *vinfo = stmt_vinfo->vinfo; > tree type; > gimple *use_stmt; > imm_use_iterator iter; > gimple_stmt_iterator gsi; > > /* Starting from LAST_STMT, follow the defs of its uses in search > of the above pattern. */ > > if (!vect_reassociating_reduction_simple_p (stmt_vinfo, MAX_EXPR, > &oprnd0, &oprnd1)) > return NULL; > > type = gimple_expr_type (last_stmt); > > if (!is_a <gphi *> (SSA_NAME_DEF_STMT (oprnd1))) > return NULL; > > stmt_vec_info phy_vinfo = vinfo->lookup_def (oprnd1); > if (!phy_vinfo) > return NULL; > > basic_block top_bb = gimple_bb (last_stmt); > bool found_p = false; > > FOR_EACH_IMM_USE_STMT (use_stmt, iter, oprnd1) > { > if (is_gimple_assign (use_stmt) > && gimple_assign_rhs_code (use_stmt) == COND_EXPR) > { > basic_block bb = gimple_bb (use_stmt); > > if (bb == top_bb > && gimple_uid (use_stmt) < gimple_uid (last_stmt)) > { > tree cond = gimple_assign_rhs1 (use_stmt); > if (TREE_CODE (cond) != GT_EXPR) > continue; > > tree true_b = gimple_assign_rhs2 (use_stmt); > tree false_b = gimple_assign_rhs3 (use_stmt); > TREE_SET_CODE (cond, NE_EXPR); > TREE_OPERAND (cond, 1) = gimple_assign_lhs (last_stmt); > gimple_set_op (use_stmt, 3, false_b); > gimple_set_op (use_stmt, 2, true_b); > > gimple_seq *def_seq = &STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo); > gsi = gsi_for_stmt (use_stmt, def_seq); > gsi_remove (&gsi, false); > gsi = gsi_for_stmt (last_stmt, def_seq); > gimple_set_bb (use_stmt, bb); > gsi_insert_after (&gsi, use_stmt, GSI_SAME_STMT); > > vect_pattern_detected ("vect_recog_minmax_index_pattern", > last_stmt); > found_p = true; > } > } > } > > if (found_p) > { > STMT_VINFO_DEF_TYPE (phy_vinfo) = vect_min_max_reduction_def; > } > > return NULL; > } > > So it's rewriting the statement into one where the recursive phi node is > only used once And marking the reduction as a vect_min_max_reduction_def > to handle it correctly later on. The main issue you likely run into is that cycle detection runs before pattern detection and cycle detection already fails. > There was two things that bugged be about it though. Returning NULL from the > vect-pattern felt odd. > We could return the last statement here as a the pattern, but since we also > have to re-order the statements in the BB it seemed better not to treat it as > a simple pattern. > This made me think it really didn't belong here and that it's not really a > vectorizer pattern, but rather a work around. > > Maybe this should be done in ifconv? Where the conversions are created > in the first place? That's also a possibility though the IL is valid also before if-conversion so if-conversion might end up doing nothing. > Also doing this seemed to require a lot of generic boiler plate changes > to support vect_min_max_reduction_def, which also seemed a bit messy. Well, that I'd simply defer to vectorizable_reduction which walks the discovered cycle and computes STMT_VINFO_REDUC_TYPE (COND_REDUCTION, etc.). > Any thoughts here? > > On a side note, this code in vect_create_epilog_for_reduction > > for (elt_offset = nelements / 2; > elt_offset >= 1; > elt_offset /= 2) > { > calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); > indices.new_vector (sel, 2, nelements); > tree mask = vect_gen_perm_mask_any (vectype1, indices); > new_name = gimple_build (&stmts, VEC_PERM_EXPR, vectype1, > new_temp, zero_vec, mask); > new_temp = gimple_build (&stmts, code, > vectype1, new_name, new_temp); > } > gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT); > > ..... > > rhs = build3 (BIT_FIELD_REF, scalar_type, new_temp, > bitsize, bitsize_zero_node); > epilog_stmt = gimple_build_assign (new_scalar_dest, rhs); > > Seems like a very roundabout way of doing a reduction.. unless I have > misunderstood something? > If "code" is PLUS, MIN, MAX etc wouldn't a reduction call here be better if > the target supports it? This is quite hard to patch up later in combine > particularly for byte of hf modes where the number of statements you'd need > to match exceed combine's maximum. Yes, I think that's already done - vectorizable_reduction checks for such supported epilogue operation. So I think the main issue right now is that vect_is_simple_reduction needs to identify the beast as a single reduction. I think it's kind-of a COND_REDUCTION where the induction PHI is already in the original source. Richard. > Thanks, > Tamar > > > -----Original Message----- > > From: Richard Biener <rguent...@suse.de> > > Sent: Monday, November 18, 2019 11:24 > > To: Joel Hutton <joel.hut...@arm.com> > > Cc: GCC Patches <gcc-patches@gcc.gnu.org>; Tamar Christina > > <tamar.christ...@arm.com>; nd <n...@arm.com> > > Subject: Re: [RFC][GCC][AArch64] Add minmax phi-reduction pattern > > > > On Fri, 15 Nov 2019, Joel Hutton wrote: > > > > > Forgot to CC maintainer. > > > On 15/11/2019 18:03, Joel wrote: > > > > Hi all, > > > > > > > > Just looking for some feedback on the approach. > > > > > > > > Currently the loop vectorizer can't vectorize the following typical > > > > loop for getting max value and index from an array: > > > > > > > > void test_vec(int *data, int n) { > > > > int best_i, best = 0; > > > > > > > > for (int i = 0; i < n; i++) { > > > > if (data[i] > best) { > > > > best = data[i]; > > > > best_i = i; > > > > } > > > > } > > > > > > > > data[best_i] = data[0]; > > > > data[0] = best; > > > > } > > > > > > > > This patch adds some support for this pattern. > > > > > > > > This patch addresses Bug 88259. > > > > > > > > Regression testing is still in progress, This patch does not work > > > > correctly with vect-epilogues-nomask, and the reason for that is > > > > still being investigated. > > > > Quick posting before stage1 ends, heh? > > > > New functions lack comments, new fields in stmt_info as well, accesses > > should go through (missing) STMT_VINFO_* macros. > > > > You are removing a safety check without replacement elsewhere: > > > > - /* Check there's only a single stmt the op is used on inside > > - of the loop. */ > > - imm_use_iterator imm_iter; > > - gimple *op_use_stmt; > > - unsigned cnt = 0; > > - FOR_EACH_IMM_USE_STMT (op_use_stmt, imm_iter, op) > > - if (!is_gimple_debug (op_use_stmt) > > - && flow_bb_inside_loop_p (loop, gimple_bb (op_use_stmt))) > > - cnt++; > > - if (cnt != 1) > > - { > > - fail = true; > > - break; > > - } > > > > AFAICS you still analyze both PHIs but the correct thing to do here is to > > view > > both SSA cycles as a single one - when analyzing > > > > # best_i_25 = PHI <best_i_11(8), best_i_16(D)(18)> > > # best_26 = PHI <best_13(8), 0(18)> > > ... > > best_i_11 = _4 <= best_26 ? best_i_25 : i_27; > > best_13 = MAX_EXPR <_4, best_26>; > > > > and starting from the best_i_25 PHI nothing prevents check_reduction_path > > SCC finding to walk to the best_26 cycle but luck(?). > > > > And the sanity check should be that all uses are within the detected cycle > > (which then would be the case). > > > > So your handling looks like an afterthought - exactly going backwards of my > > attempts to make the code better structured :/ > > > > The code-gen you do in vect_create_epilog_for_reduction needs a comment > > - vect_create_epilog_for_reduction will be called twice in unspecified > > order, > > so clearly unconditionally accessing stmt_info->reduc_related_stmt- > > >reduc_minmax_epilog_stmt > > (if it is what it looks like...) is not going to work. You are also using > > IFN_REDUC_MAX which possibly isn't available. > > > > So I think what you want to do is "detect" the SCC with two PHIs, then - > > maybe in tree-vect-patterns replace it with a single PHI and a single > > operation, or somehow mangle it into a SLP tree to make it a single > > reduction. > > > > So please see to all this for next stage1. > > > > Thanks, > > Richard. > > > > > > gcc/ChangeLog: > > > > > > > > > > > > 2019-11-15 Joel Hutton <joel.hut...@arm.com> > > > > Tamar Christina <tamar.christ...@arm.com> > > > > > > > > PR tree-optimization/88259 > > > > * tree-vect-loop.c (vect_reassociating_reduction_simple_p): New > > > > function. > > > > (vect_recog_minmax_index_pattern): New function. > > > > (vect_is_simple_reduction): Add check for minmax pattern. > > > > (vect_model_reduction_cost): Add case for minmax pattern. > > > > (vect_create_epilog_for_reduction): Add fixup for minmax epilog. > > > > * tree-vectorizer.h (enum vect_reduction_type): Add > > > > INDEX_MINMAX_REDUCTION reduction type. > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)