Martin Liška <mli...@suse.cz> writes:
> Hello.
>
> This is a second attempt to get rid of tcc_comparison GENERIC trees
> being used as the first argument of VEC_COND_EXPR.
>
> The patch achieves that in the following steps:
> 1) the veclower pass expands all tcc_comparison expressions into
>    SSA_NAMEs
> 2) from then on, a tcc_comparison can't be used as the first argument
>    of VEC_COND_EXPR (enforced in the GIMPLE verifier)
> 3) I exposed new internal functions with:
>    DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
>    DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
>    DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
>    DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> 4) the logic of expand_vec_cond_expr is moved into the new
>    pass_gimple_isel pass
> 5) the pass expands VEC_COND_EXPR into one of the internal functions
>    defined in 3)
> 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p
>    when an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
>
> The patch can bootstrap on x86_64-linux-gnu and survives regression
> tests.  Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on
> znver1, znver2 and skylake targets and I don't see any noticeable
> change.
>
> Achieved benefits of the patch:
> - removal of a GENERIC expression being used in GIMPLE statements
> - extraction into SSA_NAMEs can enable the proper tree optimizers
>   (FRE, DOM, PRE)
> - the possibility to expand smarter based on the number of uses
>   (expand_vec_cmp_expr_p)
>
> Future plans:
> - tcc_comparison removal just during gimplification
> - removal of the code where these expressions are handled for
>   VEC_COND_EXPR
> - do a similar thing for COND_EXPR?
>
> The task was guided by Richi (Biener) and I bet he can help with both
> further questions and reasoning.
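Just to check my reading of the flow described above (the SSA names and
the exact call shape below are only my illustration of it): after
veclower the comparison always lives in its own statement, so instead of

  x_5 = VEC_COND_EXPR <a_1 < b_2, c_3, d_4>;

the new pass would see something like

  _6 = a_1 < b_2;
  x_5 = VEC_COND_EXPR <_6, c_3, d_4>;

and would then replace the VEC_COND_EXPR with a call to one of the new
internal functions, e.g.:

  x_5 = .VCOND_MASK (_6, c_3, d_4);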
Thanks for doing this.  It definitely seems more friendly than the
four-operand version to targets where separate comparisons are the
norm.  Just a couple of comments about the implementation:

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2bf2cb78fc5..d654e5ee9fe 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
>        NEXT_PASS (pass_cleanup_eh);
>        NEXT_PASS (pass_lower_resx);
>        NEXT_PASS (pass_nrv);
> +      NEXT_PASS (pass_gimple_isel);
>        NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>        NEXT_PASS (pass_warn_function_noreturn);
>        NEXT_PASS (pass_gen_hsail);

What was the reason for making this a separate pass, rather than doing
it as part of veclower?  If we do them separately, then it's harder for
veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
that's a general problem between veclower and expand already, but it
seems like the new approach could help to move away from that by doing
the instruction selection directly in veclower.)

> +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> +   function based on type of selected expansion.  */
> +
> +static gimple *
> +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> +{
> +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> +  enum tree_code code;
> +  enum tree_code tcode;
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  enum insn_code icode;
> +  imm_use_iterator imm_iter;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return NULL;
> +
> +  code = gimple_assign_rhs_code (stmt);
> +  if (code != VEC_COND_EXPR)
> +    return NULL;
> +
> +  tree op0 = gimple_assign_rhs1 (stmt);
> +  tree op1 = gimple_assign_rhs2 (stmt);
> +  tree op2 = gimple_assign_rhs3 (stmt);
> +  lhs = gimple_assign_lhs (stmt);
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +
> +  gcc_assert (!COMPARISON_CLASS_P (op0));
> +  if (TREE_CODE (op0) == SSA_NAME)
> +    {
> +      unsigned int used_vec_cond_exprs = 0;
> +      gimple *use_stmt;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> +        {
> +          gassign *assign = dyn_cast<gassign *> (use_stmt);
> +          if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> +              && gimple_assign_rhs1 (assign) == op0)
> +            used_vec_cond_exprs++;
> +        }

This looks like it's quadratic in the worst case.  Could we check this
in a different way?

> +
> +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> +      if (def_stmt)
> +        {
> +          tcode = gimple_assign_rhs_code (def_stmt);
> +          op0a = gimple_assign_rhs1 (def_stmt);
> +          op0b = gimple_assign_rhs2 (def_stmt);
> +
> +          tree op0a_type = TREE_TYPE (op0a);
> +          if (used_vec_cond_exprs >= 2

It would be good if targets were able to provide only vcond_mask.  In
that case I guess we should go this path if the later one would fail.

> +              && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> +                  != CODE_FOR_nothing)
> +              && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> +            {
> +              /* Keep the SSA name and use vcond_mask.  */
> +              tcode = TREE_CODE (op0);
> +            }
> +        }
> +      else
> +        tcode = TREE_CODE (op0);
> +    }
> +  else
> +    tcode = TREE_CODE (op0);

Might be easier to follow if tcode is TREE_CODE (op0) by default and
only gets changed when we want to fold in the comparison (rough sketch
at the end of this mail).

Thanks,
Richard
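PS. To make the tcode suggestion a bit more concrete, this is roughly
the shape I had in mind.  Completely untested, and fold_comparison_p is
just a placeholder for whatever "we want to fold in the comparison"
condition the patch ends up using (the use-count, vcond_mask and
expand_vec_cmp_expr_p checks above):

  /* Default to keeping the SSA_NAME (vcond_mask path).  */
  enum tree_code tcode = TREE_CODE (op0);
  tree op0a = NULL_TREE, op0b = NULL_TREE;
  if (TREE_CODE (op0) == SSA_NAME)
    {
      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
      if (def_stmt
          && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison
          && fold_comparison_p (def_stmt))  /* placeholder predicate */
        {
          /* Fold the comparison into the vcond/vcondu expansion.  */
          tcode = gimple_assign_rhs_code (def_stmt);
          op0a = gimple_assign_rhs1 (def_stmt);
          op0b = gimple_assign_rhs2 (def_stmt);
        }
    }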