On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> Martin Liška <mli...@suse.cz> writes:
> > Hello.
> >
> > This is second attempt to get rid of tcc_comparison GENERIC trees
> > to be used as the first argument of VEC_COND_EXPR.
> >
> > The patch attempts achieves that in the following steps:
> > 1) veclower pass expands all tcc_comparison expression into a SSA_NAME
> > 2) since that tcc_comparsion can't be used as the first argument of 
> > VEC_COND_EXPR
> >     (done in GIMPLE verifier)
> > 3) I exposed new internal functions with:
> > DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> > DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> > DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> > DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> >
> > 4) logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> > 5) the pass expands VEC_COND_EXPR into one of the internal functions 
> > defined in 3)
> > 6) moreover, I've added a new logic that prefers expand_vec_cmp_expr_p when
> >     a SSA_NAME is being used in multiple (2+) VEC_COND_EXPR statements
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> > Moreover, I run SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and 
> > skylake
> > target and I don't see any reasonable change.
> >
> > Achieved benefits of the patch:
> > - removal of a GENERIC expression being used in GIMPLE statements
> > - extraction into SSA_NAMEs can enable proper tree optimizer (FRE, DOM, PRE)
> > - possibility to expand smarter based on number of uses 
> > (expand_vec_cmp_expr_p)
> >
> > Future plans:
> > - tcc_comparison removal just during gimplification
> > - removal of a code where these expressions are handled for VEC_COND_EXPR
> > - do the similar thing for COND_EXPR?
> >
> > The task was guided by Richi (Biener) and I bet he can help with both 
> > further questions
> > and reasoning.
>
> Thanks for doing this.  It definitely seems more friendly than the
> four-operand version to targets where separate comparisons are the norm.
>
> Just a couple of comments about the implementation:
>
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2bf2cb78fc5..d654e5ee9fe 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_cleanup_eh);
> >    NEXT_PASS (pass_lower_resx);
> >    NEXT_PASS (pass_nrv);
> > +  NEXT_PASS (pass_gimple_isel);
> >    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >    NEXT_PASS (pass_warn_function_noreturn);
> >    NEXT_PASS (pass_gen_hsail);
>
> What was the reason for making this a separate pass, rather than doing
> it as part of veclower?  If we do them separately, then it's harder for
> veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
> that's a general problem between veclower and expand already, but it
> seems like the new approach could help to move away from that by
> doing the instruction selection directly in veclower.)

As the name of the pass suggests it was supposed to be the starting point
of doing all the "complex" (multi-GIMPLE-stmt matching) RTL expansion tricks.

But most importantly veclower is too early to catch CSE opportunities from
loop opts on the conditions and if veclower lowers things then we also want
CSE to cleanup its mess.  I guess we also do not want veclower to be done
before vectorization since it should be easier to re-vectorize from unsupported
vector code than from what veclower makes out of it ... catch-22.

So I consider pass placement a secondary issue for now.

> > +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> > +   function based on type of selected expansion.  */
> > +
> > +static gimple *
> > +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> > +{
> > +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> > +  enum tree_code code;
> > +  enum tree_code tcode;
> > +  machine_mode cmp_op_mode;
> > +  bool unsignedp;
> > +  enum insn_code icode;
> > +  imm_use_iterator imm_iter;
> > +
> > +  /* Only consider code == GIMPLE_ASSIGN.  */
> > +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> > +  if (!stmt)
> > +    return NULL;
> > +
> > +  code = gimple_assign_rhs_code (stmt);
> > +  if (code != VEC_COND_EXPR)
> > +    return NULL;
> > +
> > +  tree op0 = gimple_assign_rhs1 (stmt);
> > +  tree op1 = gimple_assign_rhs2 (stmt);
> > +  tree op2 = gimple_assign_rhs3 (stmt);
> > +  lhs = gimple_assign_lhs (stmt);
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > +
> > +  gcc_assert (!COMPARISON_CLASS_P (op0));
> > +  if (TREE_CODE (op0) == SSA_NAME)
> > +    {
> > +      unsigned int used_vec_cond_exprs = 0;
> > +      gimple *use_stmt;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> > +     {
> > +       gassign *assign = dyn_cast<gassign *> (use_stmt);
> > +       if (assign != NULL && gimple_assign_rhs_code (assign) == 
> > VEC_COND_EXPR
> > +           && gimple_assign_rhs1 (assign) == op0)
> > +         used_vec_cond_exprs++;
> > +     }
>
> This looks like it's quadratic in the worst case.  Could we check
> this in a different way?
>
> > +
> > +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> > +      if (def_stmt)
> > +     {
> > +       tcode = gimple_assign_rhs_code (def_stmt);
> > +       op0a = gimple_assign_rhs1 (def_stmt);
> > +       op0b = gimple_assign_rhs2 (def_stmt);
> > +
> > +       tree op0a_type = TREE_TYPE (op0a);
> > +       if (used_vec_cond_exprs >= 2
>
> It would be good if targets were able to provide only vcond_mask.
> In that case I guess we should go this path if the later one would fail.
>
> > +           && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> > +               != CODE_FOR_nothing)
> > +           && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> > +         {
> > +           /* Keep the SSA name and use vcond_mask.  */
> > +           tcode = TREE_CODE (op0);
> > +         }
> > +     }
> > +      else
> > +     tcode = TREE_CODE (op0);
> > +    }
> > +  else
> > +    tcode = TREE_CODE (op0);
>
> Might be easier to follow if tcode is TREE_CODE (op0) by default and
> only gets changed when we want to fold in the comparison.
>
> Thanks,
> Richard

Reply via email to