Martin Liška <mli...@suse.cz> writes:
> Hello.
>
> This is second attempt to get rid of tcc_comparison GENERIC trees
> to be used as the first argument of VEC_COND_EXPR.
>
> The patch achieves that in the following steps:
> 1) the veclower pass expands all tcc_comparison expressions into SSA_NAMEs
> 2) a tcc_comparison can then no longer be used as the first argument of
> VEC_COND_EXPR
>     (this is enforced by the GIMPLE verifier)
> 3) I exposed new internal functions with:
> DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
>
> 4) logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> 5) the pass expands VEC_COND_EXPR into one of the internal functions defined 
> in 3)
> 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
>     an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
>
> The patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and
> skylake targets and I don't see any noticeable change.
>
> Achieved benefits of the patch:
> - removal of a GENERIC expression being used in GIMPLE statements
> - extraction into SSA_NAMEs lets the tree optimizers (FRE, DOM, PRE) work on
> the comparisons
> - possibility to expand more smartly based on the number of uses
> (expand_vec_cmp_expr_p)
>
> Future plans:
> - tcc_comparison removal just during gimplification
> - removal of the code where these expressions are handled for VEC_COND_EXPR
> - doing a similar thing for COND_EXPR?
>
> The task was guided by Richi (Biener) and I bet he can help with both further 
> questions
> and reasoning.

Thanks for doing this.  It definitely seems friendlier to targets where
separate comparisons are the norm than the four-operand version.
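
Just to check my understanding of the end state: on a target with separate
comparisons, a statement like

  _5 = VEC_COND_EXPR <a_1 < b_2, c_3, d_4>;

would, after veclower and the new isel pass, end up as something like

  mask_6 = a_1 < b_2;
  _5 = .VCOND_MASK (mask_6, c_3, d_4);

(That's just my reading of the patch, with made-up SSA names, and ignoring
the .VCOND/.VCONDU variants that fold the comparison back in.)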

Just a couple of comments about the implementation:

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2bf2cb78fc5..d654e5ee9fe 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_cleanup_eh);
>    NEXT_PASS (pass_lower_resx);
>    NEXT_PASS (pass_nrv);
> +  NEXT_PASS (pass_gimple_isel);
>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>    NEXT_PASS (pass_warn_function_noreturn);
>    NEXT_PASS (pass_gen_hsail);

What was the reason for making this a separate pass, rather than doing
it as part of veclower?  If we do them separately, then it's harder for
veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
that's a general problem between veclower and expand already, but it
seems like the new approach could help to move away from that by
doing the instruction selection directly in veclower.)
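
Very much hand-waving, but reusing the helper from the patch together with
the existing open-coding routine in tree-vect-generic.c, I was imagining
something like:

  /* Sketch only: gimple_expand_vec_cond_expr is the new routine from the
     patch, expand_vector_condition the existing veclower fallback.  */
  if (gimple *call = gimple_expand_vec_cond_expr (gsi))
    /* An internal function (and so an optab) matched, so use it ...  */
    gsi_replace (gsi, call, false);
  else
    /* ... otherwise veclower knows it has to open-code the VEC_COND_EXPR.  */
    expand_vector_condition (gsi);

so that veclower only open-codes the cases that isel can't handle.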

> +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> +   function based on type of selected expansion.  */
> +
> +static gimple *
> +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> +{
> +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> +  enum tree_code code;
> +  enum tree_code tcode;
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  enum insn_code icode;
> +  imm_use_iterator imm_iter;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return NULL;
> +
> +  code = gimple_assign_rhs_code (stmt);
> +  if (code != VEC_COND_EXPR)
> +    return NULL;
> +
> +  tree op0 = gimple_assign_rhs1 (stmt);
> +  tree op1 = gimple_assign_rhs2 (stmt);
> +  tree op2 = gimple_assign_rhs3 (stmt);
> +  lhs = gimple_assign_lhs (stmt);
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +
> +  gcc_assert (!COMPARISON_CLASS_P (op0));
> +  if (TREE_CODE (op0) == SSA_NAME)
> +    {
> +      unsigned int used_vec_cond_exprs = 0;
> +      gimple *use_stmt;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> +     {
> +       gassign *assign = dyn_cast<gassign *> (use_stmt);
> +       if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> +           && gimple_assign_rhs1 (assign) == op0)
> +         used_vec_cond_exprs++;
> +     }

This looks like it's quadratic in the worst case.  Could we check
this in a different way?
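
E.g. (just a sketch, with made-up variable names) the counts could be
gathered in a single walk over the function before doing any selection,
instead of rescanning the uses for every VEC_COND_EXPR:

  /* Count, for each mask SSA name, how many VEC_COND_EXPRs use it.  */
  hash_map<tree, unsigned int> vec_cond_uses;
  basic_block bb;
  FOR_EACH_BB_FN (bb, fun)
    for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);
         gsi_next (&si))
      if (gassign *assign = dyn_cast<gassign *> (gsi_stmt (si)))
        if (gimple_assign_rhs_code (assign) == VEC_COND_EXPR
            && TREE_CODE (gimple_assign_rhs1 (assign)) == SSA_NAME)
          vec_cond_uses.get_or_insert (gimple_assign_rhs1 (assign))++;

gimple_expand_vec_cond_expr could then just look op0 up in the map.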

> +
> +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> +      if (def_stmt)
> +     {
> +       tcode = gimple_assign_rhs_code (def_stmt);
> +       op0a = gimple_assign_rhs1 (def_stmt);
> +       op0b = gimple_assign_rhs2 (def_stmt);
> +
> +       tree op0a_type = TREE_TYPE (op0a);
> +       if (used_vec_cond_exprs >= 2

It would be good if targets were able to provide only vcond_mask.
In that case I guess we should go this path if the later one would fail.

> +           && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> +               != CODE_FOR_nothing)
> +           && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> +         {
> +           /* Keep the SSA name and use vcond_mask.  */
> +           tcode = TREE_CODE (op0);
> +         }
> +     }
> +      else
> +     tcode = TREE_CODE (op0);
> +    }
> +  else
> +    tcode = TREE_CODE (op0);

Might be easier to follow if tcode is TREE_CODE (op0) by default and
only gets changed when we want to fold in the comparison.
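
I.e. (completely untested, just to show the structure I had in mind):

      tcode = TREE_CODE (op0);
      if (TREE_CODE (op0) == SSA_NAME)
        if (gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0)))
          {
            enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
            tree op0a_type = TREE_TYPE (gimple_assign_rhs1 (def_stmt));
            /* Only fold the comparison back in when keeping the mask in an
               SSA name (vcond_mask) isn't known to be better.  */
            if (!(used_vec_cond_exprs >= 2
                  && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
                      != CODE_FOR_nothing)
                  && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs),
                                            def_code)))
              {
                tcode = def_code;
                op0a = gimple_assign_rhs1 (def_stmt);
                op0b = gimple_assign_rhs2 (def_stmt);
              }
          }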

Thanks,
Richard
