2015-08-20 21:41 GMT+03:00 Jeff Law <[email protected]>:
> On 08/17/2015 10:22 AM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch starts a series introducing scalar mask support in the
>> vectorizer. It was discussed at the recent Cauldron, and an overview of the
>> changes is available here:
>> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf.
>> Here is a short list of the changes introduced by this series:
>>
>> - Add a new tree expr to produce scalar masks in vectorized code
>> - Fix up if-conversion to use bool predicates instead of integer masks
>> - Disable some bool patterns to avoid bool-to-int conversion where masks
>> can be used
>> - Support bool operands in vectorization factor computation
>> - Support scalar masks in MASK_LOAD, MASK_STORE and VEC_COND_EXPR by
>> adding new optabs
>> - Support vectorization for statements which are no longer transformed by
>> bool patterns
>> - Add target support (hooks, optabs, expands)
>>
>> This patch introduces the GEN_MASK_EXPR code. Initially I wanted to use a
>> comparison as its operand directly, mapping it onto an AVX-512 comparison
>> instruction. But the feedback was to simplify the new code's semantics and
>> use it for converting vectors into scalar masks. Therefore, if we want to
>> compare two vectors into a scalar mask, we use two statements:
>>
>> vect.18_87 = vect__5.13_81 > vect__6.16_86;
>> mask__ifc__23.17_88 = GEN_MASK <vect.18_87>;
>>
>> Trying it in practice, I found it produced worse code. The problem is that
>> on the target the first comparison is expanded into two instructions: a
>> compare with a mask result plus a masked move to get a vector. GEN_MASK is
>> then expanded into another comparison with a zero vector. Thus I get two
>> comparisons plus a move instead of a single comparison, and I would have to
>> optimize this out on the target side (the current optimizers can't handle
>> it). That's actually what I wanted to avoid. For now I changed
>> GEN_MASK_EXPR to take a vector value as an operand but didn't change the
>> expand pattern, which has four operands: two vectors to compare, a
>> comparison operator, and a result. On expand I try to detect whether
>> GEN_MASK uses the result of a comparison and thus avoid generating a
>> double comparison.
>>
>> The patch series is not fully finished yet. I still have several type
>> conversion tests that are not vectorized, and the series hasn't been
>> widely tested. That's what I'm working on now.
>>
>> I will be glad to receive any comments.
>>
>> Thanks,
>> Ilya
>> --
>> 2015-08-17 Ilya Enkovich <[email protected]>
>>
>> * expr.c (expand_expr_real_2): Support GEN_MASK_EXPR.
>> * gimple-pretty-print.c (dump_unary_rhs): Likewise.
>> * gimple.c (get_gimple_rhs_num_ops): Likewise.
>> * optabs.c: Include gimple.h.
>> (vector_compare_rtx): Add OPNO arg.
>> (get_gen_mask_icode): New.
>> (expand_gen_mask_expr_p): New.
>> (expand_gen_mask_expr): New.
>> (expand_vec_cond_expr): Adjust vector_compare_rtx call.
>> * optabs.def (gen_mask_optab): New.
>> (gen_masku_optab): New.
>> * optabs.h (expand_gen_mask_expr_p): New.
>> (expand_gen_mask_expr): New.
>> * tree-cfg.c (verify_gimple_assign_unary): Support GEN_MASK_EXPR.
>> * tree-inline.c (estimate_operator_cost): Likewise.
>> * tree-pretty-print.c (dump_generic_node): Likewise.
>> * tree-ssa-operands.c (get_expr_operands): Likewise.
>> * tree.def (GEN_MASK_EXPR): New.
>
> A general question, would any of this likely help Yuri's work to optimize
> MASK_STORES?
If I remember correctly, his optimization targets Haswell, which doesn't
have scalar masks. But doing it with scalar masks would be simpler
because the zero-mask check becomes trivial.
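To illustrate the point (a plain-C sketch, not the actual optimization): with a scalar mask the emptiness check is a single integer comparison, while a vector mask first needs a horizontal reduction (on Haswell, something along the lines of ptest or a movemask) before it can be tested:

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar mask (e.g. an AVX-512 __mmask16): emptiness is one integer
   compare.  */
static bool
scalar_mask_is_empty (uint16_t mask)
{
  return mask == 0;
}

/* Vector mask (pre-AVX-512 style, one all-ones/all-zeros lane per
   element): the lanes must be OR-reduced before the zero test.  */
static bool
vector_mask_is_empty (const int32_t vmask[16])
{
  int32_t acc = 0;
  for (int lane = 0; lane < 16; lane++)
    acc |= vmask[lane];
  return acc == 0;
}
```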
>
>>
>>
>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>> index a6ca706..bf466ca 100644
>> --- a/gcc/optabs.c
>> +++ b/gcc/optabs.c
>> @@ -51,6 +51,7 @@ along with GCC; see the file COPYING3. If not see
>> #include "recog.h"
>> #include "reload.h"
>> #include "target.h"
>> +#include "gimple.h"
>
> Hmm, part of me doesn't want to see optabs.c depending on gimple.h.
>
> How painful would it be to have this stuff live in expr.c?
>
It should be possible to refactor it somehow.
>
>> +
>> +/* Generate insns for a GEN_MASK_EXPR, given its TYPE and operand. */
>> +
>> +rtx
>> +expand_gen_mask_expr (tree type, tree op0, rtx target)
>> +{
>> + struct expand_operand ops[4];
>> + enum insn_code icode;
>> + rtx comparison;
>> + machine_mode mode = TYPE_MODE (type);
>> + machine_mode cmp_op_mode;
>> + bool unsignedp;
>> + tree op0a, op0b;
>> + enum tree_code tcode;
>> + gimple def_stmt;
>> +
>> + /* Avoid double comparison. */
>> + if (TREE_CODE (op0) == SSA_NAME
>> + && (def_stmt = SSA_NAME_DEF_STMT (op0))
>> + && is_gimple_assign (def_stmt)
>> + && TREE_CODE_CLASS (gimple_assign_rhs_code (def_stmt)) == tcc_comparison)
>> + {
>> + op0a = gimple_assign_rhs1 (def_stmt);
>> + op0b = gimple_assign_rhs2 (def_stmt);
>> + tcode = gimple_assign_rhs_code (def_stmt);
>> + }
>> + else
>> + {
>> + op0a = op0;
>> + op0b = build_zero_cst (TREE_TYPE (op0));
>> + tcode = NE_EXPR;
>> + }
>> +
>> + unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
>> + cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
>> +
>> + gcc_assert (GET_MODE_BITSIZE (mode) >= GET_MODE_NUNITS (cmp_op_mode));
>> +
>> + icode = get_gen_mask_icode (cmp_op_mode, unsignedp);
>> + if (icode == CODE_FOR_nothing)
>> + return 0;
>
> So if the target doesn't have suitable insns, what happens? I suspect the
> answer is nothing useful. In which case the question becomes what prevents
> the optimizers from generating a GEN_MASK_EXPR? Maybe that'll become clear
> as I read the rest of the patches.
I don't expect any optimizer other than the vectorizer to generate it. And
as usual, the optab is checked before generating it.
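For reference, the guard I have in mind looks roughly like this. It is a stand-alone model of the "check the optab before generating" discipline, not the real GCC code (the real check goes through optab_handler on gen_mask_optab/gen_masku_optab); the enum and function names here are invented for illustration:

```c
#include <stdbool.h>

/* Modeled after GCC's insn_code: the "nothing" value means the target
   provides no expander for the requested mode.  */
enum icode_model
{
  MODEL_CODE_FOR_nothing = 0,
  MODEL_CODE_FOR_gen_mask_v16si
};

/* Stand-in for optab_handler (gen_mask_optab, mode); here we pretend
   only the 16 x i32 mode is supported, as on AVX-512F.  */
static enum icode_model
get_gen_mask_icode_model (int nunits)
{
  return nunits == 16 ? MODEL_CODE_FOR_gen_mask_v16si
                      : MODEL_CODE_FOR_nothing;
}

/* The vectorizer asks this before ever emitting GEN_MASK_EXPR, so the
   expander should never actually see the "nothing" case.  */
static bool
can_use_gen_mask_model (int nunits)
{
  return get_gen_mask_icode_model (nunits) != MODEL_CODE_FOR_nothing;
}
```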
>
>
>
>>
>> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
>> index e1ceea4..052c055 100644
>> --- a/gcc/tree-inline.c
>> +++ b/gcc/tree-inline.c
>> @@ -3850,6 +3850,7 @@ estimate_operator_cost (enum tree_code code, eni_weights *weights,
>> case COMPLEX_EXPR:
>> case PAREN_EXPR:
>> case VIEW_CONVERT_EXPR:
>> + case GEN_MASK_EXPR:
>> return 0;
>
> That seems wrong to me :-)
Got it :)
Thanks,
Ilya
>
> Jeff