On 08 Sep 15:37, Ilya Enkovich wrote:
> 2015-09-04 23:42 GMT+03:00 Jeff Law <[email protected]>:
> >
> > So do we have enough confidence in this representation that we want to go
> > ahead and commit to it?
>
> I think the new representation fits nicely for the most part. There are
> some places where I have to make exceptions for vectors of bools to make
> it work. This is mostly to avoid target modifications: I'd like to avoid
> the need to change all targets currently supporting vec_cond. It makes
> me add some special handling of vec<bool> in GIMPLE, e.g. I add special
> code in vect_init_vector to build vec<bool> invariants with proper
> casting to int. Otherwise I'd need to do it on the target side.
>
> I made several fixes, and the current patch (still allowing an integer
> vector result for vector comparisons and applying bool patterns) passes
> bootstrap and regression testing on x86_64. Now I'll try to fully
> switch to vec<bool> and see how it goes.
>
> Thanks,
> Ilya
>
Hi,
I made a step forward by forcing vector comparisons to have a mask (vec<bool>)
result and by disabling bool patterns when the vector comparison is supported
by the target. Several issues came up.
- The C/C++ front ends generate vector comparisons with an integer vector
result. I had to make some modifications to use VEC_COND_EXPR instead (see
the example after this list). I don't know whether other front ends produce
vector comparisons.
- Vector lowering fails to expand vector masks due to a mismatch between type
and mode sizes. I fixed the vector type size computation to match the mode
size and added special handling for mask expansion.
- I disabled canonical type creation for vector masks because we can't lay
them out with VOIDmode. I don't know why we would need a canonical type here,
but the get_mask_mode call could be moved into type layout to obtain one.
- Expansion of vec<bool> constants/constructors requires special handling.
The common case should use target hooks/optabs to expand the vector into the
required mode, but I suppose we want generic code for the integer vector mode
case to avoid modifying the many targets which use the default vec<bool>
modes.
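
To illustrate the front-end change, here is a minimal C test case using GCC's
generic vector extensions; the lowering shown in the comment is only a sketch
of the intended GIMPLE, not actual dumped output:

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  cmp (v4si a, v4si b)
  {
    /* The comparison itself now produces a boolean vector mask and the
       integer vector result visible to the user is rebuilt from it with
       a VEC_COND_EXPR, roughly:

         mask = a > b;                             <- vector(4) <bool>
         res = VEC_COND_EXPR <mask, { -1, ... }, { 0, ... }>;  */
    return a > b;
  }
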
Currently 'make check' shows two types of regression.
- Missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND).
This must be due to my front-end changes. I hope it will be easy to fix.
- Missed vectorization. All of these appear because bool patterns are
disabled. I didn't look into all of them, but the main problem seems to be
mixed type sizes. With bool patterns and integer vector masks we just insert
an int->(other sized int) conversion for the masks, which gives us the
required mask transformation. With a boolean mask we don't have proper
scalar statements to do that (see the example after this list). I think mask
widening/narrowing could be supported directly in the vectorization of
masked statements. I am going to look into it.
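
As an illustration of the mixed size problem, here is a hypothetical test
case (not one of the actual regressions) where the comparison and the
conditionally used values have different element sizes:

  void
  foo (int *a, short *b, short *c, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      /* The comparison is done on int elements, so it produces one mask
         element per int, while the conditional use of short elements wants
         a mask of a different width.  With integer masks, bool patterns
         simply emitted an int->short vector conversion of the mask; with
         vec<bool> masks the vectorizer would have to widen or narrow the
         mask itself.  */
      c[i] = a[i] > 0 ? b[i] : 0;
  }
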
I attach what I currently have as a prototype. It has grown bigger, so I
split it into several parts.
Thanks,
Ilya
--
* avx512-vec-bool-01-add-truth-vector.ChangeLog
2015-09-15 Ilya Enkovich <[email protected]>
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_VECTORIZE_GET_MASK_MODE): New.
* stor-layout.c (layout_type): Use mode to get vector mask size.
(vector_type_mode): Likewise.
* target.def (get_mask_mode): New.
* targhooks.c (default_vector_alignment): Use mode alignment
for vector masks.
(default_get_mask_mode): New.
* targhooks.h (default_get_mask_mode): New.
* tree.c (make_vector_type): Vector mask has no canonical type.
(build_truth_vector_type): New.
(build_same_sized_truth_vector_type): New.
(truth_type_for): Support vector masks.
* tree.h (VECTOR_MASK_TYPE_P): New.
(build_truth_vector_type): New.
(build_same_sized_truth_vector_type): New.
* avx512-vec-bool-02-no-int-vec-cmp.ChangeLog
gcc/
2015-09-15 Ilya Enkovich <[email protected]>
* tree-cfg.c (verify_gimple_comparison): Require vector mask
type for vector comparison.
(verify_gimple_assign_ternary): Likewise.
gcc/c
2015-09-15 Ilya Enkovich <[email protected]>
* c-typeck.c (build_conditional_expr): Use vector mask
type for vector comparison.
(build_vec_cmp): New.
(build_binary_op): Use build_vec_cmp for comparison.
gcc/cp
2015-09-15 Ilya Enkovich <[email protected]>
* call.c (build_conditional_expr_1): Use vector mask
type for vector comparison.
* typeck.c (build_vec_cmp): New.
(cp_build_binary_op): Use build_vec_cmp for comparison.
* avx512-vec-bool-03-vec-lower.ChangeLog
2015-09-15 Ilya Enkovich <[email protected]>
* tree-vect-generic.c (tree_vec_extract): Use additional
comparison when extracting boolean value.
(do_bool_compare): New.
(expand_vector_comparison): Add casts for vector mask.
(expand_vector_divmod): Use vector mask type for vector
comparison.
(expand_vector_operations_1): Skip scalar mode mask statements.
* avx512-vec-bool-04-vectorize.ChangeLog
gcc/
2015-09-15 Ilya Enkovich <[email protected]>
* expr.c (do_store_flag): Use expand_vec_cmp_expr for mask results.
(const_vector_mask_from_tree): New.
(const_vector_from_tree): Use const_vector_mask_from_tree for vector
masks.
* internal-fn.c (expand_MASK_LOAD): Adjust to optab changes.
(expand_MASK_STORE): Likewise.
* optabs.c (vector_compare_rtx): Add OPNO arg.
(expand_vec_cond_expr): Adjust to vector_compare_rtx change.
(get_vec_cmp_icode): New.
(expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* optabs.def (vec_cmp_optab): New.
(vec_cmpu_optab): New.
(maskload_optab): Transform into convert optab.
(maskstore_optab): Likewise.
* optabs.h (expand_vec_cmp_expr_p): New.
(expand_vec_cmp_expr): New.
(can_vec_mask_load_store_p): Add MASK_MODE arg.
* tree-if-conv.c (ifcvt_can_use_mask_load_store): Adjust to
can_vec_mask_load_store_p signature change.
(predicate_mem_writes): Use boolean mask.
* tree-vect-data-refs.c (vect_get_new_vect_var): Support vect_mask_var.
(vect_create_destination_var): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Ignore mask
operations for VF. Add mask type computation.
* tree-vect-stmts.c (vect_init_vector): Support mask invariants.
(vect_get_vec_def_for_operand): Support mask constant.
(vectorizable_mask_load_store): Adjust to can_vec_mask_load_store_p
signature change.
(vectorizable_condition): Use vector mask type for vector comparison.
(vectorizable_comparison): New.
(vect_analyze_stmt): Add vectorizable_comparison.
(vect_transform_stmt): Likewise.
(get_mask_type_for_scalar_type): New.
* tree-vectorizer.h (enum vect_var_kind): Add vect_mask_var.
(enum stmt_vec_info_type): Add comparison_vec_info_type.
(get_mask_type_for_scalar_type): New.
* avx512-vec-bool-05-bool-patterns.ChangeLog
2015-09-15 Ilya Enkovich <[email protected]>
* tree-vect-patterns.c (check_bool_pattern): Check fails
if we can vectorize comparison directly.
(search_type_for_mask): New.
(vect_recog_bool_pattern): Support cases when bool pattern
check fails.
* avx512-vec-bool-06-i386.ChangeLog
2015-09-15 Ilya Enkovich <[email protected]>
* config/i386/i386-protos.h (ix86_expand_mask_vec_cmp): New.
(ix86_expand_int_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
* config/i386/i386.c (ix86_expand_sse_cmp): Allow NULL for
op_true and op_false.
(ix86_int_cmp_code_to_pcmp_immediate): New.
(ix86_fp_cmp_code_to_pcmp_immediate): New.
(ix86_cmp_code_to_pcmp_immediate): New.
(ix86_expand_mask_vec_cmp): New.
(ix86_expand_fp_vec_cmp): New.
(ix86_expand_int_sse_cmp): New.
(ix86_expand_int_vcond): Use ix86_expand_int_sse_cmp.
(ix86_expand_int_vec_cmp): New.
(ix86_get_mask_mode): New.
(TARGET_VECTORIZE_GET_MASK_MODE): New.
* config/i386/sse.md (avx512fmaskmodelower): New.
(vec_cmp<mode><avx512fmaskmodelower>): New.
(vec_cmp<mode><sseintvecmodelower>): New.
(vec_cmpv2div2di): New.
(vec_cmpu<mode><avx512fmaskmodelower>): New.
(vec_cmpu<mode><sseintvecmodelower>): New.
(vec_cmpuv2div2di): New.
(maskload<mode>): Rename to ...
(maskload<mode><sseintvecmodelower>): ... this.
(maskstore<mode>): Rename to ...
(maskstore<mode><sseintvecmodelower>): ... this.
(maskload<mode><avx512fmaskmodelower>): New.
(maskstore<mode><avx512fmaskmodelower>): New.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f5a1f84..acdfcd5 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5688,6 +5688,11 @@ mode returned by
@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
The default is zero which means to not iterate over other vector sizes.
@end deftypefn
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+This hook returns mode to be used for a mask to be used for a vector
+of specified @var{length} with @var{nunits} elements.
+@end deftypefn
+
@deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop
*@var{loop_info})
This hook should initialize target-specific data structures in preparation for
modeling the costs of vectorizing a loop or basic block. The default allocates
three unsigned integers for accumulating costs for the prologue, body, and
epilogue of the loop or basic block. If @var{loop_info} is non-NULL, it
identifies the loop being vectorized; otherwise a single block is being
vectorized.
@end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 9d5ac0a..52e912a 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4225,6 +4225,8 @@ address; but often a machine-dependent strategy can
generate better code.
@hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+@hook TARGET_VECTORIZE_GET_MASK_MODE
+
@hook TARGET_VECTORIZE_INIT_COST
@hook TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c
index 938e54b..f24a0c4 100644
--- a/gcc/stor-layout.c
+++ b/gcc/stor-layout.c
@@ -2184,11 +2184,22 @@ layout_type (tree type)
TYPE_SATURATING (type) = TYPE_SATURATING (TREE_TYPE (type));
TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
- TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
- TYPE_SIZE_UNIT (innertype),
- size_int (nunits));
- TYPE_SIZE (type) = int_const_binop (MULT_EXPR, TYPE_SIZE (innertype),
- bitsize_int (nunits));
+ if (VECTOR_MASK_TYPE_P (type))
+ {
+ TYPE_SIZE_UNIT (type)
+ = size_int (GET_MODE_SIZE (type->type_common.mode));
+ TYPE_SIZE (type)
+ = bitsize_int (GET_MODE_BITSIZE (type->type_common.mode));
+ }
+ else
+ {
+ TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
+ TYPE_SIZE_UNIT (innertype),
+ size_int (nunits));
+ TYPE_SIZE (type) = int_const_binop (MULT_EXPR,
+ TYPE_SIZE (innertype),
+ bitsize_int (nunits));
+ }
/* For vector types, we do not default to the mode's alignment.
Instead, query a target hook, defaulting to natural alignment.
@@ -2455,7 +2466,14 @@ vector_type_mode (const_tree t)
machine_mode innermode = TREE_TYPE (t)->type_common.mode;
/* For integers, try mapping it to a same-sized scalar mode. */
- if (GET_MODE_CLASS (innermode) == MODE_INT)
+ if (VECTOR_MASK_TYPE_P (t))
+ {
+ mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
+
+ if (mode != VOIDmode && have_regs_of_mode[mode])
+ return mode;
+ }
+ else if (GET_MODE_CLASS (innermode) == MODE_INT)
{
mode = mode_for_size (TYPE_VECTOR_SUBPARTS (t)
* GET_MODE_BITSIZE (innermode), MODE_INT, 0);
diff --git a/gcc/target.def b/gcc/target.def
index 4edc209..c5b8ed9 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1789,6 +1789,15 @@ The default is zero which means to not iterate over
other vector sizes.",
(void),
default_autovectorize_vector_sizes)
+/* Function to get a target mode for a vector mask. */
+DEFHOOK
+(get_mask_mode,
+ "This hook returns mode to be used for a mask to be used for a vector\n\
+of specified @var{length} with @var{nunits} elements.",
+ machine_mode,
+ (unsigned nunits, unsigned length),
+ default_get_mask_mode)
+
/* Target builtin that implements vector gather operation. */
DEFHOOK
(builtin_gather,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 7238c8f..ac01d57 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1087,6 +1087,20 @@ default_autovectorize_vector_sizes (void)
return 0;
}
+/* By defaults a vector of integers is used as a mask. */
+
+machine_mode
+default_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+ unsigned elem_size = vector_size / nunits;
+ machine_mode elem_mode
+ = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+ gcc_assert (elem_size * nunits == vector_size);
+
+ return mode_for_vector (elem_mode, nunits);
+}
+
/* By default, the cost model accumulates three separate costs (prologue,
loop body, and epilogue) for a vectorized loop or block. So allocate an
array of three unsigned ints, set it to zero, and return its address. */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 5ae991d..cc7263f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -100,6 +100,7 @@ default_builtin_support_vector_misalignment (machine_mode
mode,
int, bool);
extern machine_mode default_preferred_simd_mode (machine_mode mode);
extern unsigned int default_autovectorize_vector_sizes (void);
+extern machine_mode default_get_mask_mode (unsigned, unsigned);
extern void *default_init_cost (struct loop *);
extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
struct _stmt_vec_info *, int,
diff --git a/gcc/tree.c b/gcc/tree.c
index af3a6a3..946d2ad 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -9742,8 +9742,9 @@ make_vector_type (tree innertype, int nunits,
machine_mode mode)
if (TYPE_STRUCTURAL_EQUALITY_P (innertype))
SET_TYPE_STRUCTURAL_EQUALITY (t);
- else if (TYPE_CANONICAL (innertype) != innertype
- || mode != VOIDmode)
+ else if ((TYPE_CANONICAL (innertype) != innertype
+ || mode != VOIDmode)
+ && !VECTOR_MASK_TYPE_P (t))
TYPE_CANONICAL (t)
= make_vector_type (TYPE_CANONICAL (innertype), nunits, VOIDmode);
@@ -10568,6 +10569,36 @@ build_vector_type (tree innertype, int nunits)
return make_vector_type (innertype, nunits, VOIDmode);
}
+/* Build truth vector with specified length and number of units. */
+
+tree
+build_truth_vector_type (unsigned nunits, unsigned vector_size)
+{
+ machine_mode mask_mode = targetm.vectorize.get_mask_mode (nunits,
+ vector_size);
+
+ if (mask_mode == VOIDmode)
+ return NULL;
+
+ return make_vector_type (boolean_type_node, nunits, mask_mode);
+}
+
+/* Returns a vector type corresponding to a comparison of VECTYPE. */
+
+tree
+build_same_sized_truth_vector_type (tree vectype)
+{
+ if (VECTOR_MASK_TYPE_P (vectype))
+ return vectype;
+
+ unsigned HOST_WIDE_INT size = GET_MODE_SIZE (TYPE_MODE (vectype));
+
+ if (!size)
+ size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
+
+ return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
+}
+
/* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set. */
tree
@@ -11054,9 +11085,10 @@ truth_type_for (tree type)
{
if (TREE_CODE (type) == VECTOR_TYPE)
{
- tree elem = lang_hooks.types.type_for_size
- (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type))), 0);
- return build_opaque_vector_type (elem, TYPE_VECTOR_SUBPARTS (type));
+ if (VECTOR_MASK_TYPE_P (type))
+ return type;
+ return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (type),
+ GET_MODE_SIZE (TYPE_MODE (type)));
}
else
return boolean_type_node;
diff --git a/gcc/tree.h b/gcc/tree.h
index 2cd6ec4..09fb26d 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -469,6 +469,12 @@ extern void omp_clause_range_check_failed (const_tree,
const char *, int,
#define VECTOR_TYPE_P(TYPE) (TREE_CODE (TYPE) == VECTOR_TYPE)
+/* Nonzero if TYPE represents a vector of booleans. */
+
+#define VECTOR_MASK_TYPE_P(TYPE) \
+ (TREE_CODE (TYPE) == VECTOR_TYPE \
+ && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE)
+
/* Nonzero if TYPE represents an integral type. Note that we do not
include COMPLEX types here. Keep these checks in ascending code
order. */
@@ -3820,6 +3826,8 @@ extern tree build_reference_type_for_mode (tree,
machine_mode, bool);
extern tree build_reference_type (tree);
extern tree build_vector_type_for_mode (tree, machine_mode);
extern tree build_vector_type (tree innertype, int nunits);
+extern tree build_truth_vector_type (unsigned, unsigned);
+extern tree build_same_sized_truth_vector_type (tree vectype);
extern tree build_opaque_vector_type (tree innertype, int nunits);
extern tree build_index_type (tree);
extern tree build_array_type (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index e8c8189..6ea4f19 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -4753,6 +4753,18 @@ build_conditional_expr (location_t colon_loc, tree
ifexp, bool ifexp_bcp,
&& TREE_CODE (orig_op2) == INTEGER_CST
&& !TREE_OVERFLOW (orig_op2)));
}
+
+ /* Need to convert condition operand into a vector mask. */
+ if (VECTOR_TYPE_P (TREE_TYPE (ifexp)))
+ {
+ tree vectype = TREE_TYPE (ifexp);
+ tree elem_type = TREE_TYPE (vectype);
+ tree zero = build_int_cst (elem_type, 0);
+ tree zero_vec = build_vector_from_val (vectype, zero);
+ tree cmp_type = build_same_sized_truth_vector_type (vectype);
+ ifexp = build2 (NE_EXPR, cmp_type, ifexp, zero_vec);
+ }
+
if (int_const || (ifexp_bcp && TREE_CODE (ifexp) == INTEGER_CST))
ret = fold_build3_loc (colon_loc, COND_EXPR, result_type, ifexp, op1, op2);
else
@@ -10195,6 +10207,19 @@ push_cleanup (tree decl, tree cleanup, bool eh_only)
STATEMENT_LIST_STMT_EXPR (list) = stmt_expr;
}
+/* Build a vector comparison using VEC_COND_EXPR. */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+ tree arg0, tree arg1)
+{
+ tree zero_vec = build_zero_cst (type);
+ tree minus_one_vec = build_minus_one_cst (type);
+ tree cmp_type = build_same_sized_truth_vector_type (type);
+ tree cmp = build2 (code, cmp_type, arg0, arg1);
+ return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
+
/* Build a binary-operation expression without default conversions.
CODE is the kind of expression to build.
LOCATION is the operator's location.
@@ -10753,7 +10778,8 @@ build_binary_op (location_t location, enum tree_code
code,
result_type = build_opaque_vector_type (intt,
TYPE_VECTOR_SUBPARTS (type0));
converted = 1;
- break;
+ ret = build_vec_cmp (resultcode, result_type, op0, op1);
+ goto return_build_binary_op;
}
if (FLOAT_TYPE_P (type0) || FLOAT_TYPE_P (type1))
warning_at (location,
@@ -10895,7 +10921,8 @@ build_binary_op (location_t location, enum tree_code
code,
result_type = build_opaque_vector_type (intt,
TYPE_VECTOR_SUBPARTS (type0));
converted = 1;
- break;
+ ret = build_vec_cmp (resultcode, result_type, op0, op1);
+ goto return_build_binary_op;
}
build_type = integer_type_node;
if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8d4a9e2..7f16e84 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4727,8 +4727,10 @@ build_conditional_expr_1 (location_t loc, tree arg1,
tree arg2, tree arg3,
}
if (!COMPARISON_CLASS_P (arg1))
- arg1 = cp_build_binary_op (loc, NE_EXPR, arg1,
- build_zero_cst (arg1_type), complain);
+ {
+ tree cmp_type = build_same_sized_truth_vector_type (arg1_type);
+ arg1 = build2 (NE_EXPR, cmp_type, arg1, build_zero_cst (arg1_type));
+ }
return fold_build3 (VEC_COND_EXPR, arg2_type, arg1, arg2, arg3);
}
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 83fd34c..89bacc2 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -3898,6 +3898,18 @@ build_binary_op (location_t location, enum tree_code
code, tree op0, tree op1,
return cp_build_binary_op (location, code, op0, op1, tf_warning_or_error);
}
+/* Build a vector comparison using VEC_COND_EXPR. */
+
+static tree
+build_vec_cmp (tree_code code, tree type,
+ tree arg0, tree arg1)
+{
+ tree zero_vec = build_zero_cst (type);
+ tree minus_one_vec = build_minus_one_cst (type);
+ tree cmp_type = build_same_sized_truth_vector_type(type);
+ tree cmp = build2 (code, cmp_type, arg0, arg1);
+ return build3 (VEC_COND_EXPR, type, cmp, minus_one_vec, zero_vec);
+}
/* Build a binary-operation expression without default conversions.
CODE is the kind of expression to build.
@@ -4757,7 +4769,7 @@ cp_build_binary_op (location_t location,
result_type = build_opaque_vector_type (intt,
TYPE_VECTOR_SUBPARTS (type0));
converted = 1;
- break;
+ return build_vec_cmp (resultcode, result_type, op0, op1);
}
build_type = boolean_type_node;
if ((code0 == INTEGER_TYPE || code0 == REAL_TYPE
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 5ac73b3..2ce5a84 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
return true;
}
}
- /* Or an integer vector type with the same size and element count
+ /* Or a boolean vector type with the same element count
as the comparison operand types. */
else if (TREE_CODE (type) == VECTOR_TYPE
- && TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE)
+ && TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
{
if (TREE_CODE (op0_type) != VECTOR_TYPE
|| TREE_CODE (op1_type) != VECTOR_TYPE)
@@ -3478,12 +3478,7 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
return true;
}
- if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type)
- || (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (type)))
- != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type))))
- /* The result of a vector comparison is of signed
- integral type. */
- || TYPE_UNSIGNED (TREE_TYPE (type)))
+ if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
{
error ("invalid vector comparison resulting type");
debug_generic_expr (type);
@@ -3970,15 +3965,13 @@ verify_gimple_assign_ternary (gassign *stmt)
break;
case VEC_COND_EXPR:
- if (!VECTOR_INTEGER_TYPE_P (rhs1_type)
- || TYPE_SIGN (rhs1_type) != SIGNED
- || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type)
+ if (!VECTOR_MASK_TYPE_P (rhs1_type)
|| TYPE_VECTOR_SUBPARTS (rhs1_type)
!= TYPE_VECTOR_SUBPARTS (lhs_type))
{
- error ("the first argument of a VEC_COND_EXPR must be of a signed "
- "integral vector type of the same size and number of "
- "elements as the result");
+ error ("the first argument of a VEC_COND_EXPR must be of a "
+ "boolean vector type of the same number of elements "
+ "as the result");
debug_generic_expr (lhs_type);
debug_generic_expr (rhs1_type);
return true;
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index be3d27f..a89b08c 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type,
tree t, tree bitsize, tree bitpos)
{
if (bitpos)
- return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+ {
+ if (TREE_CODE (type) == BOOLEAN_TYPE)
+ {
+ tree itype
+ = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0);
+ tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t,
+ bitsize, bitpos);
+ return gimplify_build2 (gsi, NE_EXPR, type, field,
+ build_zero_cst (itype));
+ }
+ else
+ return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos);
+ }
else
return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t);
}
@@ -171,6 +183,21 @@ do_compare (gimple_stmt_iterator *gsi, tree inner_type,
tree a, tree b,
build_int_cst (comp_type, 0));
}
+/* Construct expression (A[BITPOS] code B[BITPOS])
+
+ INNER_TYPE is the type of A and B elements
+
+ returned expression is of boolean type. */
+static tree
+do_bool_compare (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
+ tree bitpos, tree bitsize, enum tree_code code)
+{
+ a = tree_vec_extract (gsi, inner_type, a, bitsize, bitpos);
+ b = tree_vec_extract (gsi, inner_type, b, bitsize, bitpos);
+
+ return gimplify_build2 (gsi, code, boolean_type_node, a, b);
+}
+
/* Expand vector addition to scalars. This does bit twiddling
in order to increase parallelism:
@@ -350,9 +377,31 @@ expand_vector_comparison (gimple_stmt_iterator *gsi, tree
type, tree op0,
tree op1, enum tree_code code)
{
tree t;
- if (! expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
- t = expand_vector_piecewise (gsi, do_compare, type,
- TREE_TYPE (TREE_TYPE (op0)), op0, op1, code);
+ if (!expand_vec_cmp_expr_p (TREE_TYPE (op0), type)
+ && !expand_vec_cond_expr_p (type, TREE_TYPE (op0)))
+ {
+ if (VECTOR_MODE_P (TYPE_MODE (type)))
+ {
+ tree inner_type = TREE_TYPE (TREE_TYPE (op0));
+ tree elem_type = build_nonstandard_integer_type
+ (GET_MODE_BITSIZE (TYPE_MODE (inner_type)), 0);
+ tree int_vec_type = build_vector_type (elem_type,
+ TYPE_VECTOR_SUBPARTS (type));
+ tree vec = expand_vector_piecewise (gsi, do_compare, int_vec_type,
+ TREE_TYPE (TREE_TYPE (op0)),
+ op0, op1, code);
+ gimple stmt;
+
+ return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, vec);
+ t = make_ssa_name (type);
+ stmt = gimple_build_assign (t, build1 (VIEW_CONVERT_EXPR, type, vec));
+ gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+ }
+ else
+ t = expand_vector_piecewise (gsi, do_bool_compare, type,
+ TREE_TYPE (TREE_TYPE (op0)),
+ op0, op1, code);
+ }
else
t = NULL_TREE;
@@ -625,11 +674,12 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree
type, tree op0,
if (addend == NULL_TREE
&& expand_vec_cond_expr_p (type, type))
{
- tree zero, cst, cond;
+ tree zero, cst, cond, mask_type;
gimple stmt;
+ mask_type = build_same_sized_truth_vector_type (type);
zero = build_zero_cst (type);
- cond = build2 (LT_EXPR, type, op0, zero);
+ cond = build2 (LT_EXPR, mask_type, op0, zero);
for (i = 0; i < nunits; i++)
vec[i] = build_int_cst (TREE_TYPE (type),
((unsigned HOST_WIDE_INT) 1
@@ -1506,6 +1556,12 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi)
if (TREE_CODE (type) != VECTOR_TYPE)
return;
+ /* A scalar operation pretending to be a vector one. */
+ if (VECTOR_MASK_TYPE_P (type)
+ && !VECTOR_MODE_P (TYPE_MODE (type))
+ && TYPE_MODE (type) != BLKmode)
+ return;
+
if (CONVERT_EXPR_CODE_P (code)
|| code == FLOAT_EXPR
|| code == FIX_TRUNC_EXPR
diff --git a/gcc/expr.c b/gcc/expr.c
index 1e820b4..6ae0c4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11000,9 +11000,15 @@ do_store_flag (sepops ops, rtx target, machine_mode
mode)
if (TREE_CODE (ops->type) == VECTOR_TYPE)
{
tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
- tree if_true = constant_boolean_node (true, ops->type);
- tree if_false = constant_boolean_node (false, ops->type);
- return expand_vec_cond_expr (ops->type, ifexp, if_true, if_false, target);
+ if (VECTOR_MASK_TYPE_P (ops->type))
+ return expand_vec_cmp_expr (ops->type, ifexp, target);
+ else
+ {
+ tree if_true = constant_boolean_node (true, ops->type);
+ tree if_false = constant_boolean_node (false, ops->type);
+ return expand_vec_cond_expr (ops->type, ifexp, if_true,
+ if_false, target);
+ }
}
/* Get the rtx comparison code to use. We know that EXP is a comparison
@@ -11289,6 +11295,39 @@ try_tablejump (tree index_type, tree index_expr, tree
minval, tree range,
return 1;
}
+/* Return a CONST_VECTOR rtx representing vector mask for
+ a VECTOR_CST of booleans. */
+static rtx
+const_vector_mask_from_tree (tree exp)
+{
+ rtvec v;
+ unsigned i;
+ int units;
+ tree elt;
+ machine_mode inner, mode;
+
+ mode = TYPE_MODE (TREE_TYPE (exp));
+ units = GET_MODE_NUNITS (mode);
+ inner = GET_MODE_INNER (mode);
+
+ v = rtvec_alloc (units);
+
+ for (i = 0; i < VECTOR_CST_NELTS (exp); ++i)
+ {
+ elt = VECTOR_CST_ELT (exp, i);
+
+ gcc_assert (TREE_CODE (elt) == INTEGER_CST);
+ if (integer_zerop (elt))
+ RTVEC_ELT (v, i) = CONST0_RTX (inner);
+ else if (integer_onep (elt))
+ RTVEC_ELT (v, i) = CONSTM1_RTX (inner);
+ else
+ gcc_unreachable ();
+ }
+
+ return gen_rtx_CONST_VECTOR (mode, v);
+}
+
/* Return a CONST_VECTOR rtx for a VECTOR_CST tree. */
static rtx
const_vector_from_tree (tree exp)
@@ -11304,6 +11343,9 @@ const_vector_from_tree (tree exp)
if (initializer_zerop (exp))
return CONST0_RTX (mode);
+ if (VECTOR_MASK_TYPE_P (TREE_TYPE (exp)))
+ return const_vector_mask_from_tree (exp);
+
units = GET_MODE_NUNITS (mode);
inner = GET_MODE_INNER (mode);
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index e785946..4ca0a40 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt)
create_output_operand (&ops[0], target, TYPE_MODE (type));
create_fixed_operand (&ops[1], mem);
create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
- expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops);
+ expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type),
+ TYPE_MODE (TREE_TYPE (maskt))),
+ 3, ops);
}
static void
@@ -1908,7 +1910,9 @@ expand_MASK_STORE (gcall *stmt)
create_fixed_operand (&ops[0], mem);
create_input_operand (&ops[1], reg, TYPE_MODE (type));
create_input_operand (&ops[2], mask, TYPE_MODE (TREE_TYPE (maskt)));
- expand_insn (optab_handler (maskstore_optab, TYPE_MODE (type)), 3, ops);
+ expand_insn (convert_optab_handler (maskstore_optab, TYPE_MODE (type),
+ TYPE_MODE (TREE_TYPE (maskt))),
+ 3, ops);
}
static void
diff --git a/gcc/optabs.c b/gcc/optabs.c
index e533e6e..fd9932f 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6490,11 +6490,13 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
}
/* Return comparison rtx for COND. Use UNSIGNEDP to select signed or
- unsigned operators. Do not generate compare instruction. */
+ unsigned operators. OPNO holds an index of the first comparison
+ operand in insn with code ICODE. Do not generate compare instruction. */
static rtx
vector_compare_rtx (enum tree_code tcode, tree t_op0, tree t_op1,
- bool unsignedp, enum insn_code icode)
+ bool unsignedp, enum insn_code icode,
+ unsigned int opno)
{
struct expand_operand ops[2];
rtx rtx_op0, rtx_op1;
@@ -6520,7 +6522,7 @@ vector_compare_rtx (enum tree_code tcode, tree t_op0,
tree t_op1,
create_input_operand (&ops[0], rtx_op0, m0);
create_input_operand (&ops[1], rtx_op1, m1);
- if (!maybe_legitimize_operands (icode, 4, 2, ops))
+ if (!maybe_legitimize_operands (icode, opno, 2, ops))
gcc_unreachable ();
return gen_rtx_fmt_ee (rcode, VOIDmode, ops[0].value, ops[1].value);
}
@@ -6843,16 +6845,25 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0,
tree op1, tree op2,
op0a = TREE_OPERAND (op0, 0);
op0b = TREE_OPERAND (op0, 1);
tcode = TREE_CODE (op0);
+ unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
}
else
{
+ gcc_assert (VECTOR_MASK_TYPE_P (TREE_TYPE (op0)));
+ if (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE ((op0)))) != MODE_VECTOR_INT)
+ {
+ /* This is a vcond with mask. To be supported soon... */
+ gcc_unreachable ();
+ }
/* Fake op0 < 0. */
- gcc_assert (!TYPE_UNSIGNED (TREE_TYPE (op0)));
- op0a = op0;
- op0b = build_zero_cst (TREE_TYPE (op0));
- tcode = LT_EXPR;
+ else
+ {
+ op0a = op0;
+ op0b = build_zero_cst (TREE_TYPE (op0));
+ tcode = LT_EXPR;
+ unsignedp = false;
+ }
}
- unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
@@ -6863,7 +6874,7 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree
op1, tree op2,
if (icode == CODE_FOR_nothing)
return 0;
- comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode);
+ comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 4);
rtx_op1 = expand_normal (op1);
rtx_op2 = expand_normal (op2);
@@ -6877,6 +6888,63 @@ expand_vec_cond_expr (tree vec_cond_type, tree op0, tree
op1, tree op2,
return ops[0].value;
}
+/* Return insn code for a comparison operator with VMODE
+ resulting in MASK_MODE, unsigned if UNS is true. */
+
+static inline enum insn_code
+get_vec_cmp_icode (machine_mode vmode, machine_mode mask_mode, bool uns)
+{
+ optab tab = uns ? vec_cmpu_optab : vec_cmp_optab;
+ return convert_optab_handler (tab, vmode, mask_mode);
+}
+
+/* Return TRUE if appropriate vector insn is available
+ for vector comparison expr with vector type VALUE_TYPE
+ and resulting mask with MASK_TYPE. */
+
+bool
+expand_vec_cmp_expr_p (tree value_type, tree mask_type)
+{
+ enum insn_code icode = get_vec_cmp_icode (TYPE_MODE (value_type),
+ TYPE_MODE (mask_type),
+ TYPE_UNSIGNED (value_type));
+ return (icode != CODE_FOR_nothing);
+}
+
+/* Generate insns for a vector comparison into a mask. */
+
+rtx
+expand_vec_cmp_expr (tree type, tree exp, rtx target)
+{
+ struct expand_operand ops[4];
+ enum insn_code icode;
+ rtx comparison;
+ machine_mode mask_mode = TYPE_MODE (type);
+ machine_mode vmode;
+ bool unsignedp;
+ tree op0a, op0b;
+ enum tree_code tcode;
+
+ op0a = TREE_OPERAND (exp, 0);
+ op0b = TREE_OPERAND (exp, 1);
+ tcode = TREE_CODE (exp);
+
+ unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+ vmode = TYPE_MODE (TREE_TYPE (op0a));
+
+ icode = get_vec_cmp_icode (vmode, mask_mode, unsignedp);
+ if (icode == CODE_FOR_nothing)
+ return 0;
+
+ comparison = vector_compare_rtx (tcode, op0a, op0b, unsignedp, icode, 2);
+ create_output_operand (&ops[0], target, mask_mode);
+ create_fixed_operand (&ops[1], comparison);
+ create_fixed_operand (&ops[2], XEXP (comparison, 0));
+ create_fixed_operand (&ops[3], XEXP (comparison, 1));
+ expand_insn (icode, 4, ops);
+ return ops[0].value;
+}
+
/* Return non-zero if a highpart multiply is supported of can be synthisized.
For the benefit of expand_mult_highpart, the return value is 1 for direct,
2 for even/odd widening, and 3 for hi/lo widening. */
@@ -7002,26 +7070,32 @@ expand_mult_highpart (machine_mode mode, rtx op0, rtx
op1,
/* Return true if target supports vector masked load/store for mode. */
bool
-can_vec_mask_load_store_p (machine_mode mode, bool is_load)
+can_vec_mask_load_store_p (machine_mode mode,
+ machine_mode mask_mode,
+ bool is_load)
{
optab op = is_load ? maskload_optab : maskstore_optab;
- machine_mode vmode;
unsigned int vector_sizes;
/* If mode is vector mode, check it directly. */
if (VECTOR_MODE_P (mode))
- return optab_handler (op, mode) != CODE_FOR_nothing;
+ return convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing;
/* Otherwise, return true if there is some vector mode with
the mask load/store supported. */
/* See if there is any chance the mask load or store might be
vectorized. If not, punt. */
- vmode = targetm.vectorize.preferred_simd_mode (mode);
- if (!VECTOR_MODE_P (vmode))
+ mode = targetm.vectorize.preferred_simd_mode (mode);
+ if (!VECTOR_MODE_P (mode))
+ return false;
+
+ mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+ GET_MODE_SIZE (mode));
+ if (mask_mode == VOIDmode)
return false;
- if (optab_handler (op, vmode) != CODE_FOR_nothing)
+ if (convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
return true;
vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
@@ -7031,9 +7105,12 @@ can_vec_mask_load_store_p (machine_mode mode, bool
is_load)
vector_sizes &= ~cur;
if (cur <= GET_MODE_SIZE (mode))
continue;
- vmode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
- if (VECTOR_MODE_P (vmode)
- && optab_handler (op, vmode) != CODE_FOR_nothing)
+ mode = mode_for_vector (mode, cur / GET_MODE_SIZE (mode));
+ mask_mode = targetm.vectorize.get_mask_mode (GET_MODE_NUNITS (mode),
+ cur);
+ if (VECTOR_MODE_P (mode)
+ && mask_mode != VOIDmode
+ && convert_optab_handler (op, mode, mask_mode) != CODE_FOR_nothing)
return true;
}
return false;
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 888b21c..9804378 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -61,6 +61,10 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
OPTAB_CD(vcond_optab, "vcond$a$b")
OPTAB_CD(vcondu_optab, "vcondu$a$b")
+OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
+OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
+OPTAB_CD(maskload_optab, "maskload$a$b")
+OPTAB_CD(maskstore_optab, "maskstore$a$b")
OPTAB_NL(add_optab, "add$P$a3", PLUS, "add", '3', gen_int_fp_fixed_libfunc)
OPTAB_NX(add_optab, "add$F$a3")
@@ -264,8 +268,6 @@ OPTAB_D (udot_prod_optab, "udot_prod$I$a")
OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
OPTAB_D (usad_optab, "usad$I$a")
OPTAB_D (ssad_optab, "ssad$I$a")
-OPTAB_D (maskload_optab, "maskload$a")
-OPTAB_D (maskstore_optab, "maskstore$a")
OPTAB_D (vec_extract_optab, "vec_extract$a")
OPTAB_D (vec_init_optab, "vec_init$a")
OPTAB_D (vec_pack_sfix_trunc_optab, "vec_pack_sfix_trunc_$a")
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 95f5cbc..dfe9ebf 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -496,6 +496,12 @@ extern bool can_vec_perm_p (machine_mode, bool, const
unsigned char *);
extern rtx expand_vec_perm (machine_mode, rtx, rtx, rtx, rtx);
/* Return tree if target supports vector operations for COND_EXPR. */
+bool expand_vec_cmp_expr_p (tree, tree);
+
+/* Generate code for VEC_COND_EXPR. */
+extern rtx expand_vec_cmp_expr (tree, tree, rtx);
+
+/* Return true if target supports vector comparison. */
bool expand_vec_cond_expr_p (tree, tree);
/* Generate code for VEC_COND_EXPR. */
@@ -508,7 +514,7 @@ extern int can_mult_highpart_p (machine_mode, bool);
extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
/* Return true if target supports vector masked load/store for mode. */
-extern bool can_vec_mask_load_store_p (machine_mode, bool);
+extern bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
/* Return true if there is an inline compare and swap pattern. */
extern bool can_compare_and_swap_p (machine_mode, bool);
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 291e602..d66517d 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -811,7 +811,7 @@ ifcvt_can_use_mask_load_store (gimple stmt)
|| VECTOR_MODE_P (mode))
return false;
- if (can_vec_mask_load_store_p (mode, is_load))
+ if (can_vec_mask_load_store_p (mode, VOIDmode, is_load))
return true;
return false;
@@ -2068,7 +2068,7 @@ predicate_mem_writes (loop_p loop)
{
tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
- tree ref, addr, ptr, masktype, mask_op0, mask_op1, mask;
+ tree ref, addr, ptr, masktype, mask;
gimple new_stmt;
int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
@@ -2082,15 +2082,47 @@ predicate_mem_writes (loop_p loop)
mask = vect_masks[index];
else
{
- masktype = build_nonstandard_integer_type (bitsize, 1);
- mask_op0 = build_int_cst (masktype, swap ? 0 : -1);
- mask_op1 = build_int_cst (masktype, swap ? -1 : 0);
- cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
- is_gimple_condexpr,
- NULL_TREE,
- true, GSI_SAME_STMT);
- mask = fold_build_cond_expr (masktype, unshare_expr (cond),
- mask_op0, mask_op1);
+ masktype = boolean_type_node;
+ if ((TREE_CODE (cond) == NE_EXPR
+ || TREE_CODE (cond) == EQ_EXPR)
+ && (integer_zerop (TREE_OPERAND (cond, 1))
+ || integer_onep (TREE_OPERAND (cond, 1)))
+ && TREE_CODE (TREE_TYPE (TREE_OPERAND (cond, 0)))
+ == BOOLEAN_TYPE)
+ {
+ bool negate = (TREE_CODE (cond) == EQ_EXPR);
+ if (integer_onep (TREE_OPERAND (cond, 1)))
+ negate = !negate;
+ if (swap)
+ negate = !negate;
+ mask = TREE_OPERAND (cond, 0);
+ if (negate)
+ {
+ mask = ifc_temp_var (masktype, unshare_expr (cond),
+ &gsi);
+ mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+ }
+ }
+ else if (swap &&
+ TREE_CODE_CLASS (TREE_CODE (cond)) == tcc_comparison)
+ {
+ tree op_type = TREE_TYPE (TREE_OPERAND (cond, 0));
+ tree_code code
+ = invert_tree_comparison (TREE_CODE (cond),
+ HONOR_NANS (op_type));
+ if (code != ERROR_MARK)
+ mask = build2 (code, TREE_TYPE (cond),
+ TREE_OPERAND (cond, 0),
+ TREE_OPERAND (cond, 1));
+ else
+ {
+ mask = ifc_temp_var (masktype, unshare_expr (cond),
+ &gsi);
+ mask = build1 (TRUTH_NOT_EXPR, masktype, mask);
+ }
+ }
+ else
+ mask = unshare_expr (cond);
mask = ifc_temp_var (masktype, mask, &gsi);
/* Save mask and its size for further use. */
vect_sizes.safe_push (bitsize);
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index f1eaef4..0a39825 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -3849,6 +3849,9 @@ vect_get_new_vect_var (tree type, enum vect_var_kind
var_kind, const char *name)
case vect_scalar_var:
prefix = "stmp";
break;
+ case vect_mask_var:
+ prefix = "mask";
+ break;
case vect_pointer_var:
prefix = "vectp";
break;
@@ -4403,7 +4406,11 @@ vect_create_destination_var (tree scalar_dest, tree
vectype)
tree type;
enum vect_var_kind kind;
- kind = vectype ? vect_simple_var : vect_scalar_var;
+ kind = vectype
+ ? VECTOR_MASK_TYPE_P (vectype)
+ ? vect_mask_var
+ : vect_simple_var
+ : vect_scalar_var;
type = vectype ? vectype : TREE_TYPE (scalar_dest);
gcc_assert (TREE_CODE (scalar_dest) == SSA_NAME);
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 59c75af..1810f78 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -193,19 +193,21 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
{
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
- int nbbs = loop->num_nodes;
+ unsigned nbbs = loop->num_nodes;
unsigned int vectorization_factor = 0;
tree scalar_type;
gphi *phi;
tree vectype;
unsigned int nunits;
stmt_vec_info stmt_info;
- int i;
+ unsigned i;
HOST_WIDE_INT dummy;
gimple stmt, pattern_stmt = NULL;
gimple_seq pattern_def_seq = NULL;
gimple_stmt_iterator pattern_def_si = gsi_none ();
bool analyze_pattern_stmt = false;
+ bool bool_result;
+ auto_vec<stmt_vec_info> mask_producers;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
@@ -424,6 +426,8 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
return false;
}
+ bool_result = false;
+
if (STMT_VINFO_VECTYPE (stmt_info))
{
/* The only case when a vectype had been already set is for stmts
@@ -444,6 +448,32 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
else
scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+ /* Bool ops don't participate in vectorization factor
+ computation. For comparison use compared types to
+ compute a factor. */
+ if (TREE_CODE (scalar_type) == BOOLEAN_TYPE)
+ {
+ mask_producers.safe_push (stmt_info);
+ bool_result = true;
+
+ if (gimple_code (stmt) == GIMPLE_ASSIGN
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
+ == tcc_comparison
+ && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt)))
+ != BOOLEAN_TYPE)
+ scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+ else
+ {
+ if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
+ {
+ pattern_def_seq = NULL;
+ gsi_next (&si);
+ }
+ continue;
+ }
+ }
+
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
@@ -466,7 +496,8 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
return false;
}
- STMT_VINFO_VECTYPE (stmt_info) = vectype;
+ if (!bool_result)
+ STMT_VINFO_VECTYPE (stmt_info) = vectype;
if (dump_enabled_p ())
{
@@ -479,8 +510,9 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
/* The vectorization factor is according to the smallest
scalar type (or the largest vector size, but we only
support one vector size per loop). */
- scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
- &dummy);
+ if (!bool_result)
+ scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
+ &dummy);
if (dump_enabled_p ())
{
dump_printf_loc (MSG_NOTE, vect_location,
@@ -555,6 +587,100 @@ vect_determine_vectorization_factor (loop_vec_info
loop_vinfo)
}
LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
+ for (i = 0; i < mask_producers.length (); i++)
+ {
+ tree mask_type = NULL;
+ bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (mask_producers[i]);
+
+ stmt = STMT_VINFO_STMT (mask_producers[i]);
+
+ if (gimple_code (stmt) == GIMPLE_ASSIGN
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+ && TREE_CODE (TREE_TYPE (gimple_assign_rhs1 (stmt))) != BOOLEAN_TYPE)
+ {
+ scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+ mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+ if (!mask_type)
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "not vectorized: unsupported mask\n");
+ return false;
+ }
+ }
+ else
+ {
+ tree rhs, def;
+ ssa_op_iter iter;
+ gimple def_stmt;
+ enum vect_def_type dt;
+
+ FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+ {
+ if (!vect_is_simple_use_1 (rhs, stmt, loop_vinfo, bb_vinfo,
+ &def_stmt, &def, &dt, &vectype))
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "not vectorized: can't compute mask type
"
+ "for statement, ");
+ dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
stmt,
+ 0);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+ }
+ return false;
+ }
+
+ /* No vectype probably means external definition.
+ Allow it in case there is another operand which
+ allows to determine mask type. */
+ if (!vectype)
+ continue;
+
+ if (!mask_type)
+ mask_type = vectype;
+ else if (TYPE_VECTOR_SUBPARTS (mask_type)
+ != TYPE_VECTOR_SUBPARTS (vectype))
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "not vectorized: different sized masks "
+ "types in statement, ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+ mask_type);
+ dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+ dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+ vectype);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+ }
+ return false;
+ }
+ }
+ }
+
+ /* No mask_type should mean loop invariant predicate.
+ This is probably a subject for optimization in
+ if-conversion. */
+ if (!mask_type)
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "not vectorized: can't compute mask type "
+ "for statement, ");
+ dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
+ 0);
+ dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+ }
+ return false;
+ }
+
+ STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+ }
+
return true;
}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index f87c066..f3887be 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1316,27 +1316,61 @@ vect_init_vector_1 (gimple stmt, gimple new_stmt,
gimple_stmt_iterator *gsi)
tree
vect_init_vector (gimple stmt, tree val, tree type, gimple_stmt_iterator *gsi)
{
+ tree val_type = TREE_TYPE (val);
+ machine_mode mode = TYPE_MODE (type);
+ machine_mode val_mode = TYPE_MODE(val_type);
tree new_var;
gimple init_stmt;
tree vec_oprnd;
tree new_temp;
if (TREE_CODE (type) == VECTOR_TYPE
- && TREE_CODE (TREE_TYPE (val)) != VECTOR_TYPE)
- {
- if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
+ && TREE_CODE (val_type) != VECTOR_TYPE)
+ {
+ /* Handle vector of bool represented as a vector of
+ integers here rather than on expand because it is
+ a default mask type for targets. Vector mask is
+ built in a following way:
+
+ tmp = (int)val
+ vec_tmp = {tmp, ..., tmp}
+ vec_cst = VIEW_CONVERT_EXPR<vector(N) _Bool>(vec_tmp); */
+ if (TREE_CODE (val_type) == BOOLEAN_TYPE
+ && VECTOR_MODE_P (mode)
+ && SCALAR_INT_MODE_P (GET_MODE_INNER (mode))
+ && GET_MODE_INNER (mode) != val_mode)
{
- if (CONSTANT_CLASS_P (val))
- val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
- else
+ unsigned size = GET_MODE_BITSIZE (GET_MODE_INNER (mode));
+ tree stype = build_nonstandard_integer_type (size, 1);
+ tree vectype = get_vectype_for_scalar_type (stype);
+
+ new_temp = make_ssa_name (stype);
+ init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+ vect_init_vector_1 (stmt, init_stmt, gsi);
+
+ val = make_ssa_name (vectype);
+ new_temp = build_vector_from_val (vectype, new_temp);
+ init_stmt = gimple_build_assign (val, new_temp);
+ vect_init_vector_1 (stmt, init_stmt, gsi);
+
+ val = build1 (VIEW_CONVERT_EXPR, type, val);
+ }
+ else
+ {
+ if (!types_compatible_p (TREE_TYPE (type), val_type))
{
- new_temp = make_ssa_name (TREE_TYPE (type));
- init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
- vect_init_vector_1 (stmt, init_stmt, gsi);
- val = new_temp;
+ if (CONSTANT_CLASS_P (val))
+ val = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (type), val);
+ else
+ {
+ new_temp = make_ssa_name (TREE_TYPE (type));
+ init_stmt = gimple_build_assign (new_temp, NOP_EXPR, val);
+ vect_init_vector_1 (stmt, init_stmt, gsi);
+ val = new_temp;
+ }
}
+ val = build_vector_from_val (type, val);
}
- val = build_vector_from_val (type, val);
}
new_var = vect_get_new_vect_var (type, vect_simple_var, "cst_");
@@ -1368,6 +1402,7 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree
*scalar_def)
gimple def_stmt;
stmt_vec_info def_stmt_info = NULL;
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+ tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
unsigned int nunits;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
tree def;
@@ -1411,7 +1446,12 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree
*scalar_def)
/* Case 1: operand is a constant. */
case vect_constant_def:
{
- vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+ if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+ && VECTOR_MASK_TYPE_P (stmt_vectype))
+ vector_type = stmt_vectype;
+ else
+ vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+
gcc_assert (vector_type);
nunits = TYPE_VECTOR_SUBPARTS (vector_type);
@@ -1429,7 +1469,11 @@ vect_get_vec_def_for_operand (tree op, gimple stmt, tree
*scalar_def)
/* Case 2: operand is defined outside the loop - loop invariant. */
case vect_external_def:
{
- vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
+ if (TREE_CODE (TREE_TYPE (op)) == BOOLEAN_TYPE
+ && VECTOR_MASK_TYPE_P (stmt_vectype))
+ vector_type = stmt_vectype;
+ else
+ vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
gcc_assert (vector_type);
if (scalar_def)
@@ -1758,6 +1802,7 @@ vectorizable_mask_load_store (gimple stmt,
gimple_stmt_iterator *gsi,
bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ tree mask_vectype;
tree elem_type;
gimple new_stmt;
tree dummy;
@@ -1785,8 +1830,8 @@ vectorizable_mask_load_store (gimple stmt,
gimple_stmt_iterator *gsi,
is_store = gimple_call_internal_fn (stmt) == IFN_MASK_STORE;
mask = gimple_call_arg (stmt, 2);
- if (TYPE_PRECISION (TREE_TYPE (mask))
- != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype))))
+
+ if (TREE_CODE (TREE_TYPE (mask)) != BOOLEAN_TYPE)
return false;
/* FORNOW. This restriction should be relaxed. */
@@ -1815,6 +1860,19 @@ vectorizable_mask_load_store (gimple stmt,
gimple_stmt_iterator *gsi,
if (STMT_VINFO_STRIDED_P (stmt_info))
return false;
+ if (TREE_CODE (mask) != SSA_NAME)
+ return false;
+
+ if (!vect_is_simple_use_1 (mask, stmt, loop_vinfo, NULL,
+ &def_stmt, &def, &dt, &mask_vectype))
+ return false;
+
+ if (!mask_vectype)
+ mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
+
+ if (!mask_vectype)
+ return false;
+
if (STMT_VINFO_GATHER_P (stmt_info))
{
gimple def_stmt;
@@ -1848,14 +1906,9 @@ vectorizable_mask_load_store (gimple stmt,
gimple_stmt_iterator *gsi,
: DR_STEP (dr), size_zero_node) <= 0)
return false;
else if (!VECTOR_MODE_P (TYPE_MODE (vectype))
- || !can_vec_mask_load_store_p (TYPE_MODE (vectype), !is_store))
- return false;
-
- if (TREE_CODE (mask) != SSA_NAME)
- return false;
-
- if (!vect_is_simple_use (mask, stmt, loop_vinfo, NULL,
- &def_stmt, &def, &dt))
+ || !can_vec_mask_load_store_p (TYPE_MODE (vectype),
+ TYPE_MODE (mask_vectype),
+ !is_store))
return false;
if (is_store)
@@ -7229,10 +7282,7 @@ vectorizable_condition (gimple stmt,
gimple_stmt_iterator *gsi,
&& TREE_CODE (else_clause) != FIXED_CST)
return false;
- unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype)));
- /* The result of a vector comparison should be signed type. */
- tree cmp_type = build_nonstandard_integer_type (prec, 0);
- vec_cmp_type = get_same_sized_vectype (cmp_type, vectype);
+ vec_cmp_type = build_same_sized_truth_vector_type (comp_vectype);
if (vec_cmp_type == NULL_TREE)
return false;
@@ -7373,6 +7423,201 @@ vectorizable_condition (gimple stmt,
gimple_stmt_iterator *gsi,
return true;
}
+/* vectorizable_comparison.
+
+ Check if STMT is comparison expression that can be vectorized.
+ If VEC_STMT is also passed, vectorize the STMT: create a vectorized
+ comparison, put it in VEC_STMT, and insert it at GSI.
+
+ Return FALSE if not a vectorizable STMT, TRUE otherwise. */
+
+bool
+vectorizable_comparison (gimple stmt, gimple_stmt_iterator *gsi,
+ gimple *vec_stmt, tree reduc_def,
+ slp_tree slp_node)
+{
+ tree lhs, rhs1, rhs2;
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ tree vectype1, vectype2;
+ tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ tree vec_rhs1 = NULL_TREE, vec_rhs2 = NULL_TREE;
+ tree vec_compare;
+ tree new_temp;
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ tree def;
+ enum vect_def_type dt, dts[4];
+ unsigned nunits;
+ int ncopies;
+ enum tree_code code;
+ stmt_vec_info prev_stmt_info = NULL;
+ int i, j;
+ bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
+ vec<tree> vec_oprnds0 = vNULL;
+ vec<tree> vec_oprnds1 = vNULL;
+ tree mask_type;
+ tree mask;
+
+ if (!VECTOR_MASK_TYPE_P (vectype))
+ return false;
+
+ mask_type = vectype;
+ nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+ if (slp_node || PURE_SLP_STMT (stmt_info))
+ ncopies = 1;
+ else
+ ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+ gcc_assert (ncopies >= 1);
+ if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
+ return false;
+
+ if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def
+ && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle
+ && reduc_def))
+ return false;
+
+ if (STMT_VINFO_LIVE_P (stmt_info))
+ {
+ if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+ "value used after loop.\n");
+ return false;
+ }
+
+ if (!is_gimple_assign (stmt))
+ return false;
+
+ code = gimple_assign_rhs_code (stmt);
+
+ if (TREE_CODE_CLASS (code) != tcc_comparison)
+ return false;
+
+ rhs1 = gimple_assign_rhs1 (stmt);
+ rhs2 = gimple_assign_rhs2 (stmt);
+
+ if (TREE_CODE (rhs1) == SSA_NAME)
+ {
+ gimple rhs1_def_stmt = SSA_NAME_DEF_STMT (rhs1);
+ if (!vect_is_simple_use_1 (rhs1, stmt, loop_vinfo, bb_vinfo,
+ &rhs1_def_stmt, &def, &dt, &vectype1))
+ return false;
+ }
+ else if (TREE_CODE (rhs1) != INTEGER_CST && TREE_CODE (rhs1) != REAL_CST
+ && TREE_CODE (rhs1) != FIXED_CST)
+ return false;
+
+ if (TREE_CODE (rhs2) == SSA_NAME)
+ {
+ gimple rhs2_def_stmt = SSA_NAME_DEF_STMT (rhs2);
+ if (!vect_is_simple_use_1 (rhs2, stmt, loop_vinfo, bb_vinfo,
+ &rhs2_def_stmt, &def, &dt, &vectype2))
+ return false;
+ }
+ else if (TREE_CODE (rhs2) != INTEGER_CST && TREE_CODE (rhs2) != REAL_CST
+ && TREE_CODE (rhs2) != FIXED_CST)
+ return false;
+
+ vectype = vectype1 ? vectype1 : vectype2;
+
+ if (!vectype
+ || nunits != TYPE_VECTOR_SUBPARTS (vectype))
+ return false;
+
+ if (!vec_stmt)
+ {
+ STMT_VINFO_TYPE (stmt_info) = comparison_vec_info_type;
+ return expand_vec_cmp_expr_p (vectype, mask_type);
+ }
+
+ /* Transform. */
+ if (!slp_node)
+ {
+ vec_oprnds0.create (1);
+ vec_oprnds1.create (1);
+ }
+
+ /* Handle def. */
+ lhs = gimple_assign_lhs (stmt);
+ mask = vect_create_destination_var (lhs, mask_type);
+
+ /* Handle cmp expr. */
+ for (j = 0; j < ncopies; j++)
+ {
+ gassign *new_stmt = NULL;
+ if (j == 0)
+ {
+ if (slp_node)
+ {
+ auto_vec<tree, 2> ops;
+ auto_vec<vec<tree>, 2> vec_defs;
+
+ ops.safe_push (rhs1);
+ ops.safe_push (rhs2);
+ vect_get_slp_defs (ops, slp_node, &vec_defs, -1);
+ vec_oprnds1 = vec_defs.pop ();
+ vec_oprnds0 = vec_defs.pop ();
+
+ ops.release ();
+ vec_defs.release ();
+ }
+ else
+ {
+ gimple gtemp;
+ vec_rhs1
+ = vect_get_vec_def_for_operand (rhs1, stmt, NULL);
+ vect_is_simple_use (rhs1, stmt, loop_vinfo, NULL,
+ &gtemp, &def, &dts[0]);
+ vec_rhs2 =
+ vect_get_vec_def_for_operand (rhs2, stmt, NULL);
+ vect_is_simple_use (rhs2, stmt, loop_vinfo, NULL,
+ &gtemp, &def, &dts[1]);
+ }
+ }
+ else
+ {
+ vec_rhs1 = vect_get_vec_def_for_stmt_copy (dts[0],
+ vec_oprnds0.pop ());
+ vec_rhs2 = vect_get_vec_def_for_stmt_copy (dts[1],
+ vec_oprnds1.pop ());
+ }
+
+ if (!slp_node)
+ {
+ vec_oprnds0.quick_push (vec_rhs1);
+ vec_oprnds1.quick_push (vec_rhs2);
+ }
+
+ /* Arguments are ready. Create the new vector stmt. */
+ FOR_EACH_VEC_ELT (vec_oprnds0, i, vec_rhs1)
+ {
+ vec_rhs2 = vec_oprnds1[i];
+
+ vec_compare = build2 (code, mask_type, vec_rhs1, vec_rhs2);
+ new_stmt = gimple_build_assign (mask, vec_compare);
+ new_temp = make_ssa_name (mask, new_stmt);
+ gimple_assign_set_lhs (new_stmt, new_temp);
+ vect_finish_stmt_generation (stmt, new_stmt, gsi);
+ if (slp_node)
+ SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt);
+ }
+
+ if (slp_node)
+ continue;
+
+ if (j == 0)
+ STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt;
+ else
+ STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt;
+
+ prev_stmt_info = vinfo_for_stmt (new_stmt);
+ }
+
+ vec_oprnds0.release ();
+ vec_oprnds1.release ();
+
+ return true;
+}
/* Make sure the statement is vectorizable. */
@@ -7576,7 +7821,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize,
slp_tree node)
|| vectorizable_call (stmt, NULL, NULL, node)
|| vectorizable_store (stmt, NULL, NULL, node)
|| vectorizable_reduction (stmt, NULL, NULL, node)
- || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+ || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+ || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
else
{
if (bb_vinfo)
@@ -7588,7 +7834,8 @@ vect_analyze_stmt (gimple stmt, bool *need_to_vectorize,
slp_tree node)
|| vectorizable_load (stmt, NULL, NULL, node, NULL)
|| vectorizable_call (stmt, NULL, NULL, node)
|| vectorizable_store (stmt, NULL, NULL, node)
- || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node));
+ || vectorizable_condition (stmt, NULL, NULL, NULL, 0, node)
+ || vectorizable_comparison (stmt, NULL, NULL, NULL, node));
}
if (!ok)
@@ -7704,6 +7951,11 @@ vect_transform_stmt (gimple stmt, gimple_stmt_iterator *gsi,
gcc_assert (done);
break;
+ case comparison_vec_info_type:
+ done = vectorizable_comparison (stmt, gsi, &vec_stmt, NULL, slp_node);
+ gcc_assert (done);
+ break;
+
case call_vec_info_type:
done = vectorizable_call (stmt, gsi, &vec_stmt, slp_node);
stmt = gsi_stmt (*gsi);
@@ -8038,6 +8290,23 @@ get_vectype_for_scalar_type (tree scalar_type)
return vectype;
}
+/* Function get_mask_type_for_scalar_type.
+
+ Returns the mask type corresponding to the result of a comparison
+ of vectors of the specified SCALAR_TYPE, as supported by the target. */
+
+tree
+get_mask_type_for_scalar_type (tree scalar_type)
+{
+ tree vectype = get_vectype_for_scalar_type (scalar_type);
+
+ if (!vectype)
+ return NULL;
+
+ return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype),
+ current_vector_size);
+}
+
/* Function get_same_sized_vectype
Returns a vector type corresponding to SCALAR_TYPE of size
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 58e8f10..94aea1a 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -28,7 +28,8 @@ along with GCC; see the file COPYING3. If not see
enum vect_var_kind {
vect_simple_var,
vect_pointer_var,
- vect_scalar_var
+ vect_scalar_var,
+ vect_mask_var
};
/* Defines type of operation. */
@@ -482,6 +483,7 @@ enum stmt_vec_info_type {
call_simd_clone_vec_info_type,
assignment_vec_info_type,
condition_vec_info_type,
+ comparison_vec_info_type,
reduc_vec_info_type,
induc_vec_info_type,
type_promotion_vec_info_type,
@@ -995,6 +997,7 @@ extern bool vect_can_advance_ivs_p (loop_vec_info);
/* In tree-vect-stmts.c. */
extern unsigned int current_vector_size;
extern tree get_vectype_for_scalar_type (tree);
+extern tree get_mask_type_for_scalar_type (tree);
extern tree get_same_sized_vectype (tree, tree);
extern bool vect_is_simple_use (tree, gimple, loop_vec_info,
bb_vec_info, gimple *,
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 758ca38..cffacaa 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2957,7 +2957,7 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
default:
if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
{
- tree vecitype, comp_vectype;
+ tree vecitype, comp_vectype, mask_type;
/* If the comparison can throw, then is_gimple_condexpr will be
false and we can't make a COND_EXPR/VEC_COND_EXPR out of it. */
@@ -2968,6 +2968,11 @@ check_bool_pattern (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
if (comp_vectype == NULL_TREE)
return false;
+ mask_type = get_mask_type_for_scalar_type (TREE_TYPE (rhs1));
+ if (mask_type
+ && expand_vec_cmp_expr_p (comp_vectype, mask_type))
+ return false;
+
if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE)
{
machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
@@ -3192,6 +3197,75 @@ adjust_bool_pattern (tree var, tree out_type, tree trueval,
}
+/* Try to determine a proper type for converting bool VAR
+ into an integer value.  The type is chosen so that the converted
+ vector has the same number of elements as the mask producer. */
+
+static tree
+search_type_for_mask (tree var, loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
+{
+ gimple def_stmt;
+ enum vect_def_type dt;
+ tree def, rhs1;
+ enum tree_code rhs_code;
+ tree res = NULL;
+
+ if (TREE_CODE (var) != SSA_NAME)
+ return NULL;
+
+ if ((TYPE_PRECISION (TREE_TYPE (var)) != 1
+ || !TYPE_UNSIGNED (TREE_TYPE (var)))
+ && TREE_CODE (TREE_TYPE (var)) != BOOLEAN_TYPE)
+ return NULL;
+
+ if (!vect_is_simple_use (var, NULL, loop_vinfo, bb_vinfo, &def_stmt, &def,
+ &dt))
+ return NULL;
+
+ if (dt != vect_internal_def)
+ return NULL;
+
+ if (!is_gimple_assign (def_stmt))
+ return NULL;
+
+ rhs_code = gimple_assign_rhs_code (def_stmt);
+ rhs1 = gimple_assign_rhs1 (def_stmt);
+
+ switch (rhs_code)
+ {
+ case SSA_NAME:
+ case BIT_NOT_EXPR:
+ CASE_CONVERT:
+ res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo);
+ break;
+
+ case BIT_AND_EXPR:
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ if (!(res = search_type_for_mask (rhs1, loop_vinfo, bb_vinfo)))
+ res = search_type_for_mask (gimple_assign_rhs2 (def_stmt),
+ loop_vinfo, bb_vinfo);
+ break;
+
+ default:
+ if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+ {
+ if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
+ || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
+ {
+ machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1));
+ res = build_nonstandard_integer_type (GET_MODE_BITSIZE (mode), 1);
+ }
+ else
+ res = TREE_TYPE (rhs1);
+ }
+ }
+
+ return res;
+}
+
+
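To illustrate what search_type_for_mask is expected to return (made-up testcase, not from the patch): for the loop below it should walk the definitions of the boolean operand down to the comparisons and pick a 32-bit unsigned integer type, since both comparisons work on 32-bit operands.

#define N 1024
int a[N], b[N], c[N], r[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    {
      /* m is a boolean combination of two comparisons of 32-bit values,
         so the search should settle on a 32-bit integer type for the
         intermediate integer mask.  */
      _Bool m = (a[i] > b[i]) & (c[i] != 0);
      r[i] = m ? a[i] : c[i];
    }
}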
/* Function vect_recog_bool_pattern
Try to find pattern like following:
@@ -3249,6 +3323,7 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
enum tree_code rhs_code;
tree var, lhs, rhs, vectype;
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
+ stmt_vec_info new_stmt_info;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
gimple pattern_stmt;
@@ -3274,16 +3349,43 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
if (vectype == NULL_TREE)
return NULL;
- if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
- return NULL;
-
- rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
- lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
- if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
- pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+ if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+ {
+ rhs = adjust_bool_pattern (var, TREE_TYPE (lhs), NULL_TREE, stmts);
+ lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+ if (useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
+ pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
+ else
+ pattern_stmt
+ = gimple_build_assign (lhs, NOP_EXPR, rhs);
+ }
else
- pattern_stmt
- = gimple_build_assign (lhs, NOP_EXPR, rhs);
+ {
+ tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+ tree cst0, cst1, tmp;
+
+ if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (lhs)))
+ type = TREE_TYPE (lhs);
+ cst0 = build_int_cst (type, 0);
+ cst1 = build_int_cst (type, 1);
+ tmp = vect_recog_temp_ssa_var (type, NULL);
+ pattern_stmt = gimple_build_assign (tmp, COND_EXPR, var, cst1, cst0);
+
+ if (!useless_type_conversion_p (type, TREE_TYPE (lhs)))
+ {
+ tree new_vectype = get_vectype_for_scalar_type (type);
+ new_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+ bb_vinfo);
+ set_vinfo_for_stmt (pattern_stmt, new_stmt_info);
+ STMT_VINFO_VECTYPE (new_stmt_info) = new_vectype;
+ new_pattern_def_seq (stmt_vinfo, pattern_stmt);
+
+ lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
+ pattern_stmt = gimple_build_assign (lhs, CONVERT_EXPR, tmp);
+ }
+ }
+
*type_out = vectype;
*type_in = vectype;
stmts->safe_push (last_stmt);
@@ -3312,10 +3414,11 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
if (get_vectype_for_scalar_type (type) == NULL_TREE)
return NULL;
- if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
- return NULL;
+ if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+ rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
+ else
+ rhs = var;
- rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts);
lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
pattern_stmt
= gimple_build_assign (lhs, COND_EXPR,
@@ -3340,16 +3443,38 @@ vect_recog_bool_pattern (vec<gimple> *stmts, tree *type_in,
gcc_assert (vectype != NULL_TREE);
if (!VECTOR_MODE_P (TYPE_MODE (vectype)))
return NULL;
- if (!check_bool_pattern (var, loop_vinfo, bb_vinfo))
- return NULL;
- rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
+ if (check_bool_pattern (var, loop_vinfo, bb_vinfo))
+ rhs = adjust_bool_pattern (var, TREE_TYPE (vectype),
+ NULL_TREE, stmts);
+ else
+ {
+ tree type = search_type_for_mask (var, loop_vinfo, bb_vinfo);
+ tree cst0, cst1, new_vectype;
+
+ if (!type || TYPE_MODE (type) == TYPE_MODE (TREE_TYPE (vectype)))
+ type = TREE_TYPE (vectype);
+
+ cst0 = build_int_cst (type, 0);
+ cst1 = build_int_cst (type, 1);
+ new_vectype = get_vectype_for_scalar_type (type);
+
+ rhs = vect_recog_temp_ssa_var (type, NULL);
+ pattern_stmt = gimple_build_assign (rhs, COND_EXPR, var, cst1, cst0);
+
+ pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo,
+ bb_vinfo);
+ set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
+ STMT_VINFO_VECTYPE (pattern_stmt_info) = new_vectype;
+ append_pattern_def_seq (stmt_vinfo, pattern_stmt);
+ }
+
lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);
if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
{
tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
gimple cast_stmt = gimple_build_assign (rhs2, NOP_EXPR, rhs);
- new_pattern_def_seq (stmt_vinfo, cast_stmt);
+ append_pattern_def_seq (stmt_vinfo, cast_stmt);
rhs = rhs2;
}
pattern_stmt = gimple_build_assign (lhs, SSA_NAME, rhs);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 6a17ef4..e22aa57 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -129,6 +129,9 @@ extern bool ix86_expand_fp_vcond (rtx[]);
extern bool ix86_expand_int_vcond (rtx[]);
extern void ix86_expand_vec_perm (rtx[]);
extern bool ix86_expand_vec_perm_const (rtx[]);
+extern bool ix86_expand_mask_vec_cmp (rtx[]);
+extern bool ix86_expand_int_vec_cmp (rtx[]);
+extern bool ix86_expand_fp_vec_cmp (rtx[]);
extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
extern bool ix86_expand_int_addcc (rtx[]);
extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 070605f..d17c350 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21440,8 +21440,8 @@ ix86_expand_sse_cmp (rtx dest, enum rtx_code code, rtx cmp_op0, rtx cmp_op1,
cmp_op1 = force_reg (cmp_ops_mode, cmp_op1);
if (optimize
- || reg_overlap_mentioned_p (dest, op_true)
- || reg_overlap_mentioned_p (dest, op_false))
+ || (op_true && reg_overlap_mentioned_p (dest, op_true))
+ || (op_false && reg_overlap_mentioned_p (dest, op_false)))
dest = gen_reg_rtx (maskcmp ? cmp_mode : mode);
/* Compare patterns for int modes are unspec in AVX512F only. */
@@ -21713,34 +21713,127 @@ ix86_expand_fp_movcc (rtx operands[])
return true;
}
-/* Expand a floating-point vector conditional move; a vcond operation
- rather than a movcc operation. */
+/* Helper for ix86_cmp_code_to_pcmp_immediate for int modes. */
+
+static int
+ix86_int_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+ switch (code)
+ {
+ case EQ:
+ return 0;
+ case LT:
+ case LTU:
+ return 1;
+ case LE:
+ case LEU:
+ return 2;
+ case NE:
+ return 4;
+ case GE:
+ case GEU:
+ return 5;
+ case GT:
+ case GTU:
+ return 6;
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Helper for ix86_cmp_code_to_pcmp_immediate for fp modes. */
+
+static int
+ix86_fp_cmp_code_to_pcmp_immediate (enum rtx_code code)
+{
+ switch (code)
+ {
+ case EQ:
+ return 0x08;
+ case NE:
+ return 0x04;
+ case GT:
+ return 0x16;
+ case LE:
+ return 0x1a;
+ case GE:
+ return 0x15;
+ case LT:
+ return 0x19;
+ default:
+ gcc_unreachable ();
+ }
+}
+
+/* Return immediate value to be used in UNSPEC_PCMP
+ for comparison CODE in MODE. */
+
+static int
+ix86_cmp_code_to_pcmp_immediate (enum rtx_code code, machine_mode mode)
+{
+ if (FLOAT_MODE_P (mode))
+ return ix86_fp_cmp_code_to_pcmp_immediate (code);
+ return ix86_int_cmp_code_to_pcmp_immediate (code);
+}
+
+/* Expand AVX-512 vector comparison. */
bool
-ix86_expand_fp_vcond (rtx operands[])
+ix86_expand_mask_vec_cmp (rtx operands[])
{
- enum rtx_code code = GET_CODE (operands[3]);
+ machine_mode mask_mode = GET_MODE (operands[0]);
+ machine_mode cmp_mode = GET_MODE (operands[2]);
+ enum rtx_code code = GET_CODE (operands[1]);
+ rtx imm = GEN_INT (ix86_cmp_code_to_pcmp_immediate (code, cmp_mode));
+ int unspec_code;
+ rtx unspec;
+
+ switch (code)
+ {
+ case LEU:
+ case GTU:
+ case GEU:
+ case LTU:
+ unspec_code = UNSPEC_UNSIGNED_PCMP;
+ break;
+ default:
+ unspec_code = UNSPEC_PCMP;
+ }
+
+ unspec = gen_rtx_UNSPEC (mask_mode, gen_rtvec (3, operands[2],
+ operands[3], imm),
+ unspec_code);
+ emit_insn (gen_rtx_SET (operands[0], unspec));
+
+ return true;
+}
+
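To illustrate the signed/unsigned split above (made-up example; the exact code generation may of course differ): with -mavx512f the first loop below is expected to go through the vec_cmpv16sihi expander and emit vpcmpd with predicate immediate 6 (GT) into a mask register, while the second should use vec_cmpuv16sihi and vpcmpud with the same immediate for GTU.

#define N 16
int a[N], b[N], c[N];
unsigned int ua[N], ub[N], uc[N];

void
foo_signed (void)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] > b[i] ? -1 : 1;
}

void
foo_unsigned (void)
{
  for (int i = 0; i < N; i++)
    uc[i] = ua[i] > ub[i] ? 5 : 7;
}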
+/* Expand fp vector comparison. */
+
+bool
+ix86_expand_fp_vec_cmp (rtx operands[])
+{
+ enum rtx_code code = GET_CODE (operands[1]);
rtx cmp;
code = ix86_prepare_sse_fp_compare_args (operands[0], code,
- &operands[4], &operands[5]);
+ &operands[2], &operands[3]);
if (code == UNKNOWN)
{
rtx temp;
- switch (GET_CODE (operands[3]))
+ switch (GET_CODE (operands[1]))
{
case LTGT:
- temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
- operands[5], operands[0], operands[0]);
- cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
- operands[5], operands[1], operands[2]);
+ temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[2],
+ operands[3], NULL, NULL);
+ cmp = ix86_expand_sse_cmp (operands[0], NE, operands[2],
+ operands[3], NULL, NULL);
code = AND;
break;
case UNEQ:
- temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
- operands[5], operands[0], operands[0]);
- cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
- operands[5], operands[1], operands[2]);
+ temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[2],
+ operands[3], NULL, NULL);
+ cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[2],
+ operands[3], NULL, NULL);
code = IOR;
break;
default:
@@ -21748,72 +21841,26 @@ ix86_expand_fp_vcond (rtx operands[])
}
cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
OPTAB_DIRECT);
- ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
- return true;
}
+ else
+ cmp = ix86_expand_sse_cmp (operands[0], code, operands[2], operands[3],
+ NULL, NULL);
- if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
- operands[5], operands[1], operands[2]))
- return true;
+ if (operands[0] != cmp)
+ emit_move_insn (operands[0], cmp);
- cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
- operands[1], operands[2]);
- ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
return true;
}
-/* Expand a signed/unsigned integral vector conditional move. */
-
-bool
-ix86_expand_int_vcond (rtx operands[])
+static rtx
+ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, rtx cop0, rtx cop1,
+ rtx op_true, rtx op_false, bool *negate)
{
- machine_mode data_mode = GET_MODE (operands[0]);
- machine_mode mode = GET_MODE (operands[4]);
- enum rtx_code code = GET_CODE (operands[3]);
- bool negate = false;
- rtx x, cop0, cop1;
-
- cop0 = operands[4];
- cop1 = operands[5];
+ machine_mode data_mode = GET_MODE (dest);
+ machine_mode mode = GET_MODE (cop0);
+ rtx x;
- /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
- and x < 0 ? 1 : 0 into (unsigned) x >> 31. */
- if ((code == LT || code == GE)
- && data_mode == mode
- && cop1 == CONST0_RTX (mode)
- && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
- && GET_MODE_UNIT_SIZE (data_mode) > 1
- && GET_MODE_UNIT_SIZE (data_mode) <= 8
- && (GET_MODE_SIZE (data_mode) == 16
- || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
- {
- rtx negop = operands[2 - (code == LT)];
- int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
- if (negop == CONST1_RTX (data_mode))
- {
- rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
- operands[0], 1, OPTAB_DIRECT);
- if (res != operands[0])
- emit_move_insn (operands[0], res);
- return true;
- }
- else if (GET_MODE_INNER (data_mode) != DImode
- && vector_all_ones_operand (negop, data_mode))
- {
- rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
- operands[0], 0, OPTAB_DIRECT);
- if (res != operands[0])
- emit_move_insn (operands[0], res);
- return true;
- }
- }
-
- if (!nonimmediate_operand (cop1, mode))
- cop1 = force_reg (mode, cop1);
- if (!general_operand (operands[1], data_mode))
- operands[1] = force_reg (data_mode, operands[1]);
- if (!general_operand (operands[2], data_mode))
- operands[2] = force_reg (data_mode, operands[2]);
+ *negate = false;
/* XOP supports all of the comparisons on all 128-bit vector int types. */
if (TARGET_XOP
@@ -21834,13 +21881,13 @@ ix86_expand_int_vcond (rtx operands[])
case LE:
case LEU:
code = reverse_condition (code);
- negate = true;
+ *negate = true;
break;
case GE:
case GEU:
code = reverse_condition (code);
- negate = true;
+ *negate = true;
/* FALLTHRU */
case LT:
@@ -21861,14 +21908,14 @@ ix86_expand_int_vcond (rtx operands[])
case EQ:
/* SSE4.1 supports EQ. */
if (!TARGET_SSE4_1)
- return false;
+ return NULL;
break;
case GT:
case GTU:
/* SSE4.2 supports GT/GTU. */
if (!TARGET_SSE4_2)
- return false;
+ return NULL;
break;
default:
@@ -21929,12 +21976,13 @@ ix86_expand_int_vcond (rtx operands[])
case V8HImode:
/* Perform a parallel unsigned saturating subtraction. */
x = gen_reg_rtx (mode);
- emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0, cop1)));
+ emit_insn (gen_rtx_SET (x, gen_rtx_US_MINUS (mode, cop0,
+ cop1)));
cop0 = x;
cop1 = CONST0_RTX (mode);
code = EQ;
- negate = !negate;
+ *negate = !*negate;
break;
default:
@@ -21943,22 +21991,162 @@ ix86_expand_int_vcond (rtx operands[])
}
}
+ if (*negate)
+ std::swap (op_true, op_false);
+
/* Allow the comparison to be done in one mode, but the movcc to
happen in another mode. */
if (data_mode == mode)
{
- x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
- operands[1+negate], operands[2-negate]);
+ x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
+ op_true, op_false);
}
else
{
gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
x = ix86_expand_sse_cmp (gen_reg_rtx (mode), code, cop0, cop1,
- operands[1+negate], operands[2-negate]);
+ op_true, op_false);
if (GET_MODE (x) == mode)
x = gen_lowpart (data_mode, x);
}
+ return x;
+}
+
+/* Expand integer vector comparison. */
+
+bool
+ix86_expand_int_vec_cmp (rtx operands[])
+{
+ rtx_code code = GET_CODE (operands[1]);
+ bool negate = false;
+ rtx cmp = ix86_expand_int_sse_cmp (operands[0], code, operands[2],
+ operands[3], NULL, NULL, &negate);
+
+ if (!cmp)
+ return false;
+
+ if (negate)
+ cmp = ix86_expand_int_sse_cmp (operands[0], EQ, cmp,
+ CONST0_RTX (GET_MODE (cmp)),
+ NULL, NULL, &negate);
+
+ gcc_assert (!negate);
+
+ if (operands[0] != cmp)
+ emit_move_insn (operands[0], cmp);
+
+ return true;
+}
+
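Illustrative only (made-up testcase; the path depends on the target ISA, e.g. XOP handles this comparison directly): SSE has no unsigned less-or-equal compare, so for the loop below the helper reverses LEU to GTU and sets *negate; ix86_expand_int_vec_cmp then folds the negation with an extra EQ comparison against zero, whereas ix86_expand_int_vcond keeps the mask and swaps the true/false arms of the blend instead.

#define N 1024
unsigned int a[N], b[N];
int r[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    r[i] = a[i] <= b[i] ? -1 : 0;
}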
+/* Expand a floating-point vector conditional move; a vcond operation
+ rather than a movcc operation. */
+
+bool
+ix86_expand_fp_vcond (rtx operands[])
+{
+ enum rtx_code code = GET_CODE (operands[3]);
+ rtx cmp;
+
+ code = ix86_prepare_sse_fp_compare_args (operands[0], code,
+ &operands[4], &operands[5]);
+ if (code == UNKNOWN)
+ {
+ rtx temp;
+ switch (GET_CODE (operands[3]))
+ {
+ case LTGT:
+ temp = ix86_expand_sse_cmp (operands[0], ORDERED, operands[4],
+ operands[5], operands[0], operands[0]);
+ cmp = ix86_expand_sse_cmp (operands[0], NE, operands[4],
+ operands[5], operands[1], operands[2]);
+ code = AND;
+ break;
+ case UNEQ:
+ temp = ix86_expand_sse_cmp (operands[0], UNORDERED, operands[4],
+ operands[5], operands[0], operands[0]);
+ cmp = ix86_expand_sse_cmp (operands[0], EQ, operands[4],
+ operands[5], operands[1], operands[2]);
+ code = IOR;
+ break;
+ default:
+ gcc_unreachable ();
+ }
+ cmp = expand_simple_binop (GET_MODE (cmp), code, temp, cmp, cmp, 1,
+ OPTAB_DIRECT);
+ ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+ return true;
+ }
+
+ if (ix86_expand_sse_fp_minmax (operands[0], code, operands[4],
+ operands[5], operands[1], operands[2]))
+ return true;
+
+ cmp = ix86_expand_sse_cmp (operands[0], code, operands[4], operands[5],
+ operands[1], operands[2]);
+ ix86_expand_sse_movcc (operands[0], cmp, operands[1], operands[2]);
+ return true;
+}
+
+/* Expand a signed/unsigned integral vector conditional move. */
+
+bool
+ix86_expand_int_vcond (rtx operands[])
+{
+ machine_mode data_mode = GET_MODE (operands[0]);
+ machine_mode mode = GET_MODE (operands[4]);
+ enum rtx_code code = GET_CODE (operands[3]);
+ bool negate = false;
+ rtx x, cop0, cop1;
+
+ cop0 = operands[4];
+ cop1 = operands[5];
+
+ /* Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
+ and x < 0 ? 1 : 0 into (unsigned) x >> 31. */
+ if ((code == LT || code == GE)
+ && data_mode == mode
+ && cop1 == CONST0_RTX (mode)
+ && operands[1 + (code == LT)] == CONST0_RTX (data_mode)
+ && GET_MODE_UNIT_SIZE (data_mode) > 1
+ && GET_MODE_UNIT_SIZE (data_mode) <= 8
+ && (GET_MODE_SIZE (data_mode) == 16
+ || (TARGET_AVX2 && GET_MODE_SIZE (data_mode) == 32)))
+ {
+ rtx negop = operands[2 - (code == LT)];
+ int shift = GET_MODE_UNIT_BITSIZE (data_mode) - 1;
+ if (negop == CONST1_RTX (data_mode))
+ {
+ rtx res = expand_simple_binop (mode, LSHIFTRT, cop0, GEN_INT (shift),
+ operands[0], 1, OPTAB_DIRECT);
+ if (res != operands[0])
+ emit_move_insn (operands[0], res);
+ return true;
+ }
+ else if (GET_MODE_INNER (data_mode) != DImode
+ && vector_all_ones_operand (negop, data_mode))
+ {
+ rtx res = expand_simple_binop (mode, ASHIFTRT, cop0, GEN_INT (shift),
+ operands[0], 0, OPTAB_DIRECT);
+ if (res != operands[0])
+ emit_move_insn (operands[0], res);
+ return true;
+ }
+ }
+
+ if (!nonimmediate_operand (cop1, mode))
+ cop1 = force_reg (mode, cop1);
+ if (!general_operand (operands[1], data_mode))
+ operands[1] = force_reg (data_mode, operands[1]);
+ if (!general_operand (operands[2], data_mode))
+ operands[2] = force_reg (data_mode, operands[2]);
+
+ x = ix86_expand_int_sse_cmp (operands[0], code, cop0, cop1,
+ operands[1], operands[2], &negate);
+
+ if (!x)
+ return false;
+
ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
operands[2-negate]);
return true;
@@ -51678,6 +51866,30 @@ ix86_autovectorize_vector_sizes (void)
(TARGET_AVX && !TARGET_PREFER_AVX128) ? 32 | 16 : 0;
}
+/* Implementation of targetm.vectorize.get_mask_mode. */
+
+static machine_mode
+ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+{
+ /* Scalar mask case. */
+ if (TARGET_AVX512F && vector_size == 64)
+ {
+ unsigned elem_size = vector_size / nunits;
+ if ((vector_size == 64 || TARGET_AVX512VL)
+ && ((elem_size == 4 || elem_size == 8)
+ || TARGET_AVX512BW))
+ return smallest_mode_for_size (nunits, MODE_INT);
+ }
+
+ unsigned elem_size = vector_size / nunits;
+ machine_mode elem_mode
+ = smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT);
+
+ gcc_assert (elem_size * nunits == vector_size);
+
+ return mode_for_vector (elem_mode, nunits);
+}
+
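A stand-alone sketch of the selection logic in the hook above, just to make the intended behavior explicit; this is a hypothetical harness, not GCC code, and the strings in the output are only descriptive.

#include <stdio.h>

/* Mirrors ix86_get_mask_mode: 64-byte vectors on AVX-512F get a scalar
   (k-register) mask with one bit per element, provided the element size
   is supported; everything else falls back to an integer vector mask of
   the same size as the data vector.  */
static const char *
mask_choice (unsigned nunits, unsigned vector_size,
             int avx512f, int avx512bw)
{
  unsigned elem_size = vector_size / nunits;
  if (avx512f && vector_size == 64
      && (elem_size == 4 || elem_size == 8 || avx512bw))
    return "scalar mask, one bit per element";
  return "integer vector mask, same size as the data vector";
}

int
main (void)
{
  printf ("V16SI, 64 bytes, AVX-512F:  %s\n", mask_choice (16, 64, 1, 0));
  printf ("V64QI, 64 bytes, AVX-512F:  %s\n", mask_choice (64, 64, 1, 0));
  printf ("V8SI, 32 bytes, AVX2 only:  %s\n", mask_choice (8, 32, 0, 0));
  return 0;
}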
/* Return class of registers which could be used for pseudo of MODE
@@ -52612,6 +52824,8 @@ ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES \
ix86_autovectorize_vector_sizes
+#undef TARGET_VECTORIZE_GET_MASK_MODE
+#define TARGET_VECTORIZE_GET_MASK_MODE ix86_get_mask_mode
#undef TARGET_VECTORIZE_INIT_COST
#define TARGET_VECTORIZE_INIT_COST ix86_init_cost
#undef TARGET_VECTORIZE_ADD_STMT_COST
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4535570..a8d55cc 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -605,6 +605,15 @@
(V16SF "HI") (V8SF "QI") (V4SF "QI")
(V8DF "QI") (V4DF "QI") (V2DF "QI")])
+;; Mapping of vector modes to corresponding mask mode in lower case
+(define_mode_attr avx512fmaskmodelower
+ [(V64QI "di") (V32QI "si") (V16QI "hi")
+ (V32HI "si") (V16HI "hi") (V8HI "qi") (V4HI "qi")
+ (V16SI "hi") (V8SI "qi") (V4SI "qi")
+ (V8DI "qi") (V4DI "qi") (V2DI "qi")
+ (V16SF "hi") (V8SF "qi") (V4SF "qi")
+ (V8DF "qi") (V4DF "qi") (V2DF "qi")])
+
;; Mapping of vector float modes to an integer mode of the same size
(define_mode_attr sseintvecmode
[(V16SF "V16SI") (V8DF "V8DI")
@@ -2803,6 +2812,150 @@
(const_string "0")))
(set_attr "mode" "<MODE>")])
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+ [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+ (match_operator:<avx512fmaskmode> 1 ""
+ [(match_operand:V48_AVX512VL 2 "register_operand")
+ (match_operand:V48_AVX512VL 3 "nonimmediate_operand")]))]
+ "TARGET_AVX512F"
+{
+ bool ok = ix86_expand_mask_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmp<mode><avx512fmaskmodelower>"
+ [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+ (match_operator:<avx512fmaskmode> 1 ""
+ [(match_operand:VI12_AVX512VL 2 "register_operand")
+ (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+ "TARGET_AVX512BW"
+{
+ bool ok = ix86_expand_mask_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VI_256 2 "register_operand")
+ (match_operand:VI_256 3 "nonimmediate_operand")]))]
+ "TARGET_AVX2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VI124_128 2 "register_operand")
+ (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+ "TARGET_SSE2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpv2div2di"
+ [(set (match_operand:V2DI 0 "register_operand")
+ (match_operator:V2DI 1 ""
+ [(match_operand:V2DI 2 "register_operand")
+ (match_operand:V2DI 3 "nonimmediate_operand")]))]
+ "TARGET_SSE4_2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VF_256 2 "register_operand")
+ (match_operand:VF_256 3 "nonimmediate_operand")]))]
+ "TARGET_AVX"
+{
+ bool ok = ix86_expand_fp_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmp<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VF_128 2 "register_operand")
+ (match_operand:VF_128 3 "nonimmediate_operand")]))]
+ "TARGET_SSE"
+{
+ bool ok = ix86_expand_fp_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+ [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+ (match_operator:<avx512fmaskmode> 1 ""
+ [(match_operand:VI48_AVX512VL 2 "register_operand")
+ (match_operand:VI48_AVX512VL 3 "nonimmediate_operand")]))]
+ "TARGET_AVX512F"
+{
+ bool ok = ix86_expand_mask_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpu<mode><avx512fmaskmodelower>"
+ [(set (match_operand:<avx512fmaskmode> 0 "register_operand")
+ (match_operator:<avx512fmaskmode> 1 ""
+ [(match_operand:VI12_AVX512VL 2 "register_operand")
+ (match_operand:VI12_AVX512VL 3 "nonimmediate_operand")]))]
+ "TARGET_AVX512BW"
+{
+ bool ok = ix86_expand_mask_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VI_256 2 "register_operand")
+ (match_operand:VI_256 3 "nonimmediate_operand")]))]
+ "TARGET_AVX2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpu<mode><sseintvecmodelower>"
+ [(set (match_operand:<sseintvecmode> 0 "register_operand")
+ (match_operator:<sseintvecmode> 1 ""
+ [(match_operand:VI124_128 2 "register_operand")
+ (match_operand:VI124_128 3 "nonimmediate_operand")]))]
+ "TARGET_SSE2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
+(define_expand "vec_cmpuv2div2di"
+ [(set (match_operand:V2DI 0 "register_operand")
+ (match_operator:V2DI 1 ""
+ [(match_operand:V2DI 2 "register_operand")
+ (match_operand:V2DI 3 "nonimmediate_operand")]))]
+ "TARGET_SSE4_2"
+{
+ bool ok = ix86_expand_int_vec_cmp (operands);
+ gcc_assert (ok);
+ DONE;
+})
+
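The pre-AVX-512 expanders above keep producing the usual all-ones/all-zeros integer vector masks. An illustrative testcase (made up, not part of the patch) that should now be routed through vec_cmpv2div2di when compiling with -msse4.2:

#define N 1024
long long a[N], b[N], r[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    r[i] = a[i] > b[i] ? -1LL : 0LL;
}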
(define_expand "vcond<V_512:mode><VF_512:mode>"
[(set (match_operand:V_512 0 "register_operand")
(if_then_else:V_512
@@ -17895,7 +18048,7 @@
(set_attr "btver2_decode" "vector")
(set_attr "mode" "<sseinsnmode>")])
-(define_expand "maskload<mode>"
+(define_expand "maskload<mode><sseintvecmodelower>"
[(set (match_operand:V48_AVX2 0 "register_operand")
(unspec:V48_AVX2
[(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17903,7 +18056,23 @@
UNSPEC_MASKMOV))]
"TARGET_AVX")
-(define_expand "maskstore<mode>"
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+ [(set (match_operand:V48_AVX512VL 0 "register_operand")
+ (vec_merge:V48_AVX512VL
+ (match_operand:V48_AVX512VL 1 "memory_operand")
+ (match_dup 0)
+ (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+ "TARGET_AVX512F")
+
+(define_expand "maskload<mode><avx512fmaskmodelower>"
+ [(set (match_operand:VI12_AVX512VL 0 "register_operand")
+ (vec_merge:VI12_AVX512VL
+ (match_operand:VI12_AVX512VL 1 "memory_operand")
+ (match_dup 0)
+ (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+ "TARGET_AVX512BW")
+
+(define_expand "maskstore<mode><sseintvecmodelower>"
[(set (match_operand:V48_AVX2 0 "memory_operand")
(unspec:V48_AVX2
[(match_operand:<sseintvecmode> 2 "register_operand")
@@ -17912,6 +18081,22 @@
UNSPEC_MASKMOV))]
"TARGET_AVX")
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+ [(set (match_operand:V48_AVX512VL 0 "memory_operand")
+ (vec_merge:V48_AVX512VL
+ (match_operand:V48_AVX512VL 1 "register_operand")
+ (match_dup 0)
+ (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+ "TARGET_AVX512F")
+
+(define_expand "maskstore<mode><avx512fmaskmodelower>"
+ [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
+ (vec_merge:VI12_AVX512VL
+ (match_operand:VI12_AVX512VL 1 "register_operand")
+ (match_dup 0)
+ (match_operand:<avx512fmaskmode> 2 "register_operand")))]
+ "TARGET_AVX512BW")
+
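For the new maskload/maskstore expanders, the classic conditional-update loop is the intended consumer; illustrative testcase only (made-up names), where the mask is expected to come from the vector comparison as a k-register under AVX-512 and as an integer vector mask under AVX/AVX2:

#define N 1024
int a[N], b[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    if (b[i] > 0)
      a[i] = b[i] + 1;
}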
(define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
[(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
(unspec:AVX256MODE2P