On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <[email protected]> wrote:
> Hi All,
>
> Here is a patch for 481.wrf preformance regression for avx2 which is
> sligthly modified mask store optimization. This transformation allows
> perform unpredication for semi-hammock containing masked stores, other
> words if we have a loop like
> for (i=0; i<n; i++)
> if (c[i]) {
> p1[i] += 1;
> p2[i] = p3[i] +2;
> }
>
> then it will be transformed to
> if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
> vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B, mask__ifc__42.18_165);
> vect__12.22_172 = vect__11.19_170 + vect_cst__171;
> MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165, vect__12.22_172);
> vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B, mask__ifc__42.18_165);
> vect__19.28_184 = vect__18.25_182 + vect_cst__183;
> MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165, vect__19.28_184);
> }
> i.e. it will put all computations related to masked stores to semi-hammock.
>
> Bootstrapping and regression testing did not show any new failures.
Can you please split out the middle-end support for vector equality compares?
@@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
if (TREE_CODE (op0_type) == VECTOR_TYPE
|| TREE_CODE (op1_type) == VECTOR_TYPE)
{
- error ("vector comparison returning a boolean");
- debug_generic_expr (op0_type);
- debug_generic_expr (op1_type);
- return true;
+ /* Allow vector comparison returning boolean if operand types
+ are equal and CODE is EQ/NE. */
+ if ((code != EQ_EXPR && code != NE_EXPR)
+ || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
+ || VECTOR_INTEGER_TYPE_P (op0_type)))
+ {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+ }
}
}
please merge the conditions with a &&
@@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
tree type, tree op0, tree op1)
if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
{
+ if (INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1))
+ {
+ /* Have vector comparison with scalar boolean result. */
+ bool result = true;
+ gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+ gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+ for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+ {
+ tree elem0 = VECTOR_CST_ELT (op0, i);
+ tree elem1 = VECTOR_CST_ELT (op1, i);
+ tree tmp = fold_relational_const (code, type, elem0, elem1);
+ result &= integer_onep (tmp);
+ if (code == NE_EXPR)
+ result = !result;
+ return constant_boolean_node (result, type);
... just assumes it is either EQ_EXPR or NE_EXPR. I believe you want
to change the
guarding condition to just
if (! VECTOR_TYPE_P (type))
and assert the boolean/precision. Please also merge the asserts into
one with &&
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..73ee3be 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
tree_code code, tree type,
gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
+ /* Do not perform combining it types are not compatible. */
+ if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+ && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
+ return NULL_TREE;
+
again, how does this happen?
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e67048e..1605520c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
gimple_stmt_iterator si,
&comp_code, &val))
return;
+ /* Use of vector comparison in gcond is very restricted and used to check
+ that the mask in masked store is zero, so assert for such comparison
+ is not implemented yet. */
+ if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
+ return;
+
VECTOR_TYPE_P
I believe the comment should simply say that VRP doesn't track ranges for
vector types.
In the previous review I suggested you should make sure that RTL expansion
ends up using a well-defined optab for these compares. To make sure
this happens across targets I suggest you make these comparisons available
via the GCC vector extension. Thus allow
typedef int v4si __attribute__((vector_size(16)));
int foo (v4si a, v4si b)
{
if (a == b)
return 4;
}
and != and also using floating point vectors.
Otherwise it's hard to see the impact of this change. Obvious choices
are the eq/ne optabs for FP compares and [u]cmp optabs for integer
compares.
A half-way implementation like your VRP comment suggests (only
==/!= zero against integer vectors is implemented?!) this doesn't sound
good without also limiting the feature this way in the verifier.
Btw, the regression with WRF is >50% on AMD Bulldozer (which only
has AVX, not AVX2).
Thanks,
Richard.
> ChangeLog:
> 2015-11-30 Yuri Rumyantsev <[email protected]>
>
> PR middle-end/68542
> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
> comparison with boolean result.
> * config/i386/sse.md (define_expand "cbranch<mode>4): Add define-expand
> for vector comparion with eq/ne only.
> * fold-const.c (fold_relational_const): Add handling of vector
> comparison with boolean result.
> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
> comparison of vector operands with boolean result for EQ/NE only.
> (verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
> (verify_gimple_cond): Likewise.
> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
> combining for non-compatible vector types.
> * tree-vect-loop.c (is_valid_sink): New function.
> (optimize_mask_stores): Likewise.
> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
> has_mask_store field of vect_info.
> * tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
> vectorized loops having masked stores.
> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
> correspondent macros.
> (optimize_mask_stores): Add prototype.
> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
> type.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.