On Fri, Dec 4, 2015 at 4:07 PM, Yuri Rumyantsev <[email protected]> wrote:
> Hi Richard.
>
> Thanks a lot for your review.
> Below are my answers.
>
> You asked why I inserted additional check to
> ++ b/gcc/tree-ssa-forwprop.c
> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
> tree_code code, tree type,
>
> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>
> + /* Do not perform combining if types are not compatible. */
> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
> + return NULL_TREE;
> +
>
> again, how does this happen?
>
> This is because without it I hit the assert in fold_convert_loc
> gcc_assert (TREE_CODE (orig) == VECTOR_TYPE
> && tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (orig)));
>
> since it tries to convert a vector of booleans to a scalar boolean.
> Here is the essential part of the call stack:
>
> #0 internal_error (gmsgid=0x1e48397 "in %s, at %s:%d")
> at ../../gcc/diagnostic.c:1259
> #1 0x0000000001743ada in fancy_abort (
> file=0x1847fc3 "../../gcc/fold-const.c", line=2217,
> function=0x184b9d0 <fold_convert_loc(unsigned int, tree_node*,
> tree_node*)::__FUNCTION__> "fold_convert_loc") at
> ../../gcc/diagnostic.c:1332
> #2 0x00000000009c8330 in fold_convert_loc (loc=0, type=0x7ffff18a9d20,
> arg=0x7ffff1a7f488) at ../../gcc/fold-const.c:2216
> #3 0x00000000009f003f in fold_ternary_loc (loc=0, code=VEC_COND_EXPR,
> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:11453
> #4 0x00000000009f2f94 in fold_build3_stat_loc (loc=0, code=VEC_COND_EXPR,
> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff18c2000,
> op2=0x7ffff18c2030) at ../../gcc/fold-const.c:12394
> #5 0x00000000009d870c in fold_binary_op_with_conditional_arg (loc=0,
> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
> op1=0x7ffff1a48780, cond=0x7ffff1a7f460, arg=0x7ffff1a48780,
> cond_first_p=1) at ../../gcc/fold-const.c:6465
> #6 0x00000000009e3407 in fold_binary_loc (loc=0, code=EQ_EXPR,
> type=0x7ffff18a9d20, op0=0x7ffff1a7f460, op1=0x7ffff1a48780)
> at ../../gcc/fold-const.c:9211
> #7 0x0000000000ecb8fa in combine_cond_expr_cond (stmt=0x7ffff1a487d0,
> code=EQ_EXPR, type=0x7ffff18a9d20, op0=0x7ffff1a7f460,
> op1=0x7ffff1a48780, invariant_only=true)
> at ../../gcc/tree-ssa-forwprop.c:382
Ok, but that only shows that
/* Convert A ? 1 : 0 to simply A. */
if ((code == VEC_COND_EXPR ? integer_all_onesp (op1)
: (integer_onep (op1)
&& !VECTOR_TYPE_P (type)))
&& integer_zerop (op2)
/* If we try to convert OP0 to our type, the
call to fold will try to move the conversion inside
a COND, which will recurse. In that case, the COND_EXPR
is probably the best choice, so leave it alone. */
&& type == TREE_TYPE (arg0))
return pedantic_non_lvalue_loc (loc, arg0);
/* Convert A ? 0 : 1 to !A. This prefers the use of NOT_EXPR
over COND_EXPR in cases such as floating point comparisons. */
if (integer_zerop (op1)
&& (code == VEC_COND_EXPR ? integer_all_onesp (op2)
: (integer_onep (op2)
&& !VECTOR_TYPE_P (type)))
&& truth_value_p (TREE_CODE (arg0)))
return pedantic_non_lvalue_loc (loc,
fold_convert_loc (loc, type,
invert_truthvalue_loc (loc,
arg0)));
are wrong? I can't say for sure without a testcase.
That said, papering over this in tree-ssa-forwprop.c is not the
correct thing to do.
> Secondly, I did not catch your idea of implementing the GCC vector
> extension for vector comparisons with a boolean result, since such an
> extension depends entirely on the comparison context: in your example
> the result type of the comparison depends on how it is used - in an
> if-condition it is scalar, but for c = (a == b) the result type is a
> vector (both forms sketched below). I don't think this is reasonable
> for the current release.
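>
> To illustrate the two contexts (a rough sketch, names made up, not code
> from the patch):
>
> typedef int v4si __attribute__((vector_size(16)));
>
> void ctx (v4si a, v4si b, v4si *c, int *s)
> {
>   if (a == b)    /* needs a scalar boolean result - the form in question */
>     *s = 1;
>   *c = a == b;   /* element-wise compare - the result type is a vector */
> }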
The idea was to be able to write testcases exercising different EQ/NE vector
compares. But yes, if that's non-trivial then it's not appropriate for stage3.
Can you add a testcase for the forwprop issue and try to fix the offending
bogus foldings instead?
Thanks,
Richard.
> And finally, about AMD performance: I checked that this transformation
> also works with the "-march=bdver4" option, so the regression for
> 481.wrf should disappear there as well.
>
> Thanks.
> Yuri.
>
> 2015-12-04 15:18 GMT+03:00 Richard Biener <[email protected]>:
>> On Mon, Nov 30, 2015 at 2:11 PM, Yuri Rumyantsev <[email protected]> wrote:
>>> Hi All,
>>>
>>> Here is a patch for the 481.wrf performance regression for avx2; it is
>>> a slightly modified mask store optimization. This transformation allows
>>> us to un-predicate the semi-hammock containing masked stores; in other
>>> words, if we have a loop like
>>> for (i=0; i<n; i++)
>>> if (c[i]) {
>>> p1[i] += 1;
>>> p2[i] = p3[i] +2;
>>> }
>>>
>>> then it will be transformed to
>>> if (!mask__ifc__42.18_165 == { 0, 0, 0, 0, 0, 0, 0, 0 }) {
>>> vect__11.19_170 = MASK_LOAD (vectp_p1.20_168, 0B,
>>> mask__ifc__42.18_165);
>>> vect__12.22_172 = vect__11.19_170 + vect_cst__171;
>>> MASK_STORE (vectp_p1.23_175, 0B, mask__ifc__42.18_165,
>>> vect__12.22_172);
>>> vect__18.25_182 = MASK_LOAD (vectp_p3.26_180, 0B,
>>> mask__ifc__42.18_165);
>>> vect__19.28_184 = vect__18.25_182 + vect_cst__183;
>>> MASK_STORE (vectp_p2.29_187, 0B, mask__ifc__42.18_165,
>>> vect__19.28_184);
>>> }
>>> i.e. it will put all computations related to the masked stores into the semi-hammock.
>>>
>>> Bootstrapping and regression testing did not show any new failures.
>>
>> Can you please split out the middle-end support for vector equality compares?
>>
>> @@ -3448,10 +3448,17 @@ verify_gimple_comparison (tree type, tree op0, tree
>> op1)
>> if (TREE_CODE (op0_type) == VECTOR_TYPE
>> || TREE_CODE (op1_type) == VECTOR_TYPE)
>> {
>> - error ("vector comparison returning a boolean");
>> - debug_generic_expr (op0_type);
>> - debug_generic_expr (op1_type);
>> - return true;
>> + /* Allow vector comparison returning boolean if operand types
>> + are equal and CODE is EQ/NE. */
>> + if ((code != EQ_EXPR && code != NE_EXPR)
>> + || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>> + || VECTOR_INTEGER_TYPE_P (op0_type)))
>> + {
>> + error ("type mismatch for vector comparison returning a
>> boolean");
>> + debug_generic_expr (op0_type);
>> + debug_generic_expr (op1_type);
>> + return true;
>> + }
>> }
>> }
>>
>> please merge the conditions with a &&
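>>
>> i.e. roughly (untested, just the mechanical merge of the two checks
>> above):
>>
>>   if ((TREE_CODE (op0_type) == VECTOR_TYPE
>>        || TREE_CODE (op1_type) == VECTOR_TYPE)
>>       /* Allow a vector comparison returning a boolean only for EQ/NE
>>          on boolean or integer vector operands.  */
>>       && ((code != EQ_EXPR && code != NE_EXPR)
>>           || !(VECTOR_BOOLEAN_TYPE_P (op0_type)
>>                || VECTOR_INTEGER_TYPE_P (op0_type))))
>>     {
>>       error ("type mismatch for vector comparison returning a boolean");
>>       debug_generic_expr (op0_type);
>>       debug_generic_expr (op1_type);
>>       return true;
>>     }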
>>
>> @@ -13888,6 +13888,25 @@ fold_relational_const (enum tree_code code,
>> tree type, tree op0, tree op1)
>>
>> if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
>> {
>> + if (INTEGRAL_TYPE_P (type)
>> + && (TREE_CODE (type) == BOOLEAN_TYPE
>> + || TYPE_PRECISION (type) == 1))
>> + {
>> + /* Have vector comparison with scalar boolean result. */
>> + bool result = true;
>> + gcc_assert (code == EQ_EXPR || code == NE_EXPR);
>> + gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
>> + for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
>> + {
>> + tree elem0 = VECTOR_CST_ELT (op0, i);
>> + tree elem1 = VECTOR_CST_ELT (op1, i);
>> + tree tmp = fold_relational_const (code, type, elem0, elem1);
>> + result &= integer_onep (tmp);
>> + if (code == NE_EXPR)
>> + result = !result;
>> + return constant_boolean_node (result, type);
>>
>> ... just assumes it is either EQ_EXPR or NE_EXPR. I believe you want
>> to change the
>> guarding condition to just
>>
>> if (! VECTOR_TYPE_P (type))
>>
>> and assert the boolean/precision. Please also merge the asserts into
>> one with &&
>>
>> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
>> index b82ae3c..73ee3be 100644
>> --- a/gcc/tree-ssa-forwprop.c
>> +++ b/gcc/tree-ssa-forwprop.c
>> @@ -373,6 +373,11 @@ combine_cond_expr_cond (gimple *stmt, enum
>> tree_code code, tree type,
>>
>> gcc_assert (TREE_CODE_CLASS (code) == tcc_comparison);
>>
>> + /* Do not perform combining if types are not compatible. */
>> + if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
>> + && !tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (TREE_TYPE (op0))))
>> + return NULL_TREE;
>> +
>>
>> again, how does this happen?
>>
>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
>> index e67048e..1605520c 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -5760,6 +5760,12 @@ register_edge_assert_for (tree name, edge e,
>> gimple_stmt_iterator si,
>> &comp_code, &val))
>> return;
>>
>> + /* Use of a vector comparison in a gcond is very restricted and only used
>> + to check that the mask in a masked store is zero, so an assert for such
>> + comparisons is not implemented yet. */
>> + if (TREE_CODE (TREE_TYPE (name)) == VECTOR_TYPE)
>> + return;
>> +
>>
>> VECTOR_TYPE_P
>>
>> I believe the comment should simply say that VRP doesn't track ranges for
>> vector types.
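>>
>> i.e. just (sketch):
>>
>>   /* VRP doesn't track ranges for vector types.  */
>>   if (VECTOR_TYPE_P (TREE_TYPE (name)))
>>     return;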
>>
>> In the previous review I suggested you should make sure that RTL expansion
>> ends up using a well-defined optab for these compares. To make sure
>> this happens across targets I suggest you make these comparisons available
>> via the GCC vector extension. Thus allow
>>
>> typedef int v4si __attribute__((vector_size(16)));
>>
>> int foo (v4si a, v4si b)
>> {
>>   if (a == b)
>>     return 4;
>> }
>>
>> and != and also using floating point vectors.
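>>
>> e.g. (one possible shape of such a testcase; names made up):
>>
>> typedef float v4sf __attribute__((vector_size(16)));
>>
>> int bar (v4sf a, v4sf b)
>> {
>>   if (a != b)
>>     return 4;
>>   return 0;
>> }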
>>
>> Otherwise it's hard to see the impact of this change. Obvious choices
>> are the eq/ne optabs for FP compares and [u]cmp optabs for integer
>> compares.
>>
>> A half-way implementation like your VRP comment suggests (only
>> ==/!= against zero for integer vectors is implemented?!) doesn't sound
>> good without also limiting the feature this way in the verifier.
>>
>> Btw, the regression with WRF is >50% on AMD Bulldozer (which only
>> has AVX, not AVX2).
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog:
>>> 2015-11-30 Yuri Rumyantsev <[email protected]>
>>>
>>> PR middle-end/68542
>>> * config/i386/i386.c (ix86_expand_branch): Implement integral vector
>>> comparison with boolean result.
>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
>>> for vector comparison with eq/ne only.
>>> * fold-const.c (fold_relational_const): Add handling of vector
>>> comparison with boolean result.
>>> * tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
>>> comparison of vector operands with boolean result for EQ/NE only.
>>> (verify_gimple_assign_binary): Adjust call to verify_gimple_comparison.
>>> (verify_gimple_cond): Likewise.
>>> * tree-ssa-forwprop.c (combine_cond_expr_cond): Do not perform
>>> combining for non-compatible vector types.
>>> * tree-vect-loop.c (is_valid_sink): New function.
>>> (optimize_mask_stores): Likewise.
>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>> has_mask_store field of vect_info.
>>> * tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
>>> vectorized loops having masked stores.
>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>> corresponding macros.
>>> (optimize_mask_stores): Add prototype.
>>> * tree-vrp.c (register_edge_assert_for): Do not handle NAME with vector
>>> type.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.