On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <bin.ch...@arm.com> wrote:
>>>Hi,
>>>As PR69848 reported, the GCC vectorizer now generates the comparison outside of
>>>VEC_COND_EXPR for the COND_REDUCTION case, as below:
>>>
>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>
>>>This results in inefficient expansion.  With IR like:
>>>
>>>  vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>
>>>We can do:
>>>1) Expand-time optimization, for example, inverting the comparison
>>>operator by swapping the VEC_COND_EXPR operands.  This is useful when the
>>>backend only supports some comparison operators.
>>>2) For backends not supporting vcond_mask patterns, saving the one LT_EXPR
>>>instruction which is introduced by expand_vec_cond_expr.
>>>
>>>This patch fixes this by propagating the comparison into VEC_COND_EXPR even
>>>if it's used multiple times.  For now, GCC only does single-use
>>>propagation.  Ideally, we would duplicate the comparison before each use
>>>statement just before expanding, so that TER can successfully backtrack
>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>do this.  tree-vect-generic.c looks like a good candidate, but it runs so
>>>early that the following CSE could undo the transform.  Another possible
>>>fix is to generate the comparison inside VEC_COND_EXPR directly in function
>>>vectorizable_reduction.
>>
>> I prefer this for now.
> Hi Richard, you mean this patch, or the possible fix before your comment?

The possible fix before my comment - make the vectorizer generate VEC_COND_EXPRs
with embedded comparison.
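
As a rough illustration of why the embedded form helps at expand time (point 1
in the quoted text), here is a scalar model of the operand swap.  This is not
GCC code: the function names, LANES and the lane-by-lane loops are all
assumptions made up for the sketch.  It only shows that
VEC_COND_EXPR <a != b, x, y> and VEC_COND_EXPR <a == b, y, x> select the same
lanes, which is the rewrite a backend lacking a NE comparison could use when
the comparison is visible inside the VEC_COND_EXPR.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4  /* hypothetical vector width, matching the { 0, 0, 0, 0 } constants */

/* Scalar model of VEC_COND_EXPR <a != b, x, y>. */
static void vcond_ne(const int32_t *a, const int32_t *b,
                     const int32_t *x, const int32_t *y, int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] != b[i]) ? x[i] : y[i];
}

/* Scalar model of VEC_COND_EXPR <a == b, y, x>: the comparison is
   inverted and the two value operands are swapped.  */
static void vcond_eq_swapped(const int32_t *a, const int32_t *b,
                             const int32_t *x, const int32_t *y, int32_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] == b[i]) ? y[i] : x[i];
}

/* Returns 1 when both forms produce identical lanes for the given inputs.  */
static int vcond_forms_agree(const int32_t *a, const int32_t *b,
                             const int32_t *x, const int32_t *y)
{
  int32_t r1[LANES], r2[LANES];
  vcond_ne(a, b, x, y, r1);
  vcond_eq_swapped(a, b, x, y, r2);
  return memcmp(r1, r2, sizeof r1) == 0;
}
```

With the comparison kept in a separate statement and used more than once, the
expander only sees a mask operand and cannot apply this kind of swap.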

Thanks,
Richard.

> Here is an updated patch addressing the comment issue pointed out by
> Bernhard Reutner-Fischer.  Thanks.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>>As for possible comparison CSE opportunities, I checked that it's
>>>simple enough to be handled by RTL CSE.
>>>
>>>Bootstrapped and tested on x86_64 and AArch64.  Any comments?
>>>
>>>Thanks,
>>>bin
>>>
>>>2016-05-12  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>       PR tree-optimization/69848
>>>       * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>>       (expand_vec_cmp_expr_p): Call above functions.
>>>       * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>>       * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>>       (forward_propagate_into_cond): Propagate multiple uses for
>>>       VEC_COND_EXPR.
>>
>>
