On Mon, May 16, 2016 at 10:09 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Fri, May 13, 2016 at 5:53 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On May 13, 2016 6:02:27 PM GMT+02:00, Bin Cheng <bin.ch...@arm.com> wrote:
>>>Hi,
>>>As PR69848 reported, GCC vectorizer now generates comparison outside of
>>>VEC_COND_EXPR for COND_REDUCTION case, as below:
>>>
>>>  _20 = vect__1.6_8 != { 0, 0, 0, 0 };
>>>  vect_c_2.8_16 = VEC_COND_EXPR <_20, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <_20, ivtmp_17, _19>;
>>>
>>>This results in inefficient expanding.  With IR like:
>>>
>>>  vect_c_2.8_16 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, { 0, 0, 0, 0 }, vect_c_2.7_13>;
>>>  _21 = VEC_COND_EXPR <vect__1.6_8 != { 0, 0, 0, 0 }, ivtmp_17, _19>;
>>>
>>>We can do:
>>>1) Expanding time optimization, for example, reverting comparison
>>>operator by switching VEC_COND_EXPR operands.  This is useful when
>>>backend only supports some comparison operators.
>>>2) For backend not supporting vcond_mask patterns, saving one LT_EXPR
>>>instruction which introduced by expand_vec_cond_expr.
>>>
>>>This patch fixes this by propagating comparison into VEC_COND_EXPR even
>>>if it's used multiple times.  For now, GCC does single_use_only
>>>propagation.  Ideally, we may duplicate the comparison before each use
>>>statement just before expanding, so that TER can successfully backtrack
>>>it from each VEC_COND_EXPR.  Unfortunately I didn't find a good pass to
>>>do this.  Tree-vect-generic.c looks like a good candidate, but it's so
>>>early that following CSE could undo the transform.  Another possible
>>>fix is to generate comparison inside VEC_COND_EXPR directly in function
>>>vectorizable_reduction.
>>
>> I prefer this for now.
> Hi Richard, you mean this patch, or the possible fix before your comment?
The possible fix before my comment - make the vectorizer generate
VEC_COND_EXPRs with embedded comparison.

Thanks,
Richard.

> Here is an updated patch addressing comment issue pointed out by
> Bernhard Reutner-Fischer.  Thanks.
>
> Thanks,
> bin
>>
>> Richard.
>>
>>>As for possible comparison CSE opportunities, I checked that it's
>>>simple enough to be handled by RTL CSE.
>>>
>>>Bootstrap and test on x86_64 and AArch64.  Any comments?
>>>
>>>Thanks,
>>>bin
>>>
>>>2016-05-12  Bin Cheng  <bin.ch...@arm.com>
>>>
>>>    PR tree-optimization/69848
>>>    * optabs-tree.c (expand_vcond_mask_p, expand_vcond_p): New.
>>>    (expand_vec_cmp_expr_p): Call above functions.
>>>    * optabs-tree.h (expand_vcond_mask_p, expand_vcond_p): New.
>>>    * tree-ssa-forwprop.c (optabs-tree.h): Include header file.
>>>    (forward_propagate_into_cond): Propagate multiple uses for
>>>    VEC_COND_EXPR.