On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > This does not fit to my fix since if we have another pattern like > t = (u8)((x & 1) ^ ((u8)y & 1)); > where y has short type, for rhs operand type sinkning (or hoisting as > you prefer) still will be performed but I don't see any reason for > converting (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all > targets including x86.
As I said we shouldn't do that and the tests I suggested would make sure we don't. Even if I mistyped them. Put in the same condition as in the (type) X op (type) Y hoisting which is exactly there to avoid widening the operands to 'op' with the hoisting. I can produce a patch later today. Richard. > Yuri. > > 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> Sorry for my previous message - I did not undersatnd it properly. >>> >>> Anyway I proposed another fix that avoid (type) x & c --> (type) (x & >>> (type-x) c) transformation if x has short type: >>> >>> +++ gcc/tree-ssa-forwprop.c (working copy) >>> @@ -1789,8 +1789,11 @@ >>> defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >>> defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); >>> >>> - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ >>> + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). >>> + Do that only if X has not short type. */ >>> if (TREE_CODE (arg2) == INTEGER_CST >>> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >>> + >= TYPE_PRECISION (integer_type_node)) >>> && CONVERT_EXPR_CODE_P (def1_code) >>> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >>> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >>> >>> Does this fix look suitable? >> >> I think the fix is to disable the transform if it widens the operation. >> Thus, sth like >> >> if (TREE_CODE (arg2) == INTEGER_CST >> && CONVERT_EXPR_CODE_P (def1_code) >> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >> + >= TYPE_PRECISION (TREE_TYPE (def1_arg1))) >> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >> >> see the restriction we place on the transform for the (T1) x & (T2) y case: >> >> /* For bitwise binary operations apply operand conversions to the >> binary operation result instead of to the operands. This allows >> to combine successive conversions and bitwise binary operations. */ >> if (CONVERT_EXPR_CODE_P (def1_code) >> && CONVERT_EXPR_CODE_P (def2_code) >> && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) >> /* Make sure that the conversion widens the operands, or has same >> precision, or that it changes the operation to a bitfield >> precision. */ >> && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) >> <= TYPE_PRECISION (TREE_TYPE (arg1))) >> || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) >> != MODE_INT) >> || (TYPE_PRECISION (TREE_TYPE (arg1)) >> != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)))))) >> >> ideally you'd split out that condition into a predicate function taking the >> two type and use it in both places. >> >> Richard. >> >>> Yuri. >>> >>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> I double checked that with and without my fix compiler produces the >>>>> same output with -fdump-tree-optimized. >>>> >>>> For what testcase? >>>> >>>> Richard. >>>> >>>>> What patch did you apply? I think that you should apply the second one >>>>> - 56175.diff. >>>>> >>>>> Yuri. >>>>> >>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> This regression was introduced by Kai >>>>>>> >>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>>>>> >>>>>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>>>>> >>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>>>>> type sinking. >>>>>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>>>>> search for di/si mode patterns for finding widest match. >>>>>>> >>>>>>> Is it sufficient for you to accept my patch? >>>>>> >>>>>> No, I still don't think the fix is sound. A proper fix would revert the >>>>>> above change (the simplify_bitwise_binary one), watch for testsuite >>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>>>>> and fix those failures in a better way. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Best regards. >>>>>>> yuri. >>>>>>> >>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>>>>>>> fix but more common fix will be done. >>>>>>>> >>>>>>>> Well, unless you show it is a regression the patch is not applicable >>>>>>>> for 4.8 >>>>>>>> anyway. Not sure if the code will be deleted from forwprop pass in >>>>>>>> 4.9 either, >>>>>>>> it is after all a canonicalization - fold seems to perform the >>>>>>>> opposite one >>>>>>>> though: >>>>>>>> >>>>>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>>>>> constants (if x has signed type, the sign bit cannot be set >>>>>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>>>>> >>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>>>>> hoisting, >>>>>>>> not sinking. Generally frontends and fold try to narrow operands when >>>>>>>> possible (even though some targets later widen them again because of >>>>>>>> instruction set constraints). >>>>>>>> >>>>>>>> Most of this hoisting code was done to make lowering logical && and || >>>>>>>> I believe. Looking at the testcases added tells us that while matching >>>>>>>> the two patterns as done now helps them but only because that pattern >>>>>>>> feeds single-operand instructions that then simplify. So doing the >>>>>>>> transform >>>>>>>> starting from that single-operand instructions instead looks like a >>>>>>>> better >>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>>>>> canonicalization done by fold. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> I also can propose to introduce new hook for it but need to know your >>>>>>>>> opinion since we don't went to waste our time on preparing dead >>>>>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>>>>> sinkning is usually useless if short types are involved. >>>>>>>>> >>>>>>>>> Best regards. >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> >>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>>>>> function does not work since we handle each statement in function >>>>>>>>>>> and >>>>>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>>>>>>> >>>>>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>>>>> _11 = x.0_10 & 1; >>>>>>>>>>> _12 = (signed char) y_9; >>>>>>>>>>> _13 = _12 & 1; >>>>>>>>>>> _14 = _11 ^ _13; >>>>>>>>>> >>>>>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>>>>>> the ordering issue of whether to first forward or backward. >>>>>>>>>> >>>>>>>>>> Which means that I bet you can construct a testcase that with >>>>>>>>>> your change is no longer optimized (just make pushing the conversion >>>>>>>>>> make the types _match_). Which is always the case >>>>>>>>>> with this kind of local pattern-matching transforms. >>>>>>>>>> >>>>>>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>>>>>> inside >>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, >>>>>>>>>> based >>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping >>>>>>>>>> pattern >>>>>>>>>> recognition simple and cost considerations more accurate. >>>>>>>>>> >>>>>>>>>> When one order works better than another you always have to consider >>>>>>>>>> that the user could already have written the code in a way that >>>>>>>>>> results >>>>>>>>>> in the input that isn't well handled. >>>>>>>>>> >>>>>>>>>> Not that this helps very much for the situation ;) >>>>>>>>>> >>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>>>>>>> an improvement but just a hack for the benchmark. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>>>>> Change Log was also modified: >>>>>>>>>>> >>>>>>>>>>> ChangeLog >>>>>>>>>>> >>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>>>>>>> sinking >>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>>>> B) & C >>>>>>>>>>> for short integer types. >>>>>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Hi All, >>>>>>>>>>>>> >>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at >>>>>>>>>>>>> the >>>>>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>>>>> >>>>>>>>>>>>> Is it Ok for trunk? >>>>>>>>>>>> >>>>>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>>>>> >>>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong >>>>>>>>>>>> spot. >>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the >>>>>>>>>>>> very last >>>>>>>>>>>> should get both of the effects done. >>>>>>>>>>>> >>>>>>>>>>>> Can you rework the patch according to that? >>>>>>>>>>>> >>>>>>>>>>>> You also miss a testcase, we should make sure to not regress again >>>>>>>>>>>> here. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> ChangeLog. >>>>>>>>>>>>> >>>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>> >>>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid >>>>>>>>>>>>> type sinking >>>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A >>>>>>>>>>>>> ^ B) & C >>>>>>>>>>>>> for short integer types.