Richard, This does not fit to my fix since if we have another pattern like t = (u8)((x & 1) ^ ((u8)y & 1)); where y has short type, for rhs operand type sinkning (or hoisting as you prefer) still will be performed but I don't see any reason for converting (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all targets including x86.
Yuri. 2013/2/21 Richard Biener <richard.guent...@gmail.com>: > On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> Sorry for my previous message - I did not undersatnd it properly. >> >> Anyway I proposed another fix that avoid (type) x & c --> (type) (x & >> (type-x) c) transformation if x has short type: >> >> +++ gcc/tree-ssa-forwprop.c (working copy) >> @@ -1789,8 +1789,11 @@ >> defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); >> defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); >> >> - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ >> + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). >> + Do that only if X has not short type. */ >> if (TREE_CODE (arg2) == INTEGER_CST >> + && (TYPE_PRECISION (TREE_TYPE (arg1)) >> + >= TYPE_PRECISION (integer_type_node)) >> && CONVERT_EXPR_CODE_P (def1_code) >> && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) >> && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) >> >> Does this fix look suitable? > > I think the fix is to disable the transform if it widens the operation. > Thus, sth like > > if (TREE_CODE (arg2) == INTEGER_CST > && CONVERT_EXPR_CODE_P (def1_code) > && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) > + && (TYPE_PRECISION (TREE_TYPE (arg1)) > + >= TYPE_PRECISION (TREE_TYPE (def1_arg1))) > && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) > > see the restriction we place on the transform for the (T1) x & (T2) y case: > > /* For bitwise binary operations apply operand conversions to the > binary operation result instead of to the operands. This allows > to combine successive conversions and bitwise binary operations. */ > if (CONVERT_EXPR_CODE_P (def1_code) > && CONVERT_EXPR_CODE_P (def2_code) > && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) > /* Make sure that the conversion widens the operands, or has same > precision, or that it changes the operation to a bitfield > precision. */ > && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) > <= TYPE_PRECISION (TREE_TYPE (arg1))) > || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) > != MODE_INT) > || (TYPE_PRECISION (TREE_TYPE (arg1)) > != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)))))) > > ideally you'd split out that condition into a predicate function taking the > two type and use it in both places. > > Richard. > >> Yuri. >> >> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> I double checked that with and without my fix compiler produces the >>>> same output with -fdump-tree-optimized. >>> >>> For what testcase? >>> >>> Richard. >>> >>>> What patch did you apply? I think that you should apply the second one >>>> - 56175.diff. >>>> >>>> Yuri. >>>> >>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> This regression was introduced by Kai >>>>>> >>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>>>> >>>>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>>>> >>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>>>> type sinking. >>>>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>>>> search for di/si mode patterns for finding widest match. >>>>>> >>>>>> Is it sufficient for you to accept my patch? >>>>> >>>>> No, I still don't think the fix is sound. A proper fix would revert the >>>>> above change (the simplify_bitwise_binary one), watch for testsuite >>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>>>> and fix those failures in a better way. >>>>> >>>>> Richard. >>>>> >>>>>> Best regards. >>>>>> yuri. >>>>>> >>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Richard, >>>>>>>> >>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>>>>>> fix but more common fix will be done. >>>>>>> >>>>>>> Well, unless you show it is a regression the patch is not applicable >>>>>>> for 4.8 >>>>>>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>>>>>> either, >>>>>>> it is after all a canonicalization - fold seems to perform the opposite >>>>>>> one >>>>>>> though: >>>>>>> >>>>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>>>> constants (if x has signed type, the sign bit cannot be set >>>>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>>>> >>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>>>> hoisting, >>>>>>> not sinking. Generally frontends and fold try to narrow operands when >>>>>>> possible (even though some targets later widen them again because of >>>>>>> instruction set constraints). >>>>>>> >>>>>>> Most of this hoisting code was done to make lowering logical && and || >>>>>>> I believe. Looking at the testcases added tells us that while matching >>>>>>> the two patterns as done now helps them but only because that pattern >>>>>>> feeds single-operand instructions that then simplify. So doing the >>>>>>> transform >>>>>>> starting from that single-operand instructions instead looks like a >>>>>>> better >>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>>>> canonicalization done by fold. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> I also can propose to introduce new hook for it but need to know your >>>>>>>> opinion since we don't went to waste our time on preparing dead >>>>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>>>> sinkning is usually useless if short types are involved. >>>>>>>> >>>>>>>> Best regards. >>>>>>>> Yuri. >>>>>>>> >>>>>>>> >>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Richard, >>>>>>>>>> >>>>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>>>> function does not work since we handle each statement in function and >>>>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>>>>>> >>>>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>>>> _11 = x.0_10 & 1; >>>>>>>>>> _12 = (signed char) y_9; >>>>>>>>>> _13 = _12 & 1; >>>>>>>>>> _14 = _11 ^ _13; >>>>>>>>> >>>>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>>>>> the ordering issue of whether to first forward or backward. >>>>>>>>> >>>>>>>>> Which means that I bet you can construct a testcase that with >>>>>>>>> your change is no longer optimized (just make pushing the conversion >>>>>>>>> make the types _match_). Which is always the case >>>>>>>>> with this kind of local pattern-matching transforms. >>>>>>>>> >>>>>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>>>>> inside >>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, >>>>>>>>> based >>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping >>>>>>>>> pattern >>>>>>>>> recognition simple and cost considerations more accurate. >>>>>>>>> >>>>>>>>> When one order works better than another you always have to consider >>>>>>>>> that the user could already have written the code in a way that >>>>>>>>> results >>>>>>>>> in the input that isn't well handled. >>>>>>>>> >>>>>>>>> Not that this helps very much for the situation ;) >>>>>>>>> >>>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>>>>>> an improvement but just a hack for the benchmark. >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>>>> Change Log was also modified: >>>>>>>>>> >>>>>>>>>> ChangeLog >>>>>>>>>> >>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>>>>>> sinking >>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>>> B) & C >>>>>>>>>> for short integer types. >>>>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at >>>>>>>>>>>> the >>>>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>>>> >>>>>>>>>>>> Is it Ok for trunk? >>>>>>>>>>> >>>>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>>>> >>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong >>>>>>>>>>> spot. >>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the >>>>>>>>>>> very last >>>>>>>>>>> should get both of the effects done. >>>>>>>>>>> >>>>>>>>>>> Can you rework the patch according to that? >>>>>>>>>>> >>>>>>>>>>> You also miss a testcase, we should make sure to not regress again >>>>>>>>>>> here. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> ChangeLog. >>>>>>>>>>>> >>>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>> >>>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid >>>>>>>>>>>> type sinking >>>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A >>>>>>>>>>>> ^ B) & C >>>>>>>>>>>> for short integer types.