On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > Sorry for my previous message - I did not undersatnd it properly. > > Anyway I proposed another fix that avoid (type) x & c --> (type) (x & > (type-x) c) transformation if x has short type: > > +++ gcc/tree-ssa-forwprop.c (working copy) > @@ -1789,8 +1789,11 @@ > defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2); > defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2); > > - /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ > + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). > + Do that only if X has not short type. */ > if (TREE_CODE (arg2) == INTEGER_CST > + && (TYPE_PRECISION (TREE_TYPE (arg1)) > + >= TYPE_PRECISION (integer_type_node)) > && CONVERT_EXPR_CODE_P (def1_code) > && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) > && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) > > Does this fix look suitable?
I think the fix is to disable the transform if it widens the operation. Thus, sth like if (TREE_CODE (arg2) == INTEGER_CST && CONVERT_EXPR_CODE_P (def1_code) && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1)) + && (TYPE_PRECISION (TREE_TYPE (arg1)) + >= TYPE_PRECISION (TREE_TYPE (def1_arg1))) && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) see the restriction we place on the transform for the (T1) x & (T2) y case: /* For bitwise binary operations apply operand conversions to the binary operation result instead of to the operands. This allows to combine successive conversions and bitwise binary operations. */ if (CONVERT_EXPR_CODE_P (def1_code) && CONVERT_EXPR_CODE_P (def2_code) && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) /* Make sure that the conversion widens the operands, or has same precision, or that it changes the operation to a bitfield precision. */ && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) <= TYPE_PRECISION (TREE_TYPE (arg1))) || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) != MODE_INT) || (TYPE_PRECISION (TREE_TYPE (arg1)) != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1)))))) ideally you'd split out that condition into a predicate function taking the two type and use it in both places. Richard. > Yuri. > > 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> I double checked that with and without my fix compiler produces the >>> same output with -fdump-tree-optimized. >> >> For what testcase? >> >> Richard. >> >>> What patch did you apply? I think that you should apply the second one >>> - 56175.diff. >>> >>> Yuri. >>> >>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> This regression was introduced by Kai >>>>> >>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>>>> >>>>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>>>> >>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>>>> type sinking. >>>>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>>>> search for di/si mode patterns for finding widest match. >>>>> >>>>> Is it sufficient for you to accept my patch? >>>> >>>> No, I still don't think the fix is sound. A proper fix would revert the >>>> above change (the simplify_bitwise_binary one), watch for testsuite >>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >>>> and fix those failures in a better way. >>>> >>>> Richard. >>>> >>>>> Best regards. >>>>> yuri. >>>>> >>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>>>>> fix but more common fix will be done. >>>>>> >>>>>> Well, unless you show it is a regression the patch is not applicable for >>>>>> 4.8 >>>>>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>>>>> either, >>>>>> it is after all a canonicalization - fold seems to perform the opposite >>>>>> one >>>>>> though: >>>>>> >>>>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>>>> constants (if x has signed type, the sign bit cannot be set >>>>>> in c). This folds extension into the BIT_AND_EXPR. >>>>>> >>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>>>> hoisting, >>>>>> not sinking. Generally frontends and fold try to narrow operands when >>>>>> possible (even though some targets later widen them again because of >>>>>> instruction set constraints). >>>>>> >>>>>> Most of this hoisting code was done to make lowering logical && and || >>>>>> I believe. Looking at the testcases added tells us that while matching >>>>>> the two patterns as done now helps them but only because that pattern >>>>>> feeds single-operand instructions that then simplify. So doing the >>>>>> transform >>>>>> starting from that single-operand instructions instead looks like a >>>>>> better >>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>>>> canonicalization done by fold. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> I also can propose to introduce new hook for it but need to know your >>>>>>> opinion since we don't went to waste our time on preparing dead >>>>>>> patches. Note that x86 supports all short types in HW and such type >>>>>>> sinkning is usually useless if short types are involved. >>>>>>> >>>>>>> Best regards. >>>>>>> Yuri. >>>>>>> >>>>>>> >>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>>>> function does not work since we handle each statement in function and >>>>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>>>>> >>>>>>>>> x.0_10 = (signed char) x_8; >>>>>>>>> _11 = x.0_10 & 1; >>>>>>>>> _12 = (signed char) y_9; >>>>>>>>> _13 = _12 & 1; >>>>>>>>> _14 = _11 ^ _13; >>>>>>>> >>>>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>>>> the ordering issue of whether to first forward or backward. >>>>>>>> >>>>>>>> Which means that I bet you can construct a testcase that with >>>>>>>> your change is no longer optimized (just make pushing the conversion >>>>>>>> make the types _match_). Which is always the case >>>>>>>> with this kind of local pattern-matching transforms. >>>>>>>> >>>>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>>>> inside >>>>>>>> a basic-block), similar to how fold () is supposed to be operated, >>>>>>>> based >>>>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern >>>>>>>> recognition simple and cost considerations more accurate. >>>>>>>> >>>>>>>> When one order works better than another you always have to consider >>>>>>>> that the user could already have written the code in a way that results >>>>>>>> in the input that isn't well handled. >>>>>>>> >>>>>>>> Not that this helps very much for the situation ;) >>>>>>>> >>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>>>>> an improvement but just a hack for the benchmark. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>>>> checked that this test is passed with patched compiler only. So >>>>>>>>> Change Log was also modified: >>>>>>>>> >>>>>>>>> ChangeLog >>>>>>>>> >>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> PR tree-optimization/56175 >>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>>>>> sinking >>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>> B) & C >>>>>>>>> for short integer types. >>>>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at >>>>>>>>>>> the >>>>>>>>>>> first pass of forward propagation allows to get >>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>>>> >>>>>>>>>>> Is it Ok for trunk? >>>>>>>>>> >>>>>>>>>> It definitely needs a comment before the checks. >>>>>>>>>> >>>>>>>>>> Also I think it simply shows that the code is placed at the wrong >>>>>>>>>> spot. >>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very >>>>>>>>>> last >>>>>>>>>> should get both of the effects done. >>>>>>>>>> >>>>>>>>>> Can you rework the patch according to that? >>>>>>>>>> >>>>>>>>>> You also miss a testcase, we should make sure to not regress again >>>>>>>>>> here. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> ChangeLog. >>>>>>>>>>> >>>>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> PR tree-optimization/56175 >>>>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid >>>>>>>>>>> type sinking >>>>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>>>> B) & C >>>>>>>>>>> for short integer types.