Richard, I double checked that with and without my fix compiler produces the same output with -fdump-tree-optimized.
What patch did you apply? I think that you should apply the second one - 56175.diff. Yuri. 2013/2/21 Richard Biener <richard.guent...@gmail.com>: > On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Richard, >> >> This regression was introduced by Kai >> >> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >> >> 2011-06-27 Kai Tietz <kti...@redhat.com> >> >> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >> type sinking. >> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >> search for di/si mode patterns for finding widest match. >> >> Is it sufficient for you to accept my patch? > > No, I still don't think the fix is sound. A proper fix would revert the > above change (the simplify_bitwise_binary one), watch for testsuite > fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) > and fix those failures in a better way. > > Richard. > >> Best regards. >> yuri. >> >> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Richard, >>>> >>>> As we know Kai is working on this problem for 4.9 and I assume that >>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>> fix but more common fix will be done. >>> >>> Well, unless you show it is a regression the patch is not applicable for 4.8 >>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>> either, >>> it is after all a canonicalization - fold seems to perform the opposite one >>> though: >>> >>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>> constants (if x has signed type, the sign bit cannot be set >>> in c). This folds extension into the BIT_AND_EXPR. >>> >>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting, >>> not sinking. Generally frontends and fold try to narrow operands when >>> possible (even though some targets later widen them again because of >>> instruction set constraints). >>> >>> Most of this hoisting code was done to make lowering logical && and || >>> I believe. Looking at the testcases added tells us that while matching >>> the two patterns as done now helps them but only because that pattern >>> feeds single-operand instructions that then simplify. So doing the >>> transform >>> starting from that single-operand instructions instead looks like a better >>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>> canonicalization done by fold. >>> >>> Richard. >>> >>>> I also can propose to introduce new hook for it but need to know your >>>> opinion since we don't went to waste our time on preparing dead >>>> patches. Note that x86 supports all short types in HW and such type >>>> sinkning is usually useless if short types are involved. >>>> >>>> Best regards. >>>> Yuri. >>>> >>>> >>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Richard, >>>>>> >>>>>> First of all, your proposal to move type sinking to the end of >>>>>> function does not work since we handle each statement in function and >>>>>> we want that 1st type folding of X & C will not happen. >>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>> >>>>>> x.0_10 = (signed char) x_8; >>>>>> _11 = x.0_10 & 1; >>>>>> _12 = (signed char) y_9; >>>>>> _13 = _12 & 1; >>>>>> _14 = _11 ^ _13; >>>>> >>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>> forwprop into a forward and backward stage. Of course then you have >>>>> the ordering issue of whether to first forward or backward. >>>>> >>>>> Which means that I bet you can construct a testcase that with >>>>> your change is no longer optimized (just make pushing the conversion >>>>> make the types _match_). Which is always the case >>>>> with this kind of local pattern-matching transforms. >>>>> >>>>> Currently forwprop processes leafs of expression trees first (well, inside >>>>> a basic-block), similar to how fold () is supposed to be operated, based >>>>> on the idea that simplified / canonicalized leafs helps keeping pattern >>>>> recognition simple and cost considerations more accurate. >>>>> >>>>> When one order works better than another you always have to consider >>>>> that the user could already have written the code in a way that results >>>>> in the input that isn't well handled. >>>>> >>>>> Not that this helps very much for the situation ;) >>>>> >>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>> an improvement but just a hack for the benchmark. >>>>> >>>>> Richard. >>>>> >>>>>> I also added comment to my fix and create new test for it. I also >>>>>> checked that this test is passed with patched compiler only. So >>>>>> Change Log was also modified: >>>>>> >>>>>> ChangeLog >>>>>> >>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> >>>>>> PR tree-optimization/56175 >>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>> sinking >>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & >>>>>> C >>>>>> for short integer types. >>>>>> * gcc.dg/pr56175.c: New test. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>> The fix is very simple - we simply turn off short type sinking at the >>>>>>>> first pass of forward propagation allows to get >>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>> +5-7% for other x86 platforms too. >>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>> >>>>>>>> Is it Ok for trunk? >>>>>>> >>>>>>> It definitely needs a comment before the checks. >>>>>>> >>>>>>> Also I think it simply shows that the code is placed at the wrong spot. >>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very >>>>>>> last >>>>>>> should get both of the effects done. >>>>>>> >>>>>>> Can you rework the patch according to that? >>>>>>> >>>>>>> You also miss a testcase, we should make sure to not regress again here. >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> ChangeLog. >>>>>>>> >>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> PR tree-optimization/56175 >>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type >>>>>>>> sinking >>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) >>>>>>>> & C >>>>>>>> for short integer types.