On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > I double checked that with and without my fix compiler produces the > same output with -fdump-tree-optimized.
For what testcase? Richard. > What patch did you apply? I think that you should apply the second one > - 56175.diff. > > Yuri. > > 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> This regression was introduced by Kai >>> >>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html >>> >>> 2011-06-27 Kai Tietz <kti...@redhat.com> >>> >>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve >>> type sinking. >>> * tree-ssa-math-opts.c (execute_optimize_bswap): Separate >>> search for di/si mode patterns for finding widest match. >>> >>> Is it sufficient for you to accept my patch? >> >> No, I still don't think the fix is sound. A proper fix would revert the >> above change (the simplify_bitwise_binary one), watch for testsuite >> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing) >> and fix those failures in a better way. >> >> Richard. >> >>> Best regards. >>> yuri. >>> >>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> As we know Kai is working on this problem for 4.9 and I assume that >>>>> type sinking will be deleted from forwprop pass. Could we stay on this >>>>> fix but more common fix will be done. >>>> >>>> Well, unless you show it is a regression the patch is not applicable for >>>> 4.8 >>>> anyway. Not sure if the code will be deleted from forwprop pass in 4.9 >>>> either, >>>> it is after all a canonicalization - fold seems to perform the opposite one >>>> though: >>>> >>>> /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer >>>> constants (if x has signed type, the sign bit cannot be set >>>> in c). This folds extension into the BIT_AND_EXPR. >>>> >>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type >>>> hoisting, >>>> not sinking. Generally frontends and fold try to narrow operands when >>>> possible (even though some targets later widen them again because of >>>> instruction set constraints). >>>> >>>> Most of this hoisting code was done to make lowering logical && and || >>>> I believe. Looking at the testcases added tells us that while matching >>>> the two patterns as done now helps them but only because that pattern >>>> feeds single-operand instructions that then simplify. So doing the >>>> transform >>>> starting from that single-operand instructions instead looks like a better >>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the >>>> canonicalization done by fold. >>>> >>>> Richard. >>>> >>>>> I also can propose to introduce new hook for it but need to know your >>>>> opinion since we don't went to waste our time on preparing dead >>>>> patches. Note that x86 supports all short types in HW and such type >>>>> sinkning is usually useless if short types are involved. >>>>> >>>>> Best regards. >>>>> Yuri. >>>>> >>>>> >>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> First of all, your proposal to move type sinking to the end of >>>>>>> function does not work since we handle each statement in function and >>>>>>> we want that 1st type folding of X & C will not happen. >>>>>>> Note that we have the following sequence of gimple before forwprop1: >>>>>>> >>>>>>> x.0_10 = (signed char) x_8; >>>>>>> _11 = x.0_10 & 1; >>>>>>> _12 = (signed char) y_9; >>>>>>> _13 = _12 & 1; >>>>>>> _14 = _11 ^ _13; >>>>>> >>>>>> Ah, indeed. Reminds me of some of my dead patches that separated >>>>>> forwprop into a forward and backward stage. Of course then you have >>>>>> the ordering issue of whether to first forward or backward. >>>>>> >>>>>> Which means that I bet you can construct a testcase that with >>>>>> your change is no longer optimized (just make pushing the conversion >>>>>> make the types _match_). Which is always the case >>>>>> with this kind of local pattern-matching transforms. >>>>>> >>>>>> Currently forwprop processes leafs of expression trees first (well, >>>>>> inside >>>>>> a basic-block), similar to how fold () is supposed to be operated, based >>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern >>>>>> recognition simple and cost considerations more accurate. >>>>>> >>>>>> When one order works better than another you always have to consider >>>>>> that the user could already have written the code in a way that results >>>>>> in the input that isn't well handled. >>>>>> >>>>>> Not that this helps very much for the situation ;) >>>>>> >>>>>> But I don't like the use of first_pass_instance ... and the fix isn't >>>>>> an improvement but just a hack for the benchmark. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> I also added comment to my fix and create new test for it. I also >>>>>>> checked that this test is passed with patched compiler only. So >>>>>>> Change Log was also modified: >>>>>>> >>>>>>> ChangeLog >>>>>>> >>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> >>>>>>> PR tree-optimization/56175 >>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type >>>>>>> sinking >>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) >>>>>>> & C >>>>>>> for short integer types. >>>>>>> * gcc.dg/pr56175.c: New test. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C >>>>>>>>> pattern in simpify_bitwise_binary for short integer types. >>>>>>>>> The fix is very simple - we simply turn off short type sinking at the >>>>>>>>> first pass of forward propagation allows to get >>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and >>>>>>>>> +5-7% for other x86 platforms too. >>>>>>>>> Bootstrapping and regression testing were successful on x86-64. >>>>>>>>> >>>>>>>>> Is it Ok for trunk? >>>>>>>> >>>>>>>> It definitely needs a comment before the checks. >>>>>>>> >>>>>>>> Also I think it simply shows that the code is placed at the wrong spot. >>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very >>>>>>>> last >>>>>>>> should get both of the effects done. >>>>>>>> >>>>>>>> Can you rework the patch according to that? >>>>>>>> >>>>>>>> You also miss a testcase, we should make sure to not regress again >>>>>>>> here. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>>> ChangeLog. >>>>>>>>> >>>>>>>>> 2013-02-20 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> PR tree-optimization/56175 >>>>>>>>> * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type >>>>>>>>> sinking >>>>>>>>> at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ >>>>>>>>> B) & C >>>>>>>>> for short integer types.