Richard,

I double checked that with and without my fix compiler produces the
same output with -fdump-tree-optimized.

What patch did you apply? I think that you should apply the second one
- 56175.diff.

Yuri.

2013/2/21 Richard Biener <richard.guent...@gmail.com>:
> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>> Richard,
>>
>> This regression was introduced by Kai
>>
>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>
>> 2011-06-27  Kai Tietz  <kti...@redhat.com>
>>
>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>         type sinking.
>>         * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>         search for di/si mode patterns for finding widest match.
>>
>> Is it sufficient for you to accept my patch?
>
> No, I still don't think the fix is sound.  A proper fix would revert the
> above change (the simplify_bitwise_binary one), watch for testsuite
> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
> and fix those failures in a better way.
>
> Richard.
>
>> Best regards.
>> yuri.
>>
>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>> Richard,
>>>>
>>>> As we know Kai is working on this problem for 4.9 and I assume that
>>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>>> fix but more common fix will be done.
>>>
>>> Well, unless you show it is a regression the patch is not applicable for 4.8
>>> anyway.  Not sure if the code will be deleted from forwprop pass in 4.9 
>>> either,
>>> it is after all a canonicalization - fold seems to perform the opposite one
>>> though:
>>>
>>>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>          constants (if x has signed type, the sign bit cannot be set
>>>          in c).  This folds extension into the BIT_AND_EXPR.
>>>
>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type hoisting,
>>> not sinking.  Generally frontends and fold try to narrow operands when
>>> possible (even though some targets later widen them again because of
>>> instruction set constraints).
>>>
>>> Most of this hoisting code was done to make lowering logical && and ||
>>> I believe.  Looking at the testcases added tells us that while matching
>>> the two patterns as done now helps them but only because that pattern
>>> feeds single-operand instructions that then simplify.  So doing the 
>>> transform
>>> starting from that single-operand instructions instead looks like a better
>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>> canonicalization done by fold.
>>>
>>> Richard.
>>>
>>>> I also can propose to introduce new hook for it but need to know your
>>>> opinion since we don't went to waste our time on preparing dead
>>>> patches. Note that x86 supports all short types in HW and such type
>>>> sinkning is usually useless if short types are involved.
>>>>
>>>> Best regards.
>>>> Yuri.
>>>>
>>>>
>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>> wrote:
>>>>>> Richard,
>>>>>>
>>>>>> First of all, your proposal to move type sinking to the end of
>>>>>> function does not work since we handle each statement in function and
>>>>>> we want that 1st type folding of X & C will not happen.
>>>>>> Note that we have the following sequence of gimple before forwprop1:
>>>>>>
>>>>>>    x.0_10 = (signed char) x_8;
>>>>>>   _11 = x.0_10 & 1;
>>>>>>   _12 = (signed char) y_9;
>>>>>>   _13 = _12 & 1;
>>>>>>   _14 = _11 ^ _13;
>>>>>
>>>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>>>> forwprop into a forward and backward stage.  Of course then you have
>>>>> the ordering issue of whether to first forward or backward.
>>>>>
>>>>> Which means that I bet you can construct a testcase that with
>>>>> your change is no longer optimized (just make pushing the conversion
>>>>> make the types _match_).  Which is always the case
>>>>> with this kind of local pattern-matching transforms.
>>>>>
>>>>> Currently forwprop processes leafs of expression trees first (well, inside
>>>>> a basic-block), similar to how fold () is supposed to be operated, based
>>>>> on the idea that simplified / canonicalized leafs helps keeping pattern
>>>>> recognition simple and cost considerations more accurate.
>>>>>
>>>>> When one order works better than another you always have to consider
>>>>> that the user could already have written the code in a way that results
>>>>> in the input that isn't well handled.
>>>>>
>>>>> Not that this helps very much for the situation ;)
>>>>>
>>>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>>>> an improvement but just a hack for the benchmark.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> I also added comment to my fix and create new test for it. I also
>>>>>> checked that this test is passed with patched compiler  only. So
>>>>>> Change Log was also modified:
>>>>>>
>>>>>> ChangeLog
>>>>>>
>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>>         PR tree-optimization/56175
>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type 
>>>>>> sinking
>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) & 
>>>>>> C
>>>>>>         for short integer types.
>>>>>>         * gcc.dg/pr56175.c: New test.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>> pattern in simpify_bitwise_binary for short integer types.
>>>>>>>> The fix is very simple - we simply turn off short type sinking at the
>>>>>>>> first pass of forward propagation allows to get
>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>>>>>>>> +5-7% for other x86 platforms too.
>>>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>>>
>>>>>>>> Is it Ok for trunk?
>>>>>>>
>>>>>>> It definitely needs a comment before the checks.
>>>>>>>
>>>>>>> Also I think it simply shows that the code is placed at the wrong spot.
>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the very 
>>>>>>> last
>>>>>>> should get both of the effects done.
>>>>>>>
>>>>>>> Can you rework the patch according to that?
>>>>>>>
>>>>>>> You also miss a testcase, we should make sure to not regress again here.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> ChangeLog.
>>>>>>>>
>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>
>>>>>>>>         PR tree-optimization/56175
>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid type 
>>>>>>>> sinking
>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ B) 
>>>>>>>> & C
>>>>>>>>         for short integer types.

Reply via email to