Re: Fix for 56175

Richard Biener Mon, 25 Feb 2013 01:31:55 -0800

On Thu, Feb 21, 2013 at 4:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Richard,
>
> This does not fit to my fix  since if we have another pattern like
>     t = (u8)((x & 1) ^ ((u8)y & 1));
> where y has short type, for rhs operand type sinkning (or hoisting as
> you prefer) still will be performed but I don't see any reason for
> converting  (u8)y & 1 --> (u8)(y & 1) if y has u16 type for all
> targets including x86.


As I said we shouldn't do that and the tests I suggested would make sure
we don't.  Even if I mistyped them.  Put in the same condition as in the
(type) X op (type) Y hoisting which is exactly there to avoid widening
the operands to 'op' with the hoisting.

I can produce a patch later today.

Richard.

> Yuri.
>
> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Feb 21, 2013 at 4:16 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> Sorry for my previous message - I did not undersatnd it properly.
>>>
>>> Anyway I proposed another fix that avoid  (type) x & c --> (type) (x &
>>> (type-x) c) transformation if x has short type:
>>>
>>> +++ gcc/tree-ssa-forwprop.c     (working copy)
>>> @@ -1789,8 +1789,11 @@
>>>    defcodefor_name (arg1, &def1_code, &def1_arg1, &def1_arg2);
>>>    defcodefor_name (arg2, &def2_code, &def2_arg1, &def2_arg2);
>>>
>>> -  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).  */
>>> +  /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)).
>>> +     Do that only if X has not short type.  */
>>>    if (TREE_CODE (arg2) == INTEGER_CST
>>> +      && (TYPE_PRECISION (TREE_TYPE (arg1))
>>> +          >= TYPE_PRECISION (integer_type_node))
>>>        && CONVERT_EXPR_CODE_P (def1_code)
>>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>>
>>> Does this fix look suitable?
>>
>> I think the fix is to disable the transform if it widens the operation.
>> Thus, sth like
>>
>>     if (TREE_CODE (arg2) == INTEGER_CST
>>        && CONVERT_EXPR_CODE_P (def1_code)
>>        && INTEGRAL_TYPE_P (TREE_TYPE (def1_arg1))
>> +     && (TYPE_PRECISION (TREE_TYPE (arg1))
>> +            >= TYPE_PRECISION (TREE_TYPE (def1_arg1)))
>>        && int_fits_type_p (arg2, TREE_TYPE (def1_arg1)))
>>
>> see the restriction we place on the transform for the (T1) x & (T2) y case:
>>
>>   /* For bitwise binary operations apply operand conversions to the
>>      binary operation result instead of to the operands.  This allows
>>      to combine successive conversions and bitwise binary operations.  */
>>   if (CONVERT_EXPR_CODE_P (def1_code)
>>       && CONVERT_EXPR_CODE_P (def2_code)
>>       && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1))
>>       /* Make sure that the conversion widens the operands, or has same
>>          precision,  or that it changes the operation to a bitfield
>>          precision.  */
>>       && ((TYPE_PRECISION (TREE_TYPE (def1_arg1))
>>            <= TYPE_PRECISION (TREE_TYPE (arg1)))
>>           || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1)))
>>               != MODE_INT)
>>           || (TYPE_PRECISION (TREE_TYPE (arg1))
>>               != GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (arg1))))))
>>
>> ideally you'd split out that condition into a predicate function taking the
>> two type and use it in both places.
>>
>> Richard.
>>
>>> Yuri.
>>>
>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>> On Thu, Feb 21, 2013 at 3:26 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> I double checked that with and without my fix compiler produces the
>>>>> same output with -fdump-tree-optimized.
>>>>
>>>> For what testcase?
>>>>
>>>> Richard.
>>>>
>>>>> What patch did you apply? I think that you should apply the second one
>>>>> - 56175.diff.
>>>>>
>>>>> Yuri.
>>>>>
>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Thu, Feb 21, 2013 at 2:41 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> This regression was introduced by Kai
>>>>>>>
>>>>>>> http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01988.html
>>>>>>>
>>>>>>> 2011-06-27  Kai Tietz  <kti...@redhat.com>
>>>>>>>
>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve
>>>>>>>         type sinking.
>>>>>>>         * tree-ssa-math-opts.c (execute_optimize_bswap): Separate
>>>>>>>         search for di/si mode patterns for finding widest match.
>>>>>>>
>>>>>>> Is it sufficient for you to accept my patch?
>>>>>>
>>>>>> No, I still don't think the fix is sound.  A proper fix would revert the
>>>>>> above change (the simplify_bitwise_binary one), watch for testsuite
>>>>>> fallout (I can immediately see gcc.dg/tree-ssa/bitwise-sink.c failing)
>>>>>> and fix those failures in a better way.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Best regards.
>>>>>>> yuri.
>>>>>>>
>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Thu, Feb 21, 2013 at 1:39 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> As we know Kai is working on this problem for 4.9 and I assume that
>>>>>>>>> type sinking will be deleted from forwprop pass. Could we stay on this
>>>>>>>>> fix but more common fix will be done.
>>>>>>>>
>>>>>>>> Well, unless you show it is a regression the patch is not applicable 
>>>>>>>> for 4.8
>>>>>>>> anyway.  Not sure if the code will be deleted from forwprop pass in 
>>>>>>>> 4.9 either,
>>>>>>>> it is after all a canonicalization - fold seems to perform the 
>>>>>>>> opposite one
>>>>>>>> though:
>>>>>>>>
>>>>>>>>       /* Convert (T)(x & c) into (T)x & (T)c, if c is an integer
>>>>>>>>          constants (if x has signed type, the sign bit cannot be set
>>>>>>>>          in c).  This folds extension into the BIT_AND_EXPR.
>>>>>>>>
>>>>>>>> note that what forwprop does (T)x & c -> (T)(x & c') I'd call type 
>>>>>>>> hoisting,
>>>>>>>> not sinking.  Generally frontends and fold try to narrow operands when
>>>>>>>> possible (even though some targets later widen them again because of
>>>>>>>> instruction set constraints).
>>>>>>>>
>>>>>>>> Most of this hoisting code was done to make lowering logical && and ||
>>>>>>>> I believe.  Looking at the testcases added tells us that while matching
>>>>>>>> the two patterns as done now helps them but only because that pattern
>>>>>>>> feeds single-operand instructions that then simplify.  So doing the 
>>>>>>>> transform
>>>>>>>> starting from that single-operand instructions instead looks like a 
>>>>>>>> better
>>>>>>>> fix (op !=/== 0/1 and (T) op) and also would not disagree with the
>>>>>>>> canonicalization done by fold.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> I also can propose to introduce new hook for it but need to know your
>>>>>>>>> opinion since we don't went to waste our time on preparing dead
>>>>>>>>> patches. Note that x86 supports all short types in HW and such type
>>>>>>>>> sinkning is usually useless if short types are involved.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/2/21 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>>> On Wed, Feb 20, 2013 at 4:41 PM, Yuri Rumyantsev 
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> First of all, your proposal to move type sinking to the end of
>>>>>>>>>>> function does not work since we handle each statement in function 
>>>>>>>>>>> and
>>>>>>>>>>> we want that 1st type folding of X & C will not happen.
>>>>>>>>>>> Note that we have the following sequence of gimple before forwprop1:
>>>>>>>>>>>
>>>>>>>>>>>    x.0_10 = (signed char) x_8;
>>>>>>>>>>>   _11 = x.0_10 & 1;
>>>>>>>>>>>   _12 = (signed char) y_9;
>>>>>>>>>>>   _13 = _12 & 1;
>>>>>>>>>>>   _14 = _11 ^ _13;
>>>>>>>>>>
>>>>>>>>>> Ah, indeed.  Reminds me of some of my dead patches that separated
>>>>>>>>>> forwprop into a forward and backward stage.  Of course then you have
>>>>>>>>>> the ordering issue of whether to first forward or backward.
>>>>>>>>>>
>>>>>>>>>> Which means that I bet you can construct a testcase that with
>>>>>>>>>> your change is no longer optimized (just make pushing the conversion
>>>>>>>>>> make the types _match_).  Which is always the case
>>>>>>>>>> with this kind of local pattern-matching transforms.
>>>>>>>>>>
>>>>>>>>>> Currently forwprop processes leafs of expression trees first (well, 
>>>>>>>>>> inside
>>>>>>>>>> a basic-block), similar to how fold () is supposed to be operated, 
>>>>>>>>>> based
>>>>>>>>>> on the idea that simplified / canonicalized leafs helps keeping 
>>>>>>>>>> pattern
>>>>>>>>>> recognition simple and cost considerations more accurate.
>>>>>>>>>>
>>>>>>>>>> When one order works better than another you always have to consider
>>>>>>>>>> that the user could already have written the code in a way that 
>>>>>>>>>> results
>>>>>>>>>> in the input that isn't well handled.
>>>>>>>>>>
>>>>>>>>>> Not that this helps very much for the situation ;)
>>>>>>>>>>
>>>>>>>>>> But I don't like the use of first_pass_instance ... and the fix isn't
>>>>>>>>>> an improvement but just a hack for the benchmark.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> I also added comment to my fix and create new test for it. I also
>>>>>>>>>>> checked that this test is passed with patched compiler  only. So
>>>>>>>>>>> Change Log was also modified:
>>>>>>>>>>>
>>>>>>>>>>> ChangeLog
>>>>>>>>>>>
>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary): Avoid type 
>>>>>>>>>>> sinking
>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A ^ 
>>>>>>>>>>> B) & C
>>>>>>>>>>>         for short integer types.
>>>>>>>>>>>         * gcc.dg/pr56175.c: New test.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2013/2/20 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Wed, Feb 20, 2013 at 1:00 PM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patch is aimed to recognize (A & C) ^ (B & C) -> (A ^ B) & C
>>>>>>>>>>>>> pattern in simpify_bitwise_binary for short integer types.
>>>>>>>>>>>>> The fix is very simple - we simply turn off short type sinking at 
>>>>>>>>>>>>> the
>>>>>>>>>>>>> first pass of forward propagation allows to get
>>>>>>>>>>>>> +10% speedup for important benchmark Coremark 1.0 at x86 Atom and
>>>>>>>>>>>>> +5-7% for other x86 platforms too.
>>>>>>>>>>>>> Bootstrapping and regression testing were successful on x86-64.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it Ok for trunk?
>>>>>>>>>>>>
>>>>>>>>>>>> It definitely needs a comment before the checks.
>>>>>>>>>>>>
>>>>>>>>>>>> Also I think it simply shows that the code is placed at the wrong 
>>>>>>>>>>>> spot.
>>>>>>>>>>>> Simply moving it down in simplify_bitwise_binary to be done the 
>>>>>>>>>>>> very last
>>>>>>>>>>>> should get both of the effects done.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you rework the patch according to that?
>>>>>>>>>>>>
>>>>>>>>>>>> You also miss a testcase, we should make sure to not regress again 
>>>>>>>>>>>> here.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>> ChangeLog.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2013-02-20  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         PR tree-optimization/56175
>>>>>>>>>>>>>         * tree-ssa-forwprop.c (simplify_bitwise_binary) : Avoid 
>>>>>>>>>>>>> type sinking
>>>>>>>>>>>>>         at 1st forwprop pass to recognize (A & C) ^ (B & C) -> (A 
>>>>>>>>>>>>> ^ B) & C
>>>>>>>>>>>>>         for short integer types.

Re: Fix for 56175

Reply via email to