Hi Richard and Bin,

Thanks for your comments. I will revise the patch and post it as soon
as stage-1 opens.

Kugan

On 7 March 2018 at 22:25, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener
>>> <richard.guent...@gmail.com> wrote:
>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>> Hi Richard,
>>>>>
>>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> On 31 January 2018 at 21:39, Richard Biener 
>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> Thanks for the review.
>>>>>>>>> On 25 January 2018 at 20:04, Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
>>>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I
>>>>>>>>>>> would like to queue this for review for next stage 1.
>>>>>>>>>>>
>>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and 
>>>>>>>>>>> above.
>>>>>>>>>>> 2. This does not distribute loop to detect popcount (like
>>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please 
>>>>>>>>>>> correct
>>>>>>>>>>> me if I am wrong.
>>>>>>>>>>
>>>>>>>>>> But then it has no business inside loop distribution but instead is
>>>>>>>>>> doing final value
>>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after 
>>>>>>>>>> all.  I think
>>>>>>>>>> final value replacement would already do the correct thing if you
>>>>>>>>>> teached number of
>>>>>>>>>> iteration analysis that niter for
>>>>>>>>>>
>>>>>>>>>>   <bb 3> [local count: 955630224]:
>>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>
>>>>>>>>>>   _1 = b_11 + -1;
>>>>>>>>>>   b_8 = _1 & b_11;
>>>>>>>>>>   if (b_8 != 0)
>>>>>>>>>>     goto <bb 6>; [89.00%]
>>>>>>>>>>   else
>>>>>>>>>>     goto <bb 8>; [11.00%]
>>>>>>>>>>
>>>>>>>>>>   <bb 6> [local count: 850510900]:
>>>>>>>>>>   goto <bb 3>; [100.00%]
>>>>>>>>>
>>>>>>>>> I am looking into this approach. What should be the scalar evolution
>>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
>>>>>>>>> and can this be represented with the scev?
>>>>>>>>
>>>>>>>> No, it's not affine and thus cannot be represented.  You only need the
>>>>>>>> scalar evolution of the counting IV which is already handled and
>>>>>>>> the number of iteration analysis needs to handle the above IV - this
>>>>>>>> is the missing part.
>>>>>>> Thanks for the clarification. I am now matching this loop pattern in
>>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions
>>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in
>>>>>>> the loop preheater and setting the loop niter with this. This will be
>>>>>>> used by the final value replacement. Is this what you wanted?
>>>>>>
>>>>>> No, you shouldn't insert a popcount stmt but instead the niter
>>>>>> GENERIC tree should be a CALL_EXPR to popcount with the
>>>>>> appropriate argument.
>>>>>
>>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if
>>>>> niter in tree_niter_desc can take such.
>>>>>
>>>>> Attached patch now does this. Also had to add support for CALL_EXPR in
>>>>> few places to handle niter with CALL_EXPR. Does this look OK?
>>>>
>>>> Overall this looks ok - the patch includes changes in places that I don't 
>>>> think
>>>> need changes such as chrec_convert_1 or extract_ops_from_tree.
>>>> The expression_expensive_p change should be more specific than making
>>>> all calls inexpensive as well.
>>>>
>>>> The verify_ssa change looks bogus, you do
>>>>
>>>> +  dest = gimple_phi_result (count_phi);
>>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
>>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
>>>> +
>>>> +  var = build_call_expr (fn, 1, src);
>>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
>>>> +                       build_int_cst (TREE_TYPE (dest), 1));
>>>>
>>>> why do you allocate a new SSA name here?  It seems unused
>>>> as you overwrive 'var' with the CALL_EXPR immediately.
>>>>
>>>> I didn't review the pattern matching thoroughly nor the exact place you
>>>> call it.  But
>>>>
>>>> +      if (check_popcount_pattern (loop, &count))
>>>> +       {
>>>> +         niter->assumptions = boolean_false_node;
>>>> +         niter->control.base = NULL_TREE;
>>>> +         niter->control.step = NULL_TREE;
>>>> +         niter->control.no_overflow = false;
>>>> +         niter->niter = count;
>>>> +         niter->assumptions = boolean_true_node;
>>>> +         niter->may_be_zero = boolean_false_node;
>>>> +         niter->max = -1;
>>>> +         niter->bound = NULL_TREE;
>>>> +         niter->cmp = ERROR_MARK;
>>>> +         return true;
>>>> +       }
>>>>
>>>> simply setting may_be_zero to false looks fishy.  Try
>>>> with -fno-tree-loop-ch.  Also max should not be negative,
>>>> it should be the number of bits in the IV type?
>>>>
>>>> A related testcase could be that we can completely peel
>>>> a loop like the following which iterates at most 8 times:
>>>>
>>>> int a[8];
>>>> void foo (unsigned char ctrl)
>>>> {
>>>>   int c = 0;
>>>>   while (ctrl)
>>>>     {
>>>>        ctrl = ctrl & (ctrl - 1);
>>>>        a[c++] = ctrl;
>>>>     }
>>>> }
>>>>
>>>> This is now stage1 material so please update and re-post.  Maybe Bin has
>>>> further suggestions as well.
>>> Sorry for being late on this.  Actually I thought about popcount in
>>> distribution before.  More like the first patch, but handled as
>>> another distribution pattern rather than a special case.  It's a bit
>>> strange to compute and store the info in niters.  It's also not
>>> straight forward when/where the transformation is finally done.
>>
>> It's done at final value replacement if the counter is used.  But with
>> it in place we should also be able to compute niters for the case
>> where it isn't (like the above case).  So I believe that in the end handling
>> this in niter analysis is more powerful.
> Yes, you are right, as distribution pattern, we would only be able to
> handle standalone popcount loop.
>
> Thanks,
> bin
>>
>>> I haven't looked into the details so not sure how appropriate it will
>>> be as a distribution pattern (current ones are only about data
>>> references).  So I am okay with this version if it's more appropriate.
>>
>> Yeah, adding distribution patterns for scalar reduction forms is certainly
>> appropriate but then this is the same or at least similar as final
>> value replacement.
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Kugan
>>>>>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>
>>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
>>>>>     * tree-chrec.c (chrec_convert_1): Likewise.
>>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.
>>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
>>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.
>>>>>     (number_of_iterations_exit): Record niter for popcount patern.
>>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 2018-02-08  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>
>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
>>>>>
>>>>>
>>>>>>
>>>>>>> In general, there is also a condition gating the loop like
>>>>>>>
>>>>>>> if (b_4 != 0)
>>>>>>>   goto loop;
>>>>>>> else
>>>>>>>   end:
>>>>>>>
>>>>>>> This of course will not be removed by final value replacement. Since
>>>>>>> popcount (0) is defined, this is redundant and could be removed but
>>>>>>> not removed.
>>>>>>
>>>>>> Yeah, that's probably sth for another pass though.  I suppose the
>>>>>> end: case just uses zero in which case you'll have a PHI and you
>>>>>> can optimize
>>>>>>
>>>>>>   if (b != 0)
>>>>>>     return popcount (b);
>>>>>>   return 0;
>>>>>>
>>>>>> in phiopt.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Kugan
>>>>>>>
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kugan
>>>>>>>>>>
>>>>>>>>>> is related to popcount (b_5).
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
>>>>>>>>>>> regressions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kugan
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>>>
>>>>>>>>>>>     PR middle-end/82479
>>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.
>>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.
>>>>>>>>>>>
>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kug...@linaro.org>
>>>>>>>>>>>
>>>>>>>>>>>     PR middle-end/82479
>>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

Reply via email to