Hi Richard and Bin, Thanks for your comments. I will revise the patch and post it as soon as stage-1 opens.
Kugan On 7 March 2018 at 22:25, Bin.Cheng <amker.ch...@gmail.com> wrote: > On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah >>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>> Hi Richard, >>>>> >>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah >>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> On 31 January 2018 at 21:39, Richard Biener >>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>>> Hi Richard, >>>>>>>>> >>>>>>>>> Thanks for the review. >>>>>>>>> On 25 January 2018 at 20:04, Richard Biener >>>>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>>>>>>> would like to queue this for review for next stage 1. >>>>>>>>>>> >>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>>>>>>> above. >>>>>>>>>>> 2. This does not distribute loop to detect popcount (like >>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please >>>>>>>>>>> correct >>>>>>>>>>> me if I am wrong. >>>>>>>>>> >>>>>>>>>> But then it has no business inside loop distribution but instead is >>>>>>>>>> doing final value >>>>>>>>>> replacement, right? You are pattern-matching the whole loop after >>>>>>>>>> all. I think >>>>>>>>>> final value replacement would already do the correct thing if you >>>>>>>>>> teached number of >>>>>>>>>> iteration analysis that niter for >>>>>>>>>> >>>>>>>>>> <bb 3> [local count: 955630224]: >>>>>>>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>>>>>>> _1 = b_11 + -1; >>>>>>>>>> b_8 = _1 & b_11; >>>>>>>>>> if (b_8 != 0) >>>>>>>>>> goto <bb 6>; [89.00%] >>>>>>>>>> else >>>>>>>>>> goto <bb 8>; [11.00%] >>>>>>>>>> >>>>>>>>>> <bb 6> [local count: 850510900]: >>>>>>>>>> goto <bb 3>; [100.00%] >>>>>>>>> >>>>>>>>> I am looking into this approach. What should be the scalar evolution >>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>>>>>>> and can this be represented with the scev? >>>>>>>> >>>>>>>> No, it's not affine and thus cannot be represented. You only need the >>>>>>>> scalar evolution of the counting IV which is already handled and >>>>>>>> the number of iteration analysis needs to handle the above IV - this >>>>>>>> is the missing part. >>>>>>> Thanks for the clarification. I am now matching this loop pattern in >>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions >>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in >>>>>>> the loop preheater and setting the loop niter with this. This will be >>>>>>> used by the final value replacement. Is this what you wanted? >>>>>> >>>>>> No, you shouldn't insert a popcount stmt but instead the niter >>>>>> GENERIC tree should be a CALL_EXPR to popcount with the >>>>>> appropriate argument. >>>>> >>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if >>>>> niter in tree_niter_desc can take such. >>>>> >>>>> Attached patch now does this. Also had to add support for CALL_EXPR in >>>>> few places to handle niter with CALL_EXPR. Does this look OK? >>>> >>>> Overall this looks ok - the patch includes changes in places that I don't >>>> think >>>> need changes such as chrec_convert_1 or extract_ops_from_tree. >>>> The expression_expensive_p change should be more specific than making >>>> all calls inexpensive as well. >>>> >>>> The verify_ssa change looks bogus, you do >>>> >>>> + dest = gimple_phi_result (count_phi); >>>> + tree var = make_ssa_name (TREE_TYPE (dest), NULL); >>>> + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); >>>> + >>>> + var = build_call_expr (fn, 1, src); >>>> + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, >>>> + build_int_cst (TREE_TYPE (dest), 1)); >>>> >>>> why do you allocate a new SSA name here? It seems unused >>>> as you overwrive 'var' with the CALL_EXPR immediately. >>>> >>>> I didn't review the pattern matching thoroughly nor the exact place you >>>> call it. But >>>> >>>> + if (check_popcount_pattern (loop, &count)) >>>> + { >>>> + niter->assumptions = boolean_false_node; >>>> + niter->control.base = NULL_TREE; >>>> + niter->control.step = NULL_TREE; >>>> + niter->control.no_overflow = false; >>>> + niter->niter = count; >>>> + niter->assumptions = boolean_true_node; >>>> + niter->may_be_zero = boolean_false_node; >>>> + niter->max = -1; >>>> + niter->bound = NULL_TREE; >>>> + niter->cmp = ERROR_MARK; >>>> + return true; >>>> + } >>>> >>>> simply setting may_be_zero to false looks fishy. Try >>>> with -fno-tree-loop-ch. Also max should not be negative, >>>> it should be the number of bits in the IV type? >>>> >>>> A related testcase could be that we can completely peel >>>> a loop like the following which iterates at most 8 times: >>>> >>>> int a[8]; >>>> void foo (unsigned char ctrl) >>>> { >>>> int c = 0; >>>> while (ctrl) >>>> { >>>> ctrl = ctrl & (ctrl - 1); >>>> a[c++] = ctrl; >>>> } >>>> } >>>> >>>> This is now stage1 material so please update and re-post. Maybe Bin has >>>> further suggestions as well. >>> Sorry for being late on this. Actually I thought about popcount in >>> distribution before. More like the first patch, but handled as >>> another distribution pattern rather than a special case. It's a bit >>> strange to compute and store the info in niters. It's also not >>> straight forward when/where the transformation is finally done. >> >> It's done at final value replacement if the counter is used. But with >> it in place we should also be able to compute niters for the case >> where it isn't (like the above case). So I believe that in the end handling >> this in niter analysis is more powerful. > Yes, you are right, as distribution pattern, we would only be able to > handle standalone popcount loop. > > Thanks, > bin >> >>> I haven't looked into the details so not sure how appropriate it will >>> be as a distribution pattern (current ones are only about data >>> references). So I am okay with this version if it's more appropriate. >> >> Yeah, adding distribution patterns for scalar reduction forms is certainly >> appropriate but then this is the same or at least similar as final >> value replacement. >> >> Richard. >> >>> Thanks, >>> bin >>>> >>>> Thanks, >>>> Richard. >>>> >>>>> Thanks, >>>>> Kugan >>>>> >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >>>>> >>>>> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR. >>>>> * tree-chrec.c (chrec_convert_1): Likewise. >>>>> * tree-scalar-evolution.c (expression_expensive_p): Likewise. >>>>> * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise. >>>>> * tree-ssa-loop-niter.c (check_popcount_pattern): New. >>>>> (number_of_iterations_exit): Record niter for popcount patern. >>>>> * tree-ssa.c (verify_ssa): Check stmt to be non NULL. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >>>>> >>>>> * gcc.dg/tree-ssa/popcount.c: New test. >>>>> >>>>> >>>>>> >>>>>>> In general, there is also a condition gating the loop like >>>>>>> >>>>>>> if (b_4 != 0) >>>>>>> goto loop; >>>>>>> else >>>>>>> end: >>>>>>> >>>>>>> This of course will not be removed by final value replacement. Since >>>>>>> popcount (0) is defined, this is redundant and could be removed but >>>>>>> not removed. >>>>>> >>>>>> Yeah, that's probably sth for another pass though. I suppose the >>>>>> end: case just uses zero in which case you'll have a PHI and you >>>>>> can optimize >>>>>> >>>>>> if (b != 0) >>>>>> return popcount (b); >>>>>> return 0; >>>>>> >>>>>> in phiopt. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Thanks, >>>>>>> Kugan >>>>>>> >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Kugan >>>>>>>>>> >>>>>>>>>> is related to popcount (b_5). >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>>>>>>>>>> regressions. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Kugan >>>>>>>>>>> >>>>>>>>>>> gcc/ChangeLog: >>>>>>>>>>> >>>>>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>>>>> >>>>>>>>>>> PR middle-end/82479 >>>>>>>>>>> * tree-loop-distribution.c (handle_popcount): New. >>>>>>>>>>> (pass_loop_distribution::execute): Use handle_popcount. >>>>>>>>>>> >>>>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>>>> >>>>>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>>>>> >>>>>>>>>>> PR middle-end/82479 >>>>>>>>>>> * gcc.dg/tree-ssa/popcount.c: New test.