On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: >> Hi Richard, >> >> On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> >> wrote: >>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah >>> <kugan.vivekanandara...@linaro.org> wrote: >>>> Hi Richard, >>>> >>>> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> >>>> wrote: >>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>> Hi Richard, >>>>>> >>>>>> Thanks for the review. >>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guent...@gmail.com> >>>>>> wrote: >>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>>>> would like to queue this for review for next stage 1. >>>>>>>> >>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>>>> above. >>>>>>>> 2. This does not distribute loop to detect popcount (like >>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct >>>>>>>> me if I am wrong. >>>>>>> >>>>>>> But then it has no business inside loop distribution but instead is >>>>>>> doing final value >>>>>>> replacement, right? You are pattern-matching the whole loop after all. >>>>>>> I think >>>>>>> final value replacement would already do the correct thing if you >>>>>>> teached number of >>>>>>> iteration analysis that niter for >>>>>>> >>>>>>> <bb 3> [local count: 955630224]: >>>>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>>>> _1 = b_11 + -1; >>>>>>> b_8 = _1 & b_11; >>>>>>> if (b_8 != 0) >>>>>>> goto <bb 6>; [89.00%] >>>>>>> else >>>>>>> goto <bb 8>; [11.00%] >>>>>>> >>>>>>> <bb 6> [local count: 850510900]: >>>>>>> goto <bb 3>; [100.00%] >>>>>> >>>>>> I am looking into this approach. What should be the scalar evolution >>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>>>> and can this be represented with the scev? >>>>> >>>>> No, it's not affine and thus cannot be represented. You only need the >>>>> scalar evolution of the counting IV which is already handled and >>>>> the number of iteration analysis needs to handle the above IV - this >>>>> is the missing part. >>>> Thanks for the clarification. I am now matching this loop pattern in >>>> number_of_iterations_exit when number_of_iterations_exit_assumptions >>>> fails. If the pattern matches, I am inserting the _builtin_popcount in >>>> the loop preheater and setting the loop niter with this. This will be >>>> used by the final value replacement. Is this what you wanted? >>> >>> No, you shouldn't insert a popcount stmt but instead the niter >>> GENERIC tree should be a CALL_EXPR to popcount with the >>> appropriate argument. >> >> Thats what I tried earlier but ran into some ICEs. I wasn't sure if >> niter in tree_niter_desc can take such. >> >> Attached patch now does this. Also had to add support for CALL_EXPR in >> few places to handle niter with CALL_EXPR. Does this look OK? > > Overall this looks ok - the patch includes changes in places that I don't > think > need changes such as chrec_convert_1 or extract_ops_from_tree. > The expression_expensive_p change should be more specific than making > all calls inexpensive as well. > > The verify_ssa change looks bogus, you do > > + dest = gimple_phi_result (count_phi); > + tree var = make_ssa_name (TREE_TYPE (dest), NULL); > + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); > + > + var = build_call_expr (fn, 1, src); > + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, > + build_int_cst (TREE_TYPE (dest), 1)); > > why do you allocate a new SSA name here? It seems unused > as you overwrive 'var' with the CALL_EXPR immediately. > > I didn't review the pattern matching thoroughly nor the exact place you > call it. But > > + if (check_popcount_pattern (loop, &count)) > + { > + niter->assumptions = boolean_false_node; > + niter->control.base = NULL_TREE; > + niter->control.step = NULL_TREE; > + niter->control.no_overflow = false; > + niter->niter = count; > + niter->assumptions = boolean_true_node; > + niter->may_be_zero = boolean_false_node; > + niter->max = -1; > + niter->bound = NULL_TREE; > + niter->cmp = ERROR_MARK; > + return true; > + } > > simply setting may_be_zero to false looks fishy. Try > with -fno-tree-loop-ch. Also max should not be negative, > it should be the number of bits in the IV type? > > A related testcase could be that we can completely peel > a loop like the following which iterates at most 8 times: > > int a[8]; > void foo (unsigned char ctrl) > { > int c = 0; > while (ctrl) > { > ctrl = ctrl & (ctrl - 1); > a[c++] = ctrl; > } > } > > This is now stage1 material so please update and re-post. Maybe Bin has > further suggestions as well. Sorry for being late on this. Actually I thought about popcount in distribution before. More like the first patch, but handled as another distribution pattern rather than a special case. It's a bit strange to compute and store the info in niters. It's also not straight forward when/where the transformation is finally done. I haven't looked into the details so not sure how appropriate it will be as a distribution pattern (current ones are only about data references). So I am okay with this version if it's more appropriate.
Thanks, bin > > Thanks, > Richard. > >> Thanks, >> Kugan >> >> >> gcc/ChangeLog: >> >> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >> >> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR. >> * tree-chrec.c (chrec_convert_1): Likewise. >> * tree-scalar-evolution.c (expression_expensive_p): Likewise. >> * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise. >> * tree-ssa-loop-niter.c (check_popcount_pattern): New. >> (number_of_iterations_exit): Record niter for popcount patern. >> * tree-ssa.c (verify_ssa): Check stmt to be non NULL. >> >> gcc/testsuite/ChangeLog: >> >> 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> >> >> * gcc.dg/tree-ssa/popcount.c: New test. >> >> >>> >>>> In general, there is also a condition gating the loop like >>>> >>>> if (b_4 != 0) >>>> goto loop; >>>> else >>>> end: >>>> >>>> This of course will not be removed by final value replacement. Since >>>> popcount (0) is defined, this is redundant and could be removed but >>>> not removed. >>> >>> Yeah, that's probably sth for another pass though. I suppose the >>> end: case just uses zero in which case you'll have a PHI and you >>> can optimize >>> >>> if (b != 0) >>> return popcount (b); >>> return 0; >>> >>> in phiopt. >>> >>> Richard. >>> >>>> Thanks, >>>> Kugan >>>> >>>>> >>>>> Richard. >>>>> >>>>>> Thanks, >>>>>> Kugan >>>>>>> >>>>>>> is related to popcount (b_5). >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>>>>>>> regressions. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Kugan >>>>>>>> >>>>>>>> gcc/ChangeLog: >>>>>>>> >>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>> >>>>>>>> PR middle-end/82479 >>>>>>>> * tree-loop-distribution.c (handle_popcount): New. >>>>>>>> (pass_loop_distribution::execute): Use handle_popcount. >>>>>>>> >>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>> >>>>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>>>> >>>>>>>> PR middle-end/82479 >>>>>>>> * gcc.dg/tree-ssa/popcount.c: New test.