Hi Richard, On 1 February 2018 at 23:21, Richard Biener <richard.guent...@gmail.com> wrote: > On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah > <kugan.vivekanandara...@linaro.org> wrote: >> Hi Richard, >> >> On 31 January 2018 at 21:39, Richard Biener <richard.guent...@gmail.com> >> wrote: >>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah >>> <kugan.vivekanandara...@linaro.org> wrote: >>>> Hi Richard, >>>> >>>> Thanks for the review. >>>> On 25 January 2018 at 20:04, Richard Biener <richard.guent...@gmail.com> >>>> wrote: >>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah >>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>> Hi All, >>>>>> >>>>>> Here is a patch for popcount builtin detection similar to LLVM. I >>>>>> would like to queue this for review for next stage 1. >>>>>> >>>>>> 1. This is done part of loop-distribution and effective for -O3 and >>>>>> above. >>>>>> 2. This does not distribute loop to detect popcount (like >>>>>> memcpy/memmove). I dont think that happens in practice. Please correct >>>>>> me if I am wrong. >>>>> >>>>> But then it has no business inside loop distribution but instead is >>>>> doing final value >>>>> replacement, right? You are pattern-matching the whole loop after all. >>>>> I think >>>>> final value replacement would already do the correct thing if you >>>>> teached number of >>>>> iteration analysis that niter for >>>>> >>>>> <bb 3> [local count: 955630224]: >>>>> # b_11 = PHI <b_5(5), b_8(6)> >>>>> _1 = b_11 + -1; >>>>> b_8 = _1 & b_11; >>>>> if (b_8 != 0) >>>>> goto <bb 6>; [89.00%] >>>>> else >>>>> goto <bb 8>; [11.00%] >>>>> >>>>> <bb 6> [local count: 850510900]: >>>>> goto <bb 3>; [100.00%] >>>> >>>> I am looking into this approach. What should be the scalar evolution >>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me >>>> and can this be represented with the scev? >>> >>> No, it's not affine and thus cannot be represented. You only need the >>> scalar evolution of the counting IV which is already handled and >>> the number of iteration analysis needs to handle the above IV - this >>> is the missing part. >> Thanks for the clarification. I am now matching this loop pattern in >> number_of_iterations_exit when number_of_iterations_exit_assumptions >> fails. If the pattern matches, I am inserting the _builtin_popcount in >> the loop preheater and setting the loop niter with this. This will be >> used by the final value replacement. Is this what you wanted? > > No, you shouldn't insert a popcount stmt but instead the niter > GENERIC tree should be a CALL_EXPR to popcount with the > appropriate argument.
Thats what I tried earlier but ran into some ICEs. I wasn't sure if niter in tree_niter_desc can take such. Attached patch now does this. Also had to add support for CALL_EXPR in few places to handle niter with CALL_EXPR. Does this look OK? Thanks, Kugan gcc/ChangeLog: 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR. * tree-chrec.c (chrec_convert_1): Likewise. * tree-scalar-evolution.c (expression_expensive_p): Likewise. * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise. * tree-ssa-loop-niter.c (check_popcount_pattern): New. (number_of_iterations_exit): Record niter for popcount patern. * tree-ssa.c (verify_ssa): Check stmt to be non NULL. gcc/testsuite/ChangeLog: 2018-02-08 Kugan Vivekanandarajah <kug...@linaro.org> * gcc.dg/tree-ssa/popcount.c: New test. > >> In general, there is also a condition gating the loop like >> >> if (b_4 != 0) >> goto loop; >> else >> end: >> >> This of course will not be removed by final value replacement. Since >> popcount (0) is defined, this is redundant and could be removed but >> not removed. > > Yeah, that's probably sth for another pass though. I suppose the > end: case just uses zero in which case you'll have a PHI and you > can optimize > > if (b != 0) > return popcount (b); > return 0; > > in phiopt. > > Richard. > >> Thanks, >> Kugan >> >>> >>> Richard. >>> >>>> Thanks, >>>> Kugan >>>>> >>>>> is related to popcount (b_5). >>>>> >>>>> Richard. >>>>> >>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new >>>>>> regressions. >>>>>> >>>>>> Thanks, >>>>>> Kugan >>>>>> >>>>>> gcc/ChangeLog: >>>>>> >>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>> >>>>>> PR middle-end/82479 >>>>>> * tree-loop-distribution.c (handle_popcount): New. >>>>>> (pass_loop_distribution::execute): Use handle_popcount. >>>>>> >>>>>> gcc/testsuite/ChangeLog: >>>>>> >>>>>> 2018-01-25 Kugan Vivekanandarajah <kug...@linaro.org> >>>>>> >>>>>> PR middle-end/82479 >>>>>> * gcc.dg/tree-ssa/popcount.c: New test.
From 296efa2c09cdd797e3295f0c29a5a943dc87e9f2 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> Date: Thu, 8 Feb 2018 09:28:57 +1100 Subject: [PATCH] popcount v2 --- gcc/gimple-expr.c | 3 +- gcc/testsuite/gcc.dg/tree-ssa/popcount.c | 41 ++++++++++ gcc/tree-chrec.c | 2 + gcc/tree-scalar-evolution.c | 2 + gcc/tree-ssa-loop-ivopts.c | 10 +++ gcc/tree-ssa-loop-niter.c | 124 ++++++++++++++++++++++++++++++- gcc/tree-ssa.c | 2 +- 7 files changed, 181 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c index 56caaca..3fc04ff 100644 --- a/gcc/gimple-expr.c +++ b/gcc/gimple-expr.c @@ -548,7 +548,8 @@ extract_ops_from_tree (tree expr, enum tree_code *subcode_p, tree *op1_p, *op2_p = NULL_TREE; *op3_p = NULL_TREE; } - else if (grhs_class == GIMPLE_SINGLE_RHS) + else if (grhs_class == GIMPLE_SINGLE_RHS + || TREE_CODE (expr) == CALL_EXPR) { *op1_p = expr; *op2_p = NULL_TREE; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c new file mode 100644 index 0000000..86a66cb --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-optimized" } */ + +extern int foo (int); + +int PopCount (long b) { + int c = 0; + b++; + + while (b) { + b &= b - 1; + c++; + } + return c; +} +int PopCount2 (long b) { + int c = 0; + + while (b) { + b &= b - 1; + c++; + } + foo (c); + return foo (c); +} + +void PopCount3 (long b1) { + + for (long i = 0; i < b1; ++i) + { + long b = i; + int c = 0; + while (b) { + b &= b - 1; + c++; + } + foo (c); + } +} + +/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */ diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c index 924df04..098f46f 100644 --- a/gcc/tree-chrec.c +++ b/gcc/tree-chrec.c @@ -1382,6 +1382,8 @@ keep_cast: CHREC_RIGHT (chrec))); res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics, from); } + else if (TREE_CODE (chrec) == CALL_EXPR) + res = fold_build1 (NOP_EXPR, TREE_TYPE (TREE_TYPE (chrec)), chrec); else res = fold_convert (type, chrec); diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c index 0ba1aa8..ff313c9 100644 --- a/gcc/tree-scalar-evolution.c +++ b/gcc/tree-scalar-evolution.c @@ -3488,6 +3488,8 @@ expression_expensive_p (tree expr) if (!integer_pow2p (TREE_OPERAND (expr, 1))) return true; } + if (code == CALL_EXPR) + return false; switch (TREE_CODE_CLASS (code)) { diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index b313571..519649a 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr) code = TREE_CODE (expr); codeclass = TREE_CODE_CLASS (code); + if (code == CALL_EXPR) + { + tree arg; + call_expr_arg_iterator iter; + FOR_EACH_CALL_EXPR_ARG (arg, iter, expr) + if (contains_abnormal_ssa_name_p (arg)) + return true; + return false; + } + if (code == SSA_NAME) return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0; diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c index fa49abf..470ba2f 100644 --- a/gcc/tree-ssa-loop-niter.c +++ b/gcc/tree-ssa-loop-niter.c @@ -2430,6 +2430,111 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit, return (!integer_zerop (niter->assumptions)); } +/* See if LOOP is a popcout implementation of the form + + int c = 0; + while (b) { + b = b & (b - 1); + c++; + } + + If so, Set NITER to __builtin_popcount (b) - 1 + return true if we did, false otherwise. */ + +static bool +check_popcount_pattern (loop_p loop, tree *niter) +{ + tree lhs, rhs; + tree dest, src; + gimple *and_minus_one; + int count = 0; + gphi *count_phi = NULL; + + if (!single_exit (loop)) + return false; + + /* Check loop terminating branch is like + if (b != 0). */ + gimple *stmt = last_stmt (loop->header); + if (!stmt + || gimple_code (stmt) != GIMPLE_COND + || !zerop (gimple_cond_rhs (stmt))) + return false; + + /* Cheeck "b = b & (b - 1)" is calculated. */ + lhs = gimple_cond_lhs (stmt); + if (TREE_CODE (lhs) != SSA_NAME) + return false; + gimple *and_stmt = SSA_NAME_DEF_STMT (lhs); + if (!is_gimple_assign (and_stmt) + || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR) + return false; + lhs = gimple_assign_rhs1 (and_stmt); + rhs = gimple_assign_rhs2 (and_stmt); + if (TREE_CODE (lhs) == SSA_NAME + && (and_minus_one = SSA_NAME_DEF_STMT (lhs)) + && is_gimple_assign (and_minus_one) + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) + lhs = rhs; + else if (TREE_CODE (rhs) == SSA_NAME + && (and_minus_one = SSA_NAME_DEF_STMT (rhs)) + && is_gimple_assign (and_minus_one) + && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR) + && integer_minus_onep (gimple_assign_rhs2 (and_minus_one))) + ; + else + return false; + if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one)) + && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one))) + return false; + + /* Check the recurrence. */ + gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one)); + gimple *src_phi = SSA_NAME_DEF_STMT (lhs); + if (gimple_code (phi) != GIMPLE_PHI + || gimple_code (src_phi) != GIMPLE_PHI) + return false; + + /* Check the loop closed SSA definition for just the variable c defined in + loop. */ + src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx); + basic_block bb = single_exit (loop)->dest; + for (gphi_iterator gpi = gsi_start_phis (bb); + !gsi_end_p (gpi); gsi_next (&gpi)) + { + count_phi = gpi.phi (); + count++; + } + + if (count != 1) + return false; + + /* Make sure we have a count by one and it starts from zero. */ + rhs = gimple_phi_arg_def (count_phi, 0); + stmt = SSA_NAME_DEF_STMT (rhs); + if (!is_gimple_assign (stmt) + || (gimple_assign_rhs_code (stmt) != PLUS_EXPR) + || !integer_onep (gimple_assign_rhs2 (stmt))) + return false; + rhs = gimple_assign_rhs1 (stmt); + stmt = SSA_NAME_DEF_STMT (rhs); + if (gimple_code (stmt) != GIMPLE_PHI + || !integer_zerop (gimple_phi_arg_def (stmt, + loop_preheader_edge (loop)->dest_idx))) + return false; + + dest = gimple_phi_result (count_phi); + tree var = make_ssa_name (TREE_TYPE (dest), NULL); + tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT); + + var = build_call_expr (fn, 1, src); + *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var, + build_int_cst (TREE_TYPE (dest), 1)); + return true; +} + + /* Like number_of_iterations_exit_assumptions, but return TRUE only if the niter information holds unconditionally. */ @@ -2441,7 +2546,24 @@ number_of_iterations_exit (struct loop *loop, edge exit, gcond *stmt; if (!number_of_iterations_exit_assumptions (loop, exit, niter, &stmt, every_iteration)) - return false; + { + tree count; + if (check_popcount_pattern (loop, &count)) + { + niter->assumptions = boolean_false_node; + niter->control.base = NULL_TREE; + niter->control.step = NULL_TREE; + niter->control.no_overflow = false; + niter->niter = count; + niter->assumptions = boolean_true_node; + niter->may_be_zero = boolean_false_node; + niter->max = -1; + niter->bound = NULL_TREE; + niter->cmp = ERROR_MARK; + return true; + } + return false; + } if (integer_nonzerop (niter->assumptions)) return true; diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c index ee311ce..572419c 100644 --- a/gcc/tree-ssa.c +++ b/gcc/tree-ssa.c @@ -1047,7 +1047,7 @@ verify_ssa (bool check_modified_stmt, bool check_ssa_operands) verify_ssa_name (name, virtual_operand_p (name)); stmt = SSA_NAME_DEF_STMT (name); - if (!gimple_nop_p (stmt)) + if (stmt && !gimple_nop_p (stmt)) { basic_block bb = gimple_bb (stmt); if (verify_def (bb, definition_block, -- 2.7.4