On Thu, Nov 3, 2016 at 1:35 PM, Evgeny Kudryashov <kudryas...@ispras.ru> wrote:
> Hello,
>
> I'm facing a problem related to ivopts: GCC generates a lot of induction
> variables, and as a result stack usage and register pressure increase
> unnecessarily.
>
> For instance, for the attached testcase (tc_ivopts.c) GCC generates 26
> induction variables (25 for each of lhsX[{0-4}][{0-4}][k] and one for
> rhs[k][j][{0-2}]). You should be able to reproduce this issue on an arm
> target using GCC with "-O2 -mcpu=cortex-a9 -mtune=cortex-a9". For
> reference, I'm attaching the assembly I get on current trunk.
>
> The reason might lie in how use group costs are estimated. Currently, the
> cost of a (group, candidate) pair is the sum of the per-use costs in the
> group, so a group's cost grows proportionally to its number of uses. This
> has a negative effect on the algorithm for finding the best set of
> induction variables: the part of the total cost related to adding a new
> candidate is almost washed out by the cost of the group. In addition, when
> there are many groups with many uses inside and the target is out of free
> registers, the spill cost is washed out too. As a result, GCC prefers the
> native candidate (the candidate created for a particular group) for each
> group of uses instead of considering the real cost of introducing a new
> variable into the set.
>
> The summing approach was added as part of this patch
> (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00641.html), and the
> motivation for taking the sum does not seem to have been specifically
> discussed.
>
> I propose the following patch, which changes the group cost from the sum
> of the per-use costs to the largest cost in the group. For the given test
> case, the required stack size decreases by almost 3x and the number of
> ldr/str instructions by a little less than 2x. I'm also attaching the
> assembly after applying the patch.
>
> The essential change in the patch is just:
>
> diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
> index f9211ad..a149418 100644
> --- a/gcc/tree-ssa-loop-ivopts.c
> +++ b/gcc/tree-ssa-loop-ivopts.c
> @@ -5151,7 +5151,8 @@ determine_group_iv_cost_address (struct ivopts_data *data,
>          offset and where the iv_use is.  */
>         cost = get_computation_cost (data, next, cand, true,
>                                      NULL, &can_autoinc, NULL);
> -      sum_cost += cost;
> +      if (sum_cost < cost)
> +        sum_cost = cost;
>      }
>    set_group_iv_cost (data, group, cand, sum_cost, depends_on,
>                      NULL_TREE, ERROR_MARK, inv_expr);
>
> Any suggestions?
Hi Evgeny,

I tend to be practical here.  Given that cost is not modeled that well in
IVOPT, any heuristic tuning of the cost is quite likely to result in
improvements in some cases and regressions in others; at the least we need
to run a good number of benchmarks for such changes.  Specific to this one,
summing the costs within a group is a natural consequence of the original
code, in which the uses' costs were added up before candidate selection.
Take your code as an example: choosing the loop's original candidate for a
group results in lots of memory accesses with the [base + index << scale]
addressing mode, which could have a higher cost than the [base + offset]
mode with respect to the micro-architecture, so IMHO it makes sense to add
up this cost.  OTOH, I totally agree that IVOPT tends to choose too many
candidates at the moment, especially for large loops.  Is it possible to
achieve the same effect by penalizing the register pressure cost?
Meanwhile, I can collect benchmark data for your patch on AArch64 and
get back to you later.

Thanks,
bin
>
>
> Thanks,
> Evgeny.
