On Fri, 5 Feb 2021, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > The following attempts to account for the fact that BB vectorization > > regions now can span multiple loop levels and that an unprofitable > > inner loop vectorization shouldn't be offsetted by a profitable > > outer loop vectorization to make it overall profitable. > > > > For now I've implemented a heuristic based on the premise that > > vectorization should be profitable even if loops may not be entered > > or if they iterate any number of times. Especially the first > > assumption then requires that stmts directly belonging to loop A > > need to be costed separately from stmts belonging to another loop > > which also simplifies the implementation. > > > > On x86 the added testcase has in the outer loop > > > > t.c:38:20: note: Cost model analysis for part in loop 1: > > Vector cost: 56 > > Scalar cost: 192 > > > > and the inner loop > > > > t.c:38:20: note: Cost model analysis for part in loop 2: > > Vector cost: 132 > > Scalar cost: 48 > > > > and thus the vectorization is considered not profitable > > (note the same would happen in case the 2nd cost were for > > a loop outer to the 1st costing). > > > > Future enhancements may consider static knowledge of whether > > a loop is always entered which would allow some inefficiency > > in the vectorization of its loop header. Likewise stmts only > > reachable from a loop exit can be treated this way. > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu and it fixes > > the regression reported in the PR. > > > > Does this look sensible and good enough for GCC 11? > > I don't know the new SLP code very well, but FWIW it seems reasonable > to me. > > > +/* Comparator for the loop-index sorted cost vectors. */ > > + > > +static int > > +li_cost_vec_cmp (const void *a_, const void *b_) > > +{ > > + auto *a = (const std::pair<unsigned, stmt_info_for_cost *> *)a_; > > + auto *b = (const std::pair<unsigned, stmt_info_for_cost *> *)b_; > > + if (a->first < b->first) > > + return -1; > > + else if (a->first == b->first) > > + return 0; > > + return 1; > > +} > > This isn't a stable sort. (Or isn't that an issue these days?) > > > + /* First produce cost vectors sorted by loop index. */ > > + auto_vec<std::pair<unsigned, stmt_info_for_cost *> > > > + li_scalar_costs (scalar_costs.length ()); > > + auto_vec<std::pair<unsigned, stmt_info_for_cost *> > > > + li_vector_costs (vector_costs.length ()); > > + FOR_EACH_VEC_ELT (scalar_costs, i, cost) > > + { > > + unsigned l = gimple_bb (cost->stmt_info->stmt)->loop_father->num; > > + li_scalar_costs.quick_push (std::make_pair (l, cost)); > > + } > > + unsigned l = li_scalar_costs[0].first; > > Is this just to silence an unused warning? Might be worth a comment if so > (although couldn't we just use 0?).
The issue is that not all vector_costs entries have a stmt_info so this uses a random loop also used in the scalar code (that's going to be the correct loop in case there's only one loop involved which is probably 99% of the cases). I'mm add a comment. > > + FOR_EACH_VEC_ELT (vector_costs, i, cost) > > + { > > + /* We inherit from the previous SI, invariants, externals and > > + extracts immediately follow the cost for the related stmt. */ > > Looks like the comment predates s/SI/COST/. Indeed, will fix and push. Thanks, Richard.