On Mon, 18 Jun 2012, William J. Schmidt wrote:
> On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
> > On Fri, 8 Jun 2012, William J. Schmidt wrote:
> >
> <snip>
> >
> > Hmm. I don't like this patch or its general idea too much. Instead
> > I'd like us to move more of the cost model detail to the target, giving
> > it a chance to look at the whole loop before deciding on a cost. ISTR
> > posting the overall idea at some point, but let me repeat it here instead
> > of trying to find that e-mail.
> >
> > The basic interface of the cost model should be, in targetm.vectorize
> >
> > /* Tell the target to start cost analysis of a loop or a basic-block
> > (if the loop argument is NULL). Returns an opaque pointer to
> > target-private data. */
> > void *init_cost (struct loop *loop);
> >
> > /* Add cost for N vectorized-stmt-kind statements in vector_mode. */
> > void add_stmt_cost (void *data, unsigned n,
> > vectorized-stmt-kind,
> > enum machine_mode vector_mode);
> >
> > /* Tell the target to compute and return the cost of the accumulated
> > statements and free any target-private data. */
> > unsigned finish_cost (void *data);
> >
> > with eventually slightly different signatures for add_stmt_cost
> > (like pass in the original scalar stmt?).
> >
> > It allows the target, at finish_cost time, to evaluate things like
> > register pressure and resource utilization.
> >
> > Thanks,
> > Richard.
>
> I've been looking at this in between other projects. I wanted to be
> sure I understood the SLP infrastructure and whether it would cause any
> problems. It looks to me like it will be mostly ok. One issue I
> noticed is a possible difference in the order in which SLP instructions
> are analyzed and the order in which the instructions are "issued" during
> transformation.
>
> For both loop analysis and basic block analysis, SLP trees are
> constructed and analyzed prior to examining other vectorizable
> instructions. Their costs are calculated and stored in the SLP trees at
> this time. Later, when transforming statements to their vector
> equivalents, instructions in the block (or loop body) are processed in
> order until the first instruction that's part of an SLP tree is
> encountered. At that point, every instruction that's part of any SLP
> tree is transformed; then the vectorizer continues with the remaining
> non-SLP vectorizable statements.
>
> So if we do the natural and easy thing of placing calls to add_stmt_cost
> everywhere that costs are calculated today, the order that those costs
> are presented to the back end model will possibly be different than the
> order they are actually "emitted."
Interesting. But I suppose this is similar to how pattern statements
are handled? Thus, the whole pattern sequence is processed when
we encounter the "main" pattern statement?
> For a first cut at this, I suggest ignoring the problem other than to
> document it as an opportunity for improvement. Later we could improve
> it by using an add_stmt_slp_cost () interface (or adding an is_slp
> flag), and another interface to be called at the time during analysis
> when the SLP statements will be issued during transformation. This
> would allow the back end model to queue up the SLP costs in a separate
> vector and later place them in its internal structures at the
> appropriate place.
>
> It should eventually be possible to remove these fields/accessors:
>
> * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
> * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
> * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
>
> However, I think this should be delayed until we have the basic
> infrastructure in place for the new model and well-tested.
Indeed.
> The other issue is that we should have the model track both the inside
> and outside costs if we're going to get everything into the target
> model. For a first pass we can ignore this and keep the existing logic
> for the outside costs. Later we should add some interfaces analogous to
> add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
> the model can track this stuff as carefully as it wants to.
Outside costs are merely added to the niter * inner-cost metric to
be compared with the scalar cost niter * scalar-cost, right? Thus
they would be tracked completely separate - eventually similar to
how we compute the cost of the scalar loop.
> So, I'd propose going at this in several phases:
>
> (1) Add calls to the new interface without disturbing existing logic;
> modify the profitability algorithms to query the new model for inside
> costs. Default algorithm for the model is to just sum costs as is done
> today.
Right.
> (x) Add heuristics to target models as desired.
> (2) Handle the SLP ordering problem.
> (3) Handle outside costs in the target model.
> (4) Remove the now unnecessary cost fields and the calls that set them.
>
> Item (x) can happen anytime after item (1).
>
> I don't think this work is terribly difficult, just a bit tedious. The
> only really time-consuming aspect of it will be in very careful testing
> to keep from changing existing behavior.
>
> All comments welcome -- please let me know what you think.
I think this is a reasonable plan.
Thanks,
Richard.