On Fri, 1 Dec 2017, Bin.Cheng wrote:

> On Fri, Dec 1, 2017 at 12:31 PM, Richard Biener <rguent...@suse.de> wrote:
> >
> > This is the access stride computation change.  Apart from the
> > stride extraction I adjusted the cost model to handle non-constant
> > strides by checking if either is a multiple of the other and
> > simply fail interchanging if it's the wrong way around for one
> > ref or if the simple method using multiple_of_p fails to determine
> > either case.
> >
> > This still handles the bwaves case.
> >
> > I think we want additional testcases with variable strides for each
> > case we add - I believe this is the most conservative way to treat
> > variable strides.
> >
> > It may be inconsistent with the constant stride handling where you
> > allow for many OK DRs to outweight a few not OK DRs, but as it
> > worked for bwaves it must be good enough ;)
> >
> > Tested on x86_64-unknown-linux-gnu (just the interchange testcases).
> >
> > Currently running a bootstrap with -O3 -g -floop-interchange.
> >
> > Ok for the branch?
> Ok.  This actually is closer the motivation: simple/conservative cost
> model that only transforms code when it's known to be good.
> I will check the impact on the number of interchange in spec.

Few high-level observations.

In tree_loop_interchange::interchange we try interchanging adjacent
loops, starting from innermost with outer of innermost.  This way
the innermost loop will bubble up as much as possible.  But we
don't seem to handle bulling multiple loops like for

 for (i=0; i<n; ++i)
  for (j=0; j<n; ++j)
    for (k=0; k<n ++k)
      a[j][k][i] = 1.;

because the innermost two loops are correctly ordered so we then
try interchanging the k and the i loop which succeeds but then
we stop.  So there's something wrong in the iteration scheme.
I would have expected it to be quadratic, basically iterating
the ::interchange loop until we didn't perform any interchange
(or doing sth more clever).

loop_cand::can_interchange_p seems to perform per BB checks
(supported_operations, num_stmts) that with the way we interchange
should disallow any such BB in a loop that we interchange or
interchange across.  That means it looks like sth we should
perform early, like during data dependence gathering by for
example inlining find_data_references_in_bb and doing those
per-stmt checks there?

In prepare_perfect_loop_nest we seem to be somewhat quadratic
in the way we re-compute dependences if doing so failed
(we also always just strip the outermost loop while the failing
DDR could involve a DR that is in an inner loop).  I think it
should be possible to re-structure this computing dependences
from inner loop body to outer loop bodies (the ddrs vector
is, as opposed to the dr vector, unsorted I think).
I haven't fully thought this out yet though - a similar
iteration scheme could improve DR gathering though that's not
so costly.

Overall we should try improving on function names, we have
valid_data_dependences, can_interchange_loops, should_interchange_loops,
can_interchange_p which all are related but do slightly different
things.  My usual approach is to inline all single-use functions
to improve things (and make program flow more visible).  But
I guess that's too much implementation detail.

Didn't get to the IV re-mapping stuff yet but you (of course)
end up with quite some dead IVs when iterating the interchange.
You seem to add a new canonical IV just to avoid rewriting the
existing exit test, right?  Defering that to the "final"
interchange on a nest should avoid those dead IVs.

Will now leave for the weekend.

Thanks,
Richard.

Reply via email to