On Fri, 1 Dec 2017, Bin.Cheng wrote: > On Fri, Dec 1, 2017 at 12:31 PM, Richard Biener <rguent...@suse.de> wrote: > > > > This is the access stride computation change. Apart from the > > stride extraction I adjusted the cost model to handle non-constant > > strides by checking if either is a multiple of the other and > > simply fail interchanging if it's the wrong way around for one > > ref or if the simple method using multiple_of_p fails to determine > > either case. > > > > This still handles the bwaves case. > > > > I think we want additional testcases with variable strides for each > > case we add - I believe this is the most conservative way to treat > > variable strides. > > > > It may be inconsistent with the constant stride handling where you > > allow for many OK DRs to outweight a few not OK DRs, but as it > > worked for bwaves it must be good enough ;) > > > > Tested on x86_64-unknown-linux-gnu (just the interchange testcases). > > > > Currently running a bootstrap with -O3 -g -floop-interchange. > > > > Ok for the branch? > Ok. This actually is closer the motivation: simple/conservative cost > model that only transforms code when it's known to be good. > I will check the impact on the number of interchange in spec.
Few high-level observations. In tree_loop_interchange::interchange we try interchanging adjacent loops, starting from innermost with outer of innermost. This way the innermost loop will bubble up as much as possible. But we don't seem to handle bulling multiple loops like for for (i=0; i<n; ++i) for (j=0; j<n; ++j) for (k=0; k<n ++k) a[j][k][i] = 1.; because the innermost two loops are correctly ordered so we then try interchanging the k and the i loop which succeeds but then we stop. So there's something wrong in the iteration scheme. I would have expected it to be quadratic, basically iterating the ::interchange loop until we didn't perform any interchange (or doing sth more clever). loop_cand::can_interchange_p seems to perform per BB checks (supported_operations, num_stmts) that with the way we interchange should disallow any such BB in a loop that we interchange or interchange across. That means it looks like sth we should perform early, like during data dependence gathering by for example inlining find_data_references_in_bb and doing those per-stmt checks there? In prepare_perfect_loop_nest we seem to be somewhat quadratic in the way we re-compute dependences if doing so failed (we also always just strip the outermost loop while the failing DDR could involve a DR that is in an inner loop). I think it should be possible to re-structure this computing dependences from inner loop body to outer loop bodies (the ddrs vector is, as opposed to the dr vector, unsorted I think). I haven't fully thought this out yet though - a similar iteration scheme could improve DR gathering though that's not so costly. Overall we should try improving on function names, we have valid_data_dependences, can_interchange_loops, should_interchange_loops, can_interchange_p which all are related but do slightly different things. My usual approach is to inline all single-use functions to improve things (and make program flow more visible). But I guess that's too much implementation detail. Didn't get to the IV re-mapping stuff yet but you (of course) end up with quite some dead IVs when iterating the interchange. You seem to add a new canonical IV just to avoid rewriting the existing exit test, right? Defering that to the "final" interchange on a nest should avoid those dead IVs. Will now leave for the weekend. Thanks, Richard.