On Tue, 20 Mar 2018, Bin.Cheng wrote: > On Mon, Mar 19, 2018 at 11:57 AM, Jakub Jelinek <ja...@redhat.com> wrote: > > On Fri, Mar 16, 2018 at 03:08:17PM +0100, Richard Biener wrote: > >> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok? > > > > Ok, thanks. > Hi Richard, > > This change causes below test failure on aarch64/arm linux: > FAIL: libgomp.graphite/force-parallel-4.c scan-tree-dump-times > graphite "2 loops carried no dependency" 1
So it looks like the conditional stores to A[i] in for (i = 0; i < N; i++) { if (i < T) /* loop i1: carried no dependency. */ A[i] = A[i+T]; else /* loop i2: carried dependency. */ A[i] = A[i+T+1]; } vs. the stores commoned in the merge block causes the merge block to appear in the schedule and thus [scheduler] original ast: { for (int c0 = 0; c0 <= 19999; c0 += 1) S_3(c0); - for (int c0 = 0; c0 <= 999; c0 += 1) - S_5(c0); - for (int c0 = 1000; c0 <= 9999; c0 += 1) - S_6(c0); + for (int c0 = 0; c0 <= 9999; c0 += 1) { + if (c0 >= 1000) { + S_6(c0); + } else { + S_5(c0); + } + S_7(c0); + } } [scheduler] AST generated by isl: { for (int c0 = 0; c0 <= 19999; c0 += 1) S_3(c0); - for (int c0 = 0; c0 <= 999; c0 += 1) + for (int c0 = 0; c0 <= 999; c0 += 1) { S_5(c0); - for (int c0 = 1000; c0 <= 9999; c0 += 1) + S_7(c0); + } + for (int c0 = 1000; c0 <= 9999; c0 += 1) { S_6(c0); + S_7(c0); + } } not sure why neither of the two loops end up parallelizable then. Possibly because of the dependences we generate for the cross-BB scalar deps which store and load to a "scalar" memory location (in the end those are just SSA dependences). So it's an artificial limitation caused by how we represent scheduling dependences I guess: +Adding scalar write: _4 +From stmt: _4 = A[_3]; +Adding scalar write: cstore_13 +From stmt: cstore_13 = PHI <_2(5), _4(6)> ... +Adding scalar write: _2 +From stmt: _2 = A[_1]; +Adding scalar write: cstore_13 +From stmt: cstore_13 = PHI <_2(5), _4(6)> ... +Adding scalar read: cstore_13 +From stmt: cstore_13 = PHI <_2(5), _4(6)> I have no idea/plan to improve that on the GRAPHITE/ISL side right now but maybe Sebastian has? We'd need to tell ISL that we can "elide" the dependence or rather that it is inner-iteration only and doesn't leak cross-iteration - kind of clobbering it at the end? So I'm going to XFAIL the testcase. Thanks, Richard.