Hi! As written in the PR, I've been looking why is llvm 3.[34] so much faster on Scimark2 SOR benchmark and the reason is that it's predictive commoning or whatever it uses doesn't give up on the inner loop, while our predcom unnecessarily gives up, because there are reads that could alias the write.
This simple patch improves the benchmark by 42%. We already ignore unsuitable dependencies for read/read, the patch extends that for unsuitable dependencies for read/write by just putting the read (and anything in it's component) into the bad component which is ignored. pcom doesn't optimize away the writes and will keep the potentially aliasing reads unmodified as well. Without the patch we'd merge the two components, and as !determine_offset between the two DRs, it would mean the whole merged component would be always unsuitable and thus ignored. With the patch we'll hopefully have some other reads with known offset to the write and can optimize that, so the patch should always either handle what it did before or handle perhaps some more cases. The inner loop from the (public domain) benchmark is added in the two tests, one runtime test and one test looking whether pcom actually optimized it. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-12-31 Jakub Jelinek <ja...@redhat.com> PR tree-optimization/59643 * tree-predcom.c (split_data_refs_to_components): If one dr is read and one write, determine_offset fails and the write isn't in the bad component, just put the read into the bad component. * gcc.dg/pr59643.c: New test. * gcc.c-torture/execute/pr59643.c: New test. --- gcc/tree-predcom.c.jj 2013-12-31 12:50:47.169748385 +0100 +++ gcc/tree-predcom.c 2013-12-31 15:39:44.422297404 +0100 @@ -772,10 +772,37 @@ split_data_refs_to_components (struct lo bad = component_of (comp_father, n); /* If both A and B are reads, we may ignore unsuitable dependences. */ - if (DR_IS_READ (dra) && DR_IS_READ (drb) - && (ia == bad || ib == bad - || !determine_offset (dra, drb, &dummy_off))) - continue; + if (DR_IS_READ (dra) && DR_IS_READ (drb)) + { + if (ia == bad || ib == bad + || !determine_offset (dra, drb, &dummy_off)) + continue; + } + /* If A is read and B write or vice versa and there is unsuitable + dependence, instead of merging both components into a component + that will certainly not pass suitable_component_p, just put the + read into bad component, perhaps at least the write together with + all the other data refs in it's component will be optimizable. */ + else if (DR_IS_READ (dra) && ib != bad) + { + if (ia == bad) + continue; + else if (!determine_offset (dra, drb, &dummy_off)) + { + merge_comps (comp_father, comp_size, bad, ia); + continue; + } + } + else if (DR_IS_READ (drb) && ia != bad) + { + if (ib == bad) + continue; + else if (!determine_offset (dra, drb, &dummy_off)) + { + merge_comps (comp_father, comp_size, bad, ib); + continue; + } + } merge_comps (comp_father, comp_size, ia, ib); } --- gcc/testsuite/gcc.dg/pr59643.c.jj 2013-12-31 15:34:48.584810067 +0100 +++ gcc/testsuite/gcc.dg/pr59643.c 2013-12-31 15:34:48.584810067 +0100 @@ -0,0 +1,15 @@ +/* PR tree-optimization/59643 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -fdump-tree-pcom-details" } */ + +void +foo (double *a, double *b, double *c, double d, double e, int n) +{ + int i; + for (i = 1; i < n - 1; i++) + a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i]; +} + +/* { dg-final { scan-tree-dump-times "Before commoning:" 1 "pcom" } } */ +/* { dg-final { scan-tree-dump-times "Unrolling 2 times" 1 "pcom" } } */ +/* { dg-final { cleanup-tree-dump "pcom" } } */ --- gcc/testsuite/gcc.c-torture/execute/pr59643.c.jj 2013-12-31 15:34:48.584810067 +0100 +++ gcc/testsuite/gcc.c-torture/execute/pr59643.c 2013-12-31 15:34:48.584810067 +0100 @@ -0,0 +1,39 @@ +/* PR tree-optimization/59643 */ + +#define N 32 + +__attribute__((noinline, noclone)) void +foo (double *a, double *b, double *c, double d, double e, int n) +{ + int i; + for (i = 1; i < n - 1; i++) + a[i] = d * (b[i] + c[i] + a[i - 1] + a[i + 1]) + e * a[i]; +} + +double expected[] = { + 0.0, 10.0, 44.0, 110.0, 232.0, 490.0, 1020.0, 2078.0, 4152.0, 8314.0, + 16652.0, 33326.0, 66664.0, 133354.0, 266748.0, 533534.0, 1067064.0, + 2134138.0, 4268300.0, 8536622.0, 17073256.0, 34146538.0, 68293116.0, + 136586270.0, 273172536.0, 546345082.0, 1092690188.0, 2185380398.0, + 4370760808.0, 8741521642.0, 17483043324.0, 6.0 +}; + +int +main () +{ + int i; + double a[N], b[N], c[N]; + if (__DBL_MANT_DIG__ <= 35) + return 0; + for (i = 0; i < N; i++) + { + a[i] = (i & 3) * 2.0; + b[i] = (i & 7) - 4; + c[i] = i & 7; + } + foo (a, b, c, 2.0, 3.0, N); + for (i = 0; i < N; i++) + if (a[i] != expected[i]) + __builtin_abort (); + return 0; +} Jakub