On 24-08-15 16:28, Tom de Vries wrote:
On 24-08-15 11:43, Jakub Jelinek wrote:
On Mon, Jul 28, 2014 at 11:21:53AM +0200, Tom de Vries wrote:
Jakub,
We're using expand_omp_for_static_chunk with a chunk_size of one to expand the
OpenACC loop construct.
This results in an inner and outer loop being generated, with the inner loop
having a trip count of one, which means that the inner loop can be simplified to
just the inner loop body. However, subsequent optimizations do not manage to do
this simplification.
This patch sets the loop exit condition to true if the chunk_size is one, to
ensure that the compiler will optimize away the inner loop.
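For illustration, the loop shape described above can be sketched in plain C.
This is only a rough model under assumptions, not the code that
expand_omp_for_static_chunk emits: the names add_chunked, add_chunk_one and
chunk_one_matches_general are hypothetical, and the "threads" are simulated
by calling the functions once per thread id.

```c
#include <assert.h>

/* Hypothetical model of a statically chunked loop: the outer loop walks
   the chunks owned by thread TID, the inner loop walks one chunk.  */
static void
add_chunked (const unsigned *a, const unsigned *b, unsigned *c,
	     unsigned n, unsigned tid, unsigned nthreads, unsigned chunk)
{
  assert (chunk > 0 && nthreads > 0);

  for (unsigned start = tid * chunk; start < n; start += nthreads * chunk)
    {
      unsigned end = start + chunk < n ? start + chunk : n;

      /* With chunk == 1 this inner loop runs exactly once.  */
      for (unsigned i = start; i < end; i++)
	c[i] = a[i] + b[i];
    }
}

/* The same loop for chunk == 1 with the inner loop replaced by its body.  */
static void
add_chunk_one (const unsigned *a, const unsigned *b, unsigned *c,
	       unsigned n, unsigned tid, unsigned nthreads)
{
  for (unsigned i = tid; i < n; i += nthreads)
    c[i] = a[i] + b[i];
}

/* Return 1 if both forms compute the same result for every simulated
   thread, 0 otherwise.  */
int
chunk_one_matches_general (void)
{
  enum { N = 100, NTHREADS = 8 };
  unsigned a[N], b[N], general[N], simplified[N];

  for (unsigned i = 0; i < N; i++)
    {
      a[i] = i % 3;
      b[i] = i % 5;
      general[i] = simplified[i] = 0;
    }

  for (unsigned tid = 0; tid < NTHREADS; tid++)
    {
      add_chunked (a, b, general, N, tid, NTHREADS, 1);
      add_chunk_one (a, b, simplified, N, tid, NTHREADS);
    }

  for (unsigned i = 0; i < N; i++)
    if (general[i] != simplified[i] || general[i] != a[i] + b[i])
      return 0;

  return 1;
}
```

The second function is the form the optimizers should ideally reach on their
own when chunk_size is one.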
OK for gomp4 branch?
Thanks,
- Tom
2014-07-25 Tom de Vries <t...@codesourcery.com>
* omp-low.c (expand_omp_for_static_chunk): Remove inner loop if
chunk_size is one.
If that is still the case on the trunk, the patch is ok for trunk after
retesting it. Please mention the PR tree-optimization/65468 in the
ChangeLog entry and make sure there is some runtime testcase that tests
that code path (both OpenMP and OpenACC one).
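As a rough idea of what an OpenMP counterpart could look like, here is a
minimal sketch. It is not a test that was committed in this thread; the
function name run_static_chunk_one is hypothetical, and it assumes that
schedule (static, 1) is what reaches the chunk_size-one path. Without
-fopenmp the pragma is ignored and the loop simply runs serially.

```c
#define N 1024

static unsigned int a[N];
static unsigned int b[N];
static unsigned int c[N];

/* Return 0 if every element was computed correctly, 1 otherwise.  */
int
run_static_chunk_one (void)
{
  unsigned int n = N;

  for (unsigned int i = 0; i < n; ++i)
    {
      a[i] = i % 3;
      b[i] = i % 5;
    }

  /* Assumption: a static schedule with chunk size one exercises the
     code path discussed above.  */
#pragma omp parallel for schedule (static, 1)
  for (unsigned int i = 0; i < n; i++)
    c[i] = a[i] + b[i];

  for (unsigned int i = 0; i < n; ++i)
    if (c[i] != (i % 3) + (i % 5))
      return 1;

  return 0;
}
```

A real testcase would wrap this in main and call abort () on failure, the
same way the OpenACC test does.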
Committed attached patch to trunk.
I'll look into an OpenACC testcase for trunk.
Committed as attached.
Thanks,
- Tom
Add libgomp.oacc-c-c++-common/vector-loop.c
2015-08-24 Tom de Vries <t...@codesourcery.com>
PR tree-optimization/65468
* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: New test.
---
.../libgomp.oacc-c-c++-common/vector-loop.c | 33 ++++++++++++++++++++++
1 file changed, 33 insertions(+)
create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
new file mode 100644
index 0000000..cc915a9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+#define N 1024
+
+unsigned int a[N];
+unsigned int b[N];
+unsigned int c[N];
+unsigned int n = N;
+
+int
+main (void)
+{
+ for (unsigned int i = 0; i < n; ++i)
+ {
+ a[i] = i % 3;
+ b[i] = i % 5;
+ }
+
+#pragma acc parallel vector_length (32) copyin (a,b) copyout (c)
+ {
+#pragma acc loop /* vector clause is missing, since it's not yet supported. */
+ for (unsigned int i = 0; i < n; i++)
+ c[i] = a[i] + b[i];
+ }
+
+ for (unsigned int i = 0; i < n; ++i)
+ if (c[i] != (i % 3) + (i % 5))
+ abort ();
+
+ return 0;
+}
--
1.9.1