Hi!

OK to commit (something like) the following?  Should something be added
to the "News" section on <https://gcc.gnu.org/> itself?  (I don't know
the policy for that.  We didn't suggest that for GCC 5, because at that
time we described the support as a "preliminary implementation of the
OpenACC 2.0a specification"; now it's much more complete and usable.)

Index: htdocs/gcc-6/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.74
diff -u -p -r1.74 changes.html
--- htdocs/gcc-6/changes.html   19 Apr 2016 11:13:02 -0000      1.74
+++ htdocs/gcc-6/changes.html   21 Apr 2016 16:10:49 -0000
@@ -124,6 +124,52 @@ For more information, see the
 <!-- .................................................................. -->
 <h2 id="languages">New Languages and Language specific improvements</h2>
 
+<!-- <ul>
+  <li> -->Compared to GCC 5, the GCC 6 release series includes a much improved
+    implementation of the <a href="http://www.openacc.org/">OpenACC 2.0a
+      specification</a>.  Highlights are:
+    <ul>
+      <li>In addition to single-threaded host-fallback execution, offloading is
+       supported for nvptx (Nvidia GPUs) on x86_64 and PowerPC 64-bit
+       little-endian GNU/Linux host systems.  For nvptx offloading, with the
+       OpenACC parallel construct, the execution model allows for an arbitrary
+       number of gangs, up to 32 workers, and 32 vectors.</li>
+      <li>Initial support for parallelized execution of OpenACC kernels
+       constructs:
+       <ul>
+         <li>Parallelization of a kernels region is switched on
+           by <code>-fopenacc</code> combined with <code>-O2</code> or
+           higher.</li>
+         <li>Code will be offloaded onto multiple gangs, but executes with
+           just one worker, and a vector length of 1.</li>
+         <li>Directives inside a kernels region are not supported.</li>
+         <li>Loops with reductions can be parallelized.</li>
+         <li>Only kernels regions with one loop nest are parallelized.</li>
+         <li>Only the outer-most loop of a loop nest can be parallelized.</li>
+         <li>Loop nests containing sibling loops are not parallelized.</li>
+       </ul>
+       Typically, using the OpenACC parallel construct will give much better
+       performance, compared to the initial support of the OpenACC kernels
+       construct.</li>
+      <li>The <code>device_type</code> clause is not supported.
+       The <code>bind</code> and <code>nohost</code> clauses are not
+       supported.  The <code>host_data</code> directive is not supported in
+       Fortran.</li>
+      <li>Nested parallelism (cf. CUDA dynamic parallelism) is not
+       supported.</li>
+      <li>Usage of OpenACC constructs inside multithreaded contexts (such as
+       created by OpenMP, or pthread programming) is not supported.</li>
+      <li>If a call to the <code>acc_on_device</code> function has a
+       compile-time constant argument, the function call evaluates to a
+       compile-time constant value only for C and C++ but not for
+       Fortran.</li>
+    </ul>
+    See the <a href="https://gcc.gnu.org/wiki/OpenACC">OpenACC</a>
+    and <a href="https://gcc.gnu.org/wiki/Offloading">Offloading</a> wiki pages
+    for further information.
+  <!-- </li>
+</ul> -->
+
 <!-- <h3 id="ada">Ada</h3> -->
 
 <h3 id="c-family">C family</h3>

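For illustration only, and not meant for the patch itself: a minimal C
sketch of the two constructs described above.  The function names
dot_kernels and dot_parallel are hypothetical.  The first function is shaped
to fit the kernels restrictions listed in the patch (one loop nest, no
directives inside the region, a reduction the compiler has to detect by
itself); the second expresses the same loop with the parallel construct.

```c
/* Sketch under stated assumptions: function names are made up for this
   example.  Without -fopenacc the pragmas are ignored and both functions
   simply run sequentially on the host.  */

/* Dot product as an OpenACC kernels region: a single loop nest, no
   directives inside the region, and a sum reduction that is left for the
   compiler to recognize.  */
double dot_kernels(const double *x, const double *y, int n)
{
    double sum = 0.0;
#pragma acc kernels copyin(x[0:n], y[0:n]) copy(sum)
    {
        for (int i = 0; i < n; i++)
            sum += x[i] * y[i];
    }
    return sum;
}

/* The same loop with the parallel construct, where the loop and the
   reduction are stated explicitly by the programmer.  */
double dot_parallel(const double *x, const double *y, int n)
{
    double sum = 0.0;
#pragma acc parallel loop copyin(x[0:n], y[0:n]) reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Per the text in the patch, the kernels variant would need -fopenacc together
with -O2 or higher to be parallelized at all, and would then run with one
worker and a vector length of 1.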

Regards,
 Thomas
