[PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-06 Thread frederik
From: frederik 

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.
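
As a hedged illustration of the rule (this is not taken from the new tests; the function name `sum_cube' and the loop nest are assumptions made for the example), the following C function carries a matching `reduction(+:sum)' clause on every loop level between the outermost and the innermost reduction:

```c
#include <assert.h>

/* Count the iterations of an n*n*n loop nest.  Every loop level between
   the outermost and the innermost reduction on 'sum' carries a reduction
   clause with the same operator, as the OpenACC rule requires.  */
long sum_cube(int n)
{
  assert(n >= 0);
  long sum = 0;

#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      /* This clause is the one that is easy to forget: the loop sits
         between two loops that both reduce on 'sum'.  */
#pragma acc loop reduction(+:sum)
      for (int j = 0; j < n; j++)
        {
#pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }
  return sum;
}
```

Dropping the clause on the middle loop, or using a different operator there, is the kind of nesting the new check is meant to warn about. Without -fopenacc the pragmas are ignored and the function runs as ordinary C.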

2019-11-06  Gergö Barany
            Frederik Harwath
            Thomas Schwinge

gcc/
* omp-low.c (struct omp_context): New fields
local_reduction_clauses, outer_reduction_clauses.
(new_omp_context): Initialize these.
(scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
(scan_omp_for): Check reduction clauses for incorrect nesting.
gcc/testsuite/
* c-c++-common/goacc/nested-reductions-warn.c: New test.
* c-c++-common/goacc/nested-reductions.c: New test.
* gfortran.dg/goacc/nested-reductions-warn.f90: New test.
* gfortran.dg/goacc/nested-reductions.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
Add expected warnings about missing reduction clauses.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
Likewise.

Reviewed-by: Thomas Schwinge 



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277875 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog |  10 +
 gcc/omp-low.c |  97 +++
 gcc/testsuite/ChangeLog   |   9 +
 .../goacc/nested-reductions-warn.c| 525 ++
 .../c-c++-common/goacc/nested-reductions.c| 420 +++
 .../goacc/nested-reductions-warn.f90  | 674 ++
 .../gfortran.dg/goacc/nested-reductions.f90   | 540 ++
 libgomp/ChangeLog |  11 +
 .../par-loop-comb-reduction-1.c   |   2 +-
 .../par-loop-comb-reduction-2.c   |   2 +-
 .../par-loop-comb-reduction-3.c   |   2 +-
 .../par-loop-comb-reduction-4.c   |   2 +-
 12 files changed, 2290 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-warn.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7fee0f37e9bf..38160dd631e9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2019-11-06  Gergö Barany  
+   Frederik Harwath  
+   Thomas Schwinge  
+
+   * omp-low.c (struct omp_context): New fields
+   local_reduction_clauses, outer_reduction_clauses.
+   (new_omp_context): Initialize these.
+   (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
+   (scan_omp_for): Check reduction clauses for incorrect nesting.
+   
 2019-11-06  Jakub Jelinek  
 
PR inline-asm/92352
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 122f42788813..fa76ceba33c6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -128,6 +128,12 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map<tree, tree> *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->outer = outer_ctx;
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
   ctx->depth = outer_ctx->depth + 1;
 }
   else
@@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = NULL;
   ctx->depth = 1;
 }
 
@@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
  goto do_pri

[PATCH][COMMITTED] Add OpenACC 2.6 `serial' construct support

2019-11-12 Thread frederik
Hi,
the following patch has been reviewed and committed.

Frederik

--- 8< --

The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.

The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.

Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
front ends. In particular, this makes it possible to distinguish between
compute constructs in warnings, error messages, dumps, etc.
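
The equivalence described above can be sketched in C as follows. This is a minimal example under stated assumptions: the function names and the explicit `copy(total)' clause are choices made for this illustration and do not come from the patch.

```c
#include <assert.h>

/* Sum 1..n inside a 'serial' region: one gang, one worker, vector
   length one are implied by the construct itself.  */
long triangle_serial(int n)
{
  assert(n >= 0);
  long total = 0;
#pragma acc serial copy(total)
  {
    for (int i = 1; i <= n; i++)
      total += i;
  }
  return total;
}

/* The same computation written as a 'parallel' region with the three
   dimension clauses spelled out explicitly.  */
long triangle_parallel_111(int n)
{
  assert(n >= 0);
  long total = 0;
#pragma acc parallel num_gangs(1) num_workers(1) vector_length(1) copy(total)
  {
    for (int i = 1; i <= n; i++)
      total += i;
  }
  return total;
}
```

Both functions are expected to compute the same value. Per the description above, repeating any of the three dimension clauses on `serial' itself is not supported.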

2019-11-12  Maciej W. Rozycki
            Tobias Burnus
            Frederik Harwath
            Thomas Schwinge

gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant.  Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.

* doc/generic.texi (OpenACC): Document OACC_SERIAL.

gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.

gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.

gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.

gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
(gfc_ascii_statement): Likewise.  Handle ST_OACC_END_SERIAL_LOOP
and ST_OACC_END_SERIAL.
(parse_oacc_structured_block): Handle ST_OACC_SERIAL.
(parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_END_SERIAL_LOOP.
(parse_executable): Handle ST_OACC_SERIAL_LOOP and
ST_OACC_SERIAL.
(is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* resolve.c (gfc

[PATCH 1/4] openmp: Fix loop transformation tests

2023-07-28 Thread Frederik Harwath
libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction 
clause.
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize 
var.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add 
reduction
and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j

 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions

 integer :: i,j

+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git 
a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions

 integer :: i,j

-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
--
2.36.1

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (Gesellschaft mit beschränkter Haftung);
Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München;
Registration court: München, HRB 106955


[PATCH 0/4] openmp: loop transformation fixes

2023-07-28 Thread Frederik Harwath
Hi,
the following patches contain some fixes from the devel/omp/gcc-13 branch
to the patches implementing the OpenMP 5.1 loop transformation directives,
which I posted in March 2023.

Frederik



Frederik Harwath (4):
  openmp: Fix loop transformation tests
  openmp: Fix initialization for 'unroll full'
  openmp: Fix diagnostic message for "omp unroll"
  openmp: Fix number of iterations computation for "omp unroll full"

 gcc/omp-transform-loops.cc| 99 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 ++
 .../gomp/loop-transforms/unroll-8.f90 |  2 +-
 .../gomp/loop-transforms/unroll-9.f90 |  2 +-
 .../matrix-no-directive-unroll-full-1.C   | 13 +++
 .../loop-transforms/matrix-no-directive-1.c   |  2 +-
 .../matrix-no-directive-unroll-full-1.c   |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c|  2 +
 .../loop-transforms/matrix-omp-for-1.c|  2 +-
 .../matrix-omp-parallel-for-1.c   |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c   |  2 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c |  2 +
 .../matrix-omp-target-parallel-for-1.c|  2 +-
 ...p-target-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-omp-taskloop-1.c   |  2 +
 ...trix-omp-teams-distribute-parallel-for-1.c |  2 +
 .../loop-transforms/matrix-simd-1.c   |  2 +
 .../loop-transforms/unroll-1.c|  8 +-
 .../loop-transforms/unroll-non-rect-1.c   |  2 +
 .../loop-transforms/tile-2.f90|  2 +-
 .../loop-transforms/unroll-1.f90  |  2 +
 .../loop-transforms/unroll-6.f90  |  4 +-
 .../loop-transforms/unroll-simd-1.f90 |  3 +-
 23 files changed, 197 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

--
2.36.1



[PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (gomp_for_number_of_iterations):
Always compute "final - init" and do not take absolute value.
Identify non-iterating and infinite loops for constant init,
final, step values for better diagnostic messages, consistent
behaviour in those corner cases, and better testability.
(gomp_for_constant_iterations_p): Add new argument to pass
on information about infinite loops, and ...
(full_unroll): ... use from here to emit a warning and remove
unrolled, known infinite loops consistently.
(process_omp_for): Only print dump message if loop has not
been removed by transformation.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/loop-transforms/unroll-8.c: New test.
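
The computation described above can be sketched on plain integers. This is a simplified model, not the GCC implementation: `iteration_count', `floor_div' and `ceil_div' are hypothetical helpers, and the sketch assumes constant operands with a non-zero step.

```c
#include <assert.h>
#include <stdbool.h>

/* Division rounding toward negative infinity (C's '/' truncates).  */
static long floor_div(long a, long b)
{
  long q = a / b;
  if (a % b != 0 && ((a < 0) != (b < 0)))
    q--;
  return q;
}

/* Division rounding toward positive infinity.  */
static long ceil_div(long a, long b)
{
  long q = a / b;
  if (a % b != 0 && ((a < 0) == (b < 0)))
    q++;
  return q;
}

/* Model of the iteration count for "for (i = init; i COND final; i += step)"
   where COND is '<' if cond_is_lt and '>' otherwise.  Returns 0 for a loop
   that does not iterate and a negative value for an infinite loop.  */
long iteration_count(long init, long final, long step, bool cond_is_lt)
{
  assert(step != 0);

  /* Non-iterating loops are handled first; the division below could not
     tell them apart from infinite loops.  */
  if (cond_is_lt ? final <= init : init <= final)
    return 0;

  /* Always "final - init", with no absolute value taken.  */
  long diff = final - init;

  /* Operands of opposite sign give a negative quotient (infinite loop);
     round it away from zero so it never collapses to 0.  */
  bool signs_differ = (diff < 0) != (step < 0);
  return signs_differ ? floor_div(diff, step) : ceil_div(diff, step);
}
```

For instance, `iteration_count(1, 10, 3, true)` models `for (i = 1; i < 10; i += 3)`, while a call such as `iteration_count(0, 10, -3, true)` models a loop that steps away from its bound and yields a negative result.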
---
 gcc/omp-transform-loops.cc| 94 ++-
 .../gomp/loop-transforms/unroll-8.c   | 76 +++
 2 files changed, 146 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index c8853bcee89..b0645397641 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -153,20 +153,27 @@ subst_defs (tree expr, gimple_seq seq)
   return expr;
 }

-/* Return an expression for the number of iterations of the outermost loop of
-   OMP_FOR. */
+/* Return an expression for the number of iterations of the loop at
+   the given LEVEL of OMP_FOR.
+
+   If the expression is a negative constant, this means that the loop
+   is infinite. This can only be recognized for loops with constant
+   initial, final, and step values.  In general, according to the
+   OpenMP specification, the behaviour is unspecified if the number of
+   iterations does not fit the types used for their computation, and
+   hence in particular if the loop is infinite. */

 tree
 gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level)
 {
   gcc_assert (!non_rectangular_p (omp_for));
-
   tree init = gimple_omp_for_initial (omp_for, level);
   tree final = gimple_omp_for_final (omp_for, level);
   tree_code cond = gimple_omp_for_cond (omp_for, level);
   tree index = gimple_omp_for_index (omp_for, level);
   tree type = gomp_for_iter_count_type (index, final);
-  tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1);
+  tree incr = gimple_omp_for_incr (omp_for, level);
+  tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr);

   init = subst_defs (init, gimple_omp_for_pre_body (omp_for));
   init = fold (init);
@@ -181,34 +188,64 @@ gomp_for_number_of_iterations (const gomp_for *omp_for, 
size_t level)
   diff_type = ptrdiff_type_node;
 }

-  tree diff;
-  if (cond == GT_EXPR)
-diff = fold_build2 (minus_code, diff_type, init, final);
-  else if (cond == LT_EXPR)
-diff = fold_build2 (minus_code, diff_type, final, init);
-  else
-gcc_unreachable ();

-  diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step);
-  diff = fold_build1 (ABS_EXPR, type, diff);
+  /* Identify a simple case in which the loop does not iterate. The
+ computation below could not tell this apart from an infinite
+ loop, hence we handle this separately for better diagnostic
+ messages. */
+  gcc_assert (cond == GT_EXPR || cond == LT_EXPR);
+  if (TREE_CONSTANT (init) && TREE_CONSTANT (final)
+  && ((cond == GT_EXPR && tree_int_cst_le (init, final))
+ || (cond == LT_EXPR && tree_int_cst_le (final, init))))
+return build_int_cst (diff_type, 0);
+
+  tree diff = fold_build2 (minus_code, diff_type, final, init);
+
+  /* Divide diff by the step.
+
+ We could always use CEIL_DIV_EXPR since only non-negative results
+ correspond to valid number of iterations and the behaviour is
+ unspecified by the spec otherwise. But we try to get the rounding
+ right for constant negative values to identify infinite loops
+ more precisely for better warnings. */
+  tree_code div_expr = CEIL_DIV_EXPR;
+  if (TREE_CONSTANT (diff) && TREE_CONSTANT (step))
+{
+  bool diff_is_neg = tree_int_cst_lt (diff, size_zero_node);
+  bool step_is_neg = tree_int_cst_lt (step, size_zero_node);
+  if ((diff_is_neg && !step_is_neg)
+ || (!diff_is_neg && step_is_neg))
+   div_expr = FLOOR_DIV_EXPR;
+}

+  diff = fold_build2 (div_expr, type, diff, step);
   return diff;
 }

-/* Return true if the expression representing the number of iterations for
-   OMP_FOR is a constant expression, false otherwise. */
+/* Return true if the expression representing the number of iterations
+   for OMP_FOR is a non-negative constant and set ITERATIONS to the
+   value of that expression. Otherwise, return false.  Set INFINITE to
+   true if the number of iterations was recognized to be infinite. */

 bool
 gomp_for_constant_iterations_p (gomp_for *omp_for,
-   unsigned HOST_WIDE_INT *iterations)
+ 

[PATCH 3/4] openmp: Fix diagnostic message for "omp unroll"

2023-07-28 Thread Frederik Harwath
gcc/ChangeLog:

* omp-transform-loops.cc (print_optimized_unroll_partial_msg):
Output "omp unroll partial" instead of "omp unroll auto".
(optimize_transformation_clauses): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
---
 gcc/omp-transform-loops.cc| 4 ++--
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90   | 2 +-
 .../testsuite/libgomp.fortran/loop-transforms/unroll-6.f90| 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 275a5260dae..c8853bcee89 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -1423,7 +1423,7 @@ print_optimized_unroll_partial_msg (tree c)
   tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c);
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc,
   "replaced consecutive %<omp unroll%> directives by "
-  "%\n", tree_to_uhwi (unroll_factor));
 }

@@ -1483,7 +1483,7 @@ optimize_transformation_clauses (tree clauses)

  dump_printf_loc (
  MSG_OPTIMIZED_LOCATIONS, dump_loc,
- "removed useless %<omp unroll auto%> directives "
+ "removed useless %<omp unroll partial%> directives "
  "preceding 'omp unroll full'\n");
}
}
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
index fd687890ee6..dab3f0fb5cf 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90
@@ -5,7 +5,7 @@ subroutine test1
   implicit none
   integer :: i
   !$omp parallel do collapse(1)
-  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll auto\(24\)'} }
+  !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' 
directives by 'omp unroll partial\(24\)'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 
b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
index 928ca44e811..91e13ff1b37 100644
--- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90
@@ -4,7 +4,7 @@
 subroutine test1
   implicit none
   integer :: i
-  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+  !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' 
directives preceding 'omp unroll full'} }
   !$omp unroll partial(3)
   !$omp unroll partial(2)
   !$omp unroll partial(1)
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 
b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
index 1df8ce8d5bb..b953ce31b5b 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90
@@ -22,7 +22,7 @@ contains

 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
-!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll auto\(50\)'} }
+!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp 
unroll' directives by 'omp unroll partial\(50\)'} }
 !$omp unroll partial(10)
 do i = 1,n,step
sum = sum + 1
@@ -36,7 +36,7 @@ contains
 sum = 0
 !$omp parallel do reduction(+:sum) lastprivate(i)
 do i = 1,n,step
-   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' 
directives preceding 'omp unroll full'} }
+   !$omp unroll full ! { dg-optimized {removed useless 'omp unroll 
partial' directives preceding 'omp unroll full'} }
!$omp unroll partial(10)
do j = 1, 1000
   sum = sum + 1
--
2.36.1



[PATCH 2/4] openmp: Fix initialization for 'unroll full'

2023-07-28 Thread Frederik Harwath
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.

Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
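
To make the missing step concrete, here is a hand-written C sketch of what full unrolling amounts to. It only models the transformation; the function names and loop bounds are assumptions chosen for the example, not output of the compiler.

```c
#include <assert.h>

/* Reference version: the loop as written by the user.  */
int squares_rolled(void)
{
  int sum = 0;
  for (int i = 2; i < 14; i += 3)
    sum += i * i;
  return sum;
}

/* Hand-unrolled version with four copies of the body.  */
int squares_unrolled(void)
{
  int sum = 0;
  int i;

  i = 2;  /* Index initialization: without it, 'i' is read uninitialized.  */
  sum += i * i; i += 3;
  sum += i * i; i += 3;
  sum += i * i; i += 3;
  sum += i * i; i += 3;

  /* The index ends at the value that makes the original loop exit.  */
  assert(i == 14);
  return sum;
}
```

If the `i = 2' assignment is left out, a compiler run with -Wall can report the uninitialized use, which is the kind of problem the added test options are meant to expose.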

gcc/ChangeLog:

* omp-transform-loops.cc (full_unroll): Add initialization of index 
variable.

libgomp/ChangeLog:

* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty 
pragmas.
Use -O2.
* 
testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of 
testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use 
-Wall
and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* 
testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
---
 gcc/omp-transform-loops.cc  |  1 +
 .../matrix-no-directive-unroll-full-1.C | 13 +
 .../loop-transforms/matrix-no-directive-1.c |  2 +-
 .../matrix-no-directive-unroll-full-1.c |  2 +-
 .../matrix-omp-distribute-parallel-for-1.c  |  2 ++
 .../loop-transforms/matrix-omp-for-1.c  |  2 +-
 .../loop-transforms/matrix-omp-parallel-for-1.c |  2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c |  2 ++
 .../matrix-omp-parallel-masked-taskloop-simd-1.c|  2 ++
 .../matrix-omp-target-parallel-for-1.c  |  2 +-
 ...rix-omp-target-teams-distribute-parallel-for-1.c |  2 ++
 .../loop-transforms/matrix-omp-taskloop-1.c |  2 ++
 .../matrix-omp-teams-distribute-parallel-for-1.c|  2 ++
 .../loop-transforms/matrix-simd-1.c |  2 ++
 .../libgomp.c-c++-common/loop-transforms/unroll-1.c |  8 +---
 .../loop-transforms/unroll-non-rect-1.c |  2 ++
 16 files changed, 40 insertions(+), 8 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc
index 517faea537c..275a5260dae 100644
--- a/gcc/omp-transform-loops.cc
+++ b/gcc/omp-transform-loops.cc
@@ -548,6 +548,7 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx 
*ctx ATTRIBUTE_UNUSED)

   gimple_seq unrolled = NULL;
   gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for));
+  gimplify_assign (index, init, &unrolled);
   push_gimplify_context ();
   gimple_seq_add_seq (&unrolled,
  build_unroll_body (body, unroll_factor, index, incr));
diff --git 
a/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
new file mode 100644
index 000..3a684219627
--- /dev/null
+++ 
b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C
@@ -0,0 +1,13 @@
+/* { dg-additional-options { -O0 -fdump-tree-original -Wall -Wno-unknown-pragmas } } */
+
+#define COMMON_DIRECTIVE
+#define COMMON_TOP_TRANSFORM omp unroll full
+#define COLLAPSE_1
+#define COLLAPSE_2
+#define COLLAPSE_3
+#define IMPLEMENTATION_FILE "../../libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h"
+
+#include "../../libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h"
+
+/* A consistency check to prevent broken macro usage. */
+/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */
diff --git 
a/libgomp/testsuite/libgomp.c-c++-common/loop-tra

[OG11][committed][PATCH 00/22] OpenACC "kernels" Improvements

2021-11-17 Thread Frederik Harwath
Hi,

this patch series implements the rework of the OpenACC "kernels"
implementation that was announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  The
central step is contained in the commit titled "openacc: Use Graphite
for dependence analysis in \"kernels\" regions", whose commit message
also contains further explanations.

Best regards,
Frederik

PS: The commit series also includes a backport from master
"00b98b6cac25 Add dg-final option-based target selectors" and two
trivial unrelated commits "fa558c2a6664 Fix gimple_debug_cfg
declaration" and "35cdc94463fe Fix branch prediction dump message"



Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses

 gcc/Makefile.in   |2 +
 gcc/cfgloop.c |1 +
 gcc/cfgloop.h |6 +
 gcc/cfgloopmanip.c|1 +
 gcc/common.opt|9 +
 gcc/config/nvptx/nvptx.c  |7 +
 gcc/doc/gimple.texi   |2 +
 gcc/doc/invoke.texi   |   20 +-
 gcc/doc/passes.texi   |6 +-
 gcc/expr.c|1 +
 gcc/flag-types.h  |1 +
 gcc/fortran/lang.opt  |4 +
 gcc/fortran/trans-array.c |  321 --
 gcc/gimple-loop-interchange.cc|2 +-
 gcc/gimple-pretty-print.c |3 +
 gcc/gimple-walk.c |   15 +-
 gcc/gimple-walk.h |6 +
 gcc/gimple.h  |7 +-
 gcc/gimplify.c|   13 +-
 gcc/graph.c   |   35 +-
 gcc/graphite-dependences.c|  220 +++-
 gcc/graphite-isl-ast-to-gimple.c  |  271 -
 gcc/graphite-oacc.c   |  689 
 gcc/graphite-oacc.h   |   55 +
 gcc/graphite-optimize-isl.c   |   42 +-
 gcc/graphite-poly.c   |   41 +-
 gcc/graphite-scop-detection.c |  654 +--
 gcc/graphite-sese-to-poly.c   |   90 +-
 gcc/graphite.c|  120 +-
 gcc/graphite.h|   40 +-
 gcc/internal-fn.c |2 +
 gcc/internal-fn.h |4 +-
 gcc/omp-data-optimize.cc  |  951 
 gcc/omp-expand.c  |  110 +-
 gcc/omp-general.c |   23 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.c |  321 +-
 gcc/omp-oacc-kernels-decompose.cc |  145 ++-
 gcc/omp-offload.c | 1001 +
 gcc/omp-offload.h |2 +
 gcc/params.opt|5 +-
 gcc/passes.c  |   42 +
 gcc/passes.def|   47 +-
 gcc/predict.c |2 +-
 gcc/sese.c|   25 +-
 gcc/sese.h|   19 +
 gcc/testsuite/c-c++-common/goacc/acc-icf.c|4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c  |2 +-
 ...classify-kernels-unparallelized-graphite.c |   41 +
 ...lassify-kernels-unparallelized-parloops.c} |   12 +-
 .../c-c++-common/goacc/classify-kernels.c |   27 +-
 .

[OG11][committed][PATCH 01/22] Fortran: delinearize multi-dimensional array accesses

2021-11-17 Thread Frederik Harwath
From: Sandra Loosemore 

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds, with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately, this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run, because parts of it have already been optimized away or turned
into a form that is not easily recognizable.  It therefore seems better
to have the Fortran front end produce delinearized accesses to begin
with: a set of nested ARRAY_REFs, similar to the existing behavior of
the C and C++ front ends.  This is a long-standing problem that has
previously been discussed, e.g., in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
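
The difference between the two access forms can be illustrated in C. This is only an analogy, under the assumption that a row-major C array stands in for the Fortran case (Fortran arrays are column-major); the function names and array sizes are made up for the example.

```c
#include <assert.h>

enum { ROWS = 3, COLS = 4 };

/* Linearized form: one flat array, with the offset built from explicit
   multiplies and adds, as described for the current Fortran front end.  */
long sum_linearized(void)
{
  int flat[ROWS * COLS];
  for (int k = 0; k < ROWS * COLS; k++)
    flat[k] = k;

  long sum = 0;
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      {
        int offset = i * COLS + j;
        assert(offset < ROWS * COLS);
        sum += flat[offset];
      }
  return sum;
}

/* Delinearized form: nested subscripts keep one index per dimension
   visible, which is what nested ARRAY_REFs preserve.  */
long sum_delinearized(void)
{
  int grid[ROWS][COLS];
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      grid[i][j] = i * COLS + j;

  long sum = 0;
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      sum += grid[i][j];
  return sum;
}
```

Both functions touch the same elements in the same order. The second form keeps the per-dimension subscripts explicit, so a loop optimizer does not have to reconstruct them from the offset arithmetic.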

gcc/
* expr.c (get_inner_reference): Handle NOP_EXPR like
VIEW_CONVERT_EXPR.

gcc/fortran/
* lang.opt (-param=delinearize=): New.
* trans-array.c (get_class_array_vptr): New, split from...
(build_array_ref): ...here.
(get_array_lbound, get_array_ubound): New, split from...
(gfc_conv_array_ref): ...here.  Additional code refactoring
plus support for delinearization of the array access.

gcc/testsuite/
* gfortran.dg/assumed_type_2.f90: Adjust patterns.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/graphite/block-3.f90: Remove xfails.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Adjust patterns.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus  
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   1 -
 .../gfortran.dg/graphite/block-4.f90  |   1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 13 files changed, 264 insertions(+), 95 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index 21b7e96ed62e..c7ee800c4d4f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7539,6 +7539,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index dba333448c11..1548d56278a4 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index b7d949929722..3eb9a1778173 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank N of array DECL which might
+   be either a bare array or a descriptor.  This differs from
+   gfc_conv_array_lbound because it gets information for temporary array
+   objects from AR instead of the descriptor (they can differ).  */
+
+static tree
+get_array_lbound (tree decl, int n, gfc_symbol *sym,
+   

[OG11][committed][PATCH 03/22] graphite: Extend SCoP detection dump output

2021-11-17 Thread Frederik Harwath
Extend the dump output to make it easier (for GCC developers) to
understand why Graphite refuses to include a loop in a SCoP.

ChangeLog:

* graphite-scop-detection.c (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
---
 gcc/graphite-scop-detection.c | 188 +-
 1 file changed, 165 insertions(+), 23 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 3e729b159b09..46c470210d05 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;

 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }

+/* Print the loop numbers of the loops contained
+   in SESE to FILE. */
+
+static void
+print_sese_loop_numbers (FILE *file, sese_l sese)
+{
+  loop_p loop;
+  bool printed = false;
+  FOR_EACH_LOOP (loop, 0)
+  {
+if (loop_in_sese_p (loop, sese))
+  fprintf (file, "%d, ", loop->num);
+printed = true;
+  }
+  if (printed)
+fprintf (file, "\b\b");
+}
+
 /* Build scop outer->inner if possible.  */

 void
@@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop)
   if (! next
  || harmful_loop_in_region (next))
{
- if (s)
-   add_scop (s);
+  if (next)
+DEBUG_PRINT (
+dp << "[scop-detection] Discarding SCoP on loops ";
+print_sese_loop_numbers (dump_file, next);
+dp << " because of harmful loops\n";);
+  if (s)
+add_scop (s);
  build_scop_depth (loop);
  s = invalid_sese;
}
@@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop)
   || !single_pred_p (loop->latch)
   || exit->src != single_pred (loop->latch)
   || !empty_block_p (loop->latch))
-return false;
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n");
+  return false;
+}
+
+  bool edge_irreducible
+  = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP;
+  if (edge_irreducible)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop is not a natural loop.\n");
+  return false;
+}
+
+  bool niter_is_unconditional = number_of_iterations_exit (loop,
+  single_exit (loop),
+  &niter_desc, false);

-  return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP)
-&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false)
-&& niter_desc.control.no_overflow
-&& (niter = number_of_latch_executions (loop))
-&& !chrec_contains_undetermined (niter)
-&& graphite_can_represent_expr (scop, loop, niter);
+  if (!niter_is_unconditional)
+{
+  DEBUG_PRINT (
+  dp << "[can_represent_loop-fail] Loop niter not unconditional.\n"
+ << "Condition: " << niter_desc.assumptions << "\n");
+  return false;
+}
+
+  niter = number_of_latch_executions (loop);
+  if (!niter)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n");
+  return false;
+}
+  if (!niter_desc.control.no_overflow)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n");
+  return false;
+}
+
+  bool undetermined_coefficients = chrec_contains_undetermined (niter);
+  if (undetermined_coefficients)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter chrec contains undetermined coefficients.\n");
+  return false;
+}
+
+  bool can_represent_expr = graphite_can_represent_expr (scop, loop, niter);
+  if (!can_represent_expr)
+{
+  DEBUG_PRINT (dp << "[can_represent_loop-fail] "
+  << "Loop niter expression cannot be represented: "
+  << niter << "\n");
+  return false;
+}
+
+  ret

[OG11][committed][PATCH 02/22] openacc: Move pass_oacc_device_lower after pass_graphite

2021-11-17 Thread Frederik Harwath
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performance-wise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions.  To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.c (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.c (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.c (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.c (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.c (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/omp-general.c |  8 +-
 gcc/omp-offload.c |  8 ++
 gcc/passes.c  | 42 
 gcc/passes.def  

[OG11][committed][PATCH 04/22] graphite: Rename isl_id_for_ssa_name

2021-11-17 Thread Frederik Harwath
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name.  It also
explains the prefix "P_" for those names in the ISL representation.
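
As a small illustration of the naming scheme, the following sketch
formats a parameter name in the same "P_<version>" style.  It is
standalone C rather than GCC code: format_parameter_name and
PARAM_NAME_SIZE are hypothetical names, and the version is passed as a
plain unsigned integer (hence "%u") instead of being read from an SSA
name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same buffer size as in the patch: "P_", up to ten decimal digits and
   the terminating NUL fit into 14 bytes.  */
enum { PARAM_NAME_SIZE = 14 };

/* Write "P_<version>" into BUF and return the number of characters the
   full name needs, as reported by snprintf.  */
static int
format_parameter_name (char *buf, size_t size, unsigned version)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "P_%u", version);
}
```

A caller would typically pass a buffer of PARAM_NAME_SIZE bytes and hand
the resulting string to whatever creates the identifier.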

gcc/ChangeLog:

* graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ...
  (isl_id_for_parameter): ... this new function name.
  (build_scop_context): Adjust function use.
---
 gcc/graphite-sese-to-poly.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index eebf2e02cfca..195851cb540a 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space)
   return isl_pw_aff_mul (lhs, rhs);
 }

-/* Return an isl identifier from the name of the ssa_name E.  */
+/* Return an isl identifier for the parameter P.  */

 static isl_id *
-isl_id_for_ssa_name (scop_p s, tree e)
+isl_id_for_parameter (scop_p s, tree p)
 {
-  char name1[14];
-  snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e));
-  return isl_id_alloc (s->isl_context, name1, e);
+  gcc_checking_assert (TREE_CODE (p) == SSA_NAME);
+  char name[14];
+  snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p));
+  return isl_id_alloc (s->isl_context, name, p);
 }

 /* Return an isl identifier for the data reference DR.  Data references and
@@ -893,15 +894,15 @@ build_scop_context (scop_p scop)
   isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0);

   unsigned i;
-  tree e;
-  FOR_EACH_VEC_ELT (region->params, i, e)
+  tree p;
+  FOR_EACH_VEC_ELT (region->params, i, p)
 space = isl_space_set_dim_id (space, isl_dim_param, i,
-  isl_id_for_ssa_name (scop, e));
+  isl_id_for_parameter (scop, p));

   scop->param_context = isl_set_universe (space);

-  FOR_EACH_VEC_ELT (region->params, i, e)
-add_param_constraints (scop, i, e);
+  FOR_EACH_VEC_ELT (region->params, i, p)
+add_param_constraints (scop, i, p);
 }

 /* Return true when loop A is nested in loop B.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[OG11][committed][PATCH 05/22] graphite: Fix minor mistakes in comments

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and
  a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo
  in comment.
---
 gcc/graphite-isl-ast-to-gimple.c | 2 +-
 gcc/graphite-sese-to-poly.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c202213f39b3..44c06016f1a2 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq)
   basic_block begin_bb = get_entry_bb (codegen_region);

   /* Inserting the gimple statements in a vector because gimple_seq behave
- in strage ways when inserting the stmts from it into different basic
+ in strange ways when inserting the stmts from it into different basic
  blocks one at a time.  */
   auto_vec<gimple *> stmts;
   for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi);
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 195851cb540a..12fa2d669b3c 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -644,14 +644,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind,
 isl_map *acc, isl_set *subscript_sizes)
 {
   scop_p scop = PBB_SCOP (pbb);
-  /* Each scalar variables has a unique alias set number starting from
+  /* Each scalar variable has a unique alias set number starting from
  the maximum alias set assigned to a dr.  */
   int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var);
   subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0,
alias_set);

   /* Add a constrain to the ACCESSES polyhedron for the alias set of
- data reference DR.  */
+ the reference */
   isl_constraint *c
 = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc)));
   c = isl_constraint_set_constant_si (c, -alias_set);
--
2.33.0



[OG11][committed][PATCH 07/22] Move compute_alias_check_pairs to tree-data-ref.c

2021-11-17 Thread Frederik Harwath
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.

gcc/ChangeLog:
* tree-loop-distribution.c (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.c (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.
---
 gcc/tree-data-ref.c  | 87 
 gcc/tree-data-ref.h  |  3 ++
 gcc/tree-loop-distribution.c | 87 
 3 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index d04e95f7c285..71f8d790e618 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -2645,6 +2645,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr,
 dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }

+/* Compute and return an expression whose value is the segment length which
+   will be accessed by DR in NITERS iterations.  */
+
+static tree
+data_ref_segment_size (struct data_reference *dr, tree niters)
+{
+  niters = size_binop (MINUS_EXPR,
+  fold_convert (sizetype, niters),
+  size_one_node);
+  return size_binop (MULT_EXPR,
+fold_convert (sizetype, DR_STEP (dr)),
+fold_convert (sizetype, niters));
+}
+
+/* Return true if LOOP's latch is dominated by statement for data reference
+   DR.  */
+
+static inline bool
+latch_dominated_by_data_ref (class loop *loop, data_reference *dr)
+{
+  return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src,
+gimple_bb (DR_STMT (dr)));
+}
+
+/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's
+   data dependence relations ALIAS_DDRS.  */
+
+void
+compute_alias_check_pairs (class loop *loop, vec<ddr_p> *alias_ddrs,
+  vec<dr_with_seg_len_pair_t> *comp_alias_pairs)
+{
+  unsigned int i;
+  unsigned HOST_WIDE_INT factor = 1;
+  tree niters_plus_one, niters = number_of_latch_executions (loop);
+
+  gcc_assert (niters != NULL_TREE && niters != chrec_dont_know);
+  niters = fold_convert (sizetype, niters);
+  niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Creating alias check pairs:\n");
+
+  /* Iterate all data dependence relations and compute alias check pairs.  */
+  for (i = 0; i < alias_ddrs->length (); i++)
+{
+  ddr_p ddr = (*alias_ddrs)[i];
+  struct data_reference *dr_a = DDR_A (ddr);
+  struct data_reference *dr_b = DDR_B (ddr);
+  tree seg_length_a, seg_length_b;
+
+  if (latch_dominated_by_data_ref (loop, dr_a))
+   seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
+  else
+   seg_length_a = data_ref_segment_size (dr_a, niters);
+
+  if (latch_dominated_by_data_ref (loop, dr_b))
+   seg_length_b = data_ref_segment_size (dr_b, niters_plus_one);
+  else
+   seg_length_b = data_ref_segment_size (dr_b, niters);
+
+  unsigned HOST_WIDE_INT access_size_a
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a))));
+  unsigned HOST_WIDE_INT access_size_b
+   = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b))));
+  unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a)));
+  unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b)));
+
+  dr_with_seg_len_pair_t dr_with_seg_len_pair
+   (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
+dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+/* ??? Would WELL_ORDERED be safe?  */
+dr_with_seg_len_pair_t::REORDERED);
+
+  comp_alias_pairs->safe_push (dr_with_seg_len_pair);
+}
+
+  if (tree_fits_uhwi_p (niters))
+factor = tree_to_uhwi (niters);
+
+  /* Prune alias check pairs.  */
+  prune_runtime_alias_test_list (comp_alias_pairs, factor);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file,
+"Improved number of alias checks from %d to %d\n",
+alias_ddrs->length (), comp_alias_pairs->length ());
+}
+
 /* Create a conditional expression that represents the run-time checks for
overlapping of address ranges represented by a list of data references
pairs passed in ALIAS_PAIRS.  Data references are in LOOP.  The returned
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index 8001cc54f518..5016ec926b1d 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -577,6 +577,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec 

[OG11][committed][PATCH 08/22] graphite: Add runtime alias checking

2021-11-17 Thread Frederik Harwath
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.

TODO This needs more testing on non-OpenACC code.
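
The general idea of versioning a loop behind a runtime alias check can
be sketched in plain C as follows.  This is only an illustration of the
technique under simplifying assumptions (two int arrays of equal length,
a byte-range overlap test); it is not the check Graphite generates, and
ranges_disjoint and add_one_versioned are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the N-element int ranges starting at A and B do not
   overlap.  */
static int
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  uintptr_t len = n * sizeof (int);
  return pa + len <= pb || pb + len <= pa;
}

/* Compute dst[i] = src[i] + 1.  Returns 1 if the version that assumes no
   aliasing was taken and 0 if the original loop was used as fallback.  */
static int
add_one_versioned (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  if (ranges_disjoint (dst, src, n))
    {
      /* Aliasing ruled out at runtime: the optimizer may treat the two
         arrays as independent in this copy of the loop.  */
      int *restrict d = dst;
      const int *restrict s = src;
      for (size_t i = 0; i < n; i++)
        d[i] = s[i] + 1;
      return 1;
    }

  /* Possible overlap: keep the original, unoptimized loop.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Both branches compute the same result whenever the ranges are disjoint;
the fallback preserves the sequential semantics when they overlap.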

gcc/ChangeLog:

* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.c
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.c (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.
gcc/testsuite/ChangeLog:

* gcc.dg/graphite/alias-1.c: New test.
---
 gcc/common.opt  |   4 +
 gcc/graphite-isl-ast-to-gimple.c|  60 ++
 gcc/graphite-poly.c |   2 +
 gcc/graphite-scop-detection.c   | 239 +---
 gcc/graphite.h  |   4 +
 gcc/sese.h  |  18 ++
 gcc/testsuite/gcc.dg/graphite/alias-1.c |  22 +++
 7 files changed, 326 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 771398bc03de..aa695e56dc48 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1636,6 +1636,10 @@ fgraphite-identity
 Common Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-runtime-alias-checks
+Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 44c06016f1a2..caa0160b9bce 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry,
 }
 }

+/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of
+   aliasing. */
+
+static tree
+generate_alias_cond (vec<ddr_p> &alias_ddrs, loop_p context_loop)
+{
+  gcc_checking_assert (flag_graphite_runtime_alias_checks
+   && alias_ddrs.length () > 0);
+  gcc_checking_assert (context_loop);
+
+  auto_vec<dr_with_seg_len_pair_t> check_pairs;
+  compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs);
+  gcc_checking_assert (check_pairs.length () > 0);
+
+  tree alias_cond = NULL_TREE;
+  create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond);
+  gcc_checking_assert (alias_cond);
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+{
+  fprintf (dump_file, "Generated runtime alias check: ");
+  print_generic_expr (dump_file, alias_cond, dump_flags);
+  fprintf (dump_file, "\n");
+}
+
+  return alias_cond;
+}
+
 /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP.
Return true if code generation succeeded.  */

@@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop)
   region->if_region = if_region;

   loop_p context_loop = region->region.entry->src->loop_father;
+  gcc_checking_assert (context_loop);
   edge e = single_succ_edge (if_region->true_region->region.entry->dest);
   basic_block bb = split_edge (e);

   /* Update the true_region exit edge.  */
   region->if_region->true_region->region.exit = single_succ_edge (bb);

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  /* SCoP detection has failed to handle the aliasing between some data
+references of the SCoP statically. Generate an alias check that selects
+the newly generated version of the SCoP in the true-branch of the
+conditional if aliasing can be ruled out at runtime and the original
+version of the SCoP, otherwise. */
+
+  loop_p loop
+  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
+  scop->scop_info->region.exit->src->loop_father);
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
+  tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
+  set_ifsese_condition (re

[OG11][committed][PATCH 10/22] openacc: Add "can_be_parallel" flag info to "graph" dumps

2021-11-17 Thread Frederik Harwath
gcc/ChangeLog:

* graph.c (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
---
 gcc/graph.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/gcc/graph.c b/gcc/graph.c
index ce8de33ffe10..3ad07be3b309 100644
--- a/gcc/graph.c
+++ b/gcc/graph.c
@@ -191,6 +191,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun)
 }
 }

+
+extern tree oacc_get_fn_attrib (tree);
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Draw all the basic blocks in LOOP.  Print the blocks in breath-first
order to get a good ranking of the nodes.  This function is recursive:
It first prints inner loops, then the body of LOOP itself.  */
@@ -205,17 +209,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no,

   if (loop->header != NULL
   && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun))
-pp_printf (pp,
-  "\tsubgraph cluster_%d_%d {\n"
-  "\tstyle=\"filled\";\n"
-  "\tcolor=\"darkgreen\";\n"
-  "\tfillcolor=\"%s\";\n"
-  "\tlabel=\"loop %d\";\n"
-  "\tlabeljust=l;\n"
-  "\tpenwidth=2;\n",
-  funcdef_no, loop->num,
-  fillcolors[(loop_depth (loop) - 1) % 3],
-  loop->num);
+{
+  pp_printf (pp,
+ "\tsubgraph cluster_%d_%d {\n"
+ "\tstyle=\"filled\";\n"
+ "\tcolor=\"darkgreen\";\n"
+ "\tfillcolor=\"%s\";\n"
+ "\tlabel=\"loop %d %s\";\n"
+ "\tlabeljust=l;\n"
+ "\tpenwidth=2;\n",
+ funcdef_no, loop->num,
+ fillcolors[(loop_depth (loop) - 1) % 3], loop->num,
+ /* This is only meaningful for loops that have been processed
+by Graphite.
+
+TODO Use can_be_parallel_valid_p? */
+ !oacc_get_fn_attrib (cfun->decl)
+ ? ""
+ : loop->can_be_parallel ? "(can_be_parallel = true)"
+ : "(can_be_parallel = false)");
+}

   for (class loop *inner = loop->inner; inner; inner = inner->next)
 draw_cfg_nodes_for_loop (pp, funcdef_no, inner);
--
2.33.0



[OG11][committed][PATCH 12/22] openacc: Remove unused partitioning in "kernels" regions

2021-11-17 Thread Frederik Harwath
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.

This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
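
The core of this step can be sketched in standalone C.  The sketch
assumes the conventions visible in the patch (one dimension per axis, a
bit mask of used axes, -1 meaning "not set"); the DIM_* names and
remove_unused_partitioning are hypothetical stand-ins for the GOMP_DIM_*
macros and the new function, and the warning is left out.

```c
#include <assert.h>
#include <stddef.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(ix) (1u << (ix))

/* Reset every explicitly set dimension at or below LEVEL that is not
   marked in USED to -1, so that it is defaulted later.  Returns a mask
   of the dimensions that were reset.  */
static unsigned
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != NULL);
  unsigned removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
        dims[ix] = -1;
        removed |= DIM_MASK (ix);
      }
  return removed;
}
```

For a region with all three dimensions set but only gang partitioning in
use, this resets the worker and vector dimensions and leaves the gang
dimension untouched.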

gcc/ChangeLog:

* omp-offload.c (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
---
 gcc/omp-offload.c | 51 +--
 .../acc_prof-kernels-1.c  | 19 ---
 2 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index f5cb222efd8c..68cc5a9d9e5d 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1215,6 +1215,39 @@ oacc_parse_default_dims (const char *dims)
   targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0);
 }

+/* Remove parallelism dimensions below LEVEL which are not set in USED
+   from DIMS and emit a warning pointing to the location of FN. */
+
+static void
+oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used)
+{
+
+  bool host_compiler = true;
+#ifdef ACCEL_COMPILER
+  host_compiler = false;
+#endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
+  char removed_partitions[20] = "\0";
+  for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
+if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0)
+  {
+if (host_compiler)
+  {
+strcat (removed_partitions, axes[ix]);
+strcat (removed_partitions, " ");
+  }
+dims[ix] = -1;
+  }
+  if (removed_partitions[0] != '\0')
+warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
+"removed %spartitioning from %<kernels%> region",
+removed_partitions);
+}
+
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
raw attribute.  DIMS is an array of dimensions, which is filled in.
LEVEL is the partitioning level of a routine, or -1 for an offload
@@ -1235,6 +1268,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
   for (ix = 0; ix != GOMP_DIM_MAX; ix++)
 {
   purpose[ix] = TREE_PURPOSE (pos);
+
   tree val = TREE_VALUE (pos);
   dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
   pos = TREE_CHAIN (pos);
@@ -1244,14 +1278,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used)
 #ifdef ACCEL_COMPILER
   check = false;
 #endif
+
+  static char const *const axes[] =
+  /* Must be kept in sync with GOMP_DIM enumeration.  */
+  { "gang", "worker", "vector" };
+
   if (check
   && warn_openacc_parallelism
-  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))
-  && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn)))
+  && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)))
 {
-  static char const *const axes[] =
-  /* Must be kept in sync with GOMP_DIM enumeration.  */
-   { "gang", "worker", "vector" };
   for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++)
if (dims[ix] < 0)
  ; /* Defaulting axis.  */
@@ -1262,14 +1297,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int 
level, unsigned used)
  "region contains %s partitioned code but"
  " is not %s partitioned", axes[ix], axes[ix]);
else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1)
+ {
  /* The dimension is explicitly partitioned to non-unity, but
 no use is made within the region.  */
  warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism,
  "region is %s partitioned but"
  " does not contain %s partitioned code",
  axes[ix], axes[ix]);
+  }
 }

+  if (lookup_attribute ("oacc parallel_kernels_graphite",
+ DECL_ATTRIBUTES (fn)))
+oacc_remove_unused_partitioning  (fn, dims, level, used);
+
   bool changed = targetm.goacc.validate_dims (fn, dims, level, used);

   /* Default anything left to 1 or a partitioned default.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 4a9b11a3d3fe..d398b3463617 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -7,6 +7,8 @@

 #include 

+/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" 
} { "" } } */

[OG11][committed][PATCH 11/22] openacc: Add further kernels tests

2021-11-17 Thread Frederik Harwath
Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.
---
 ...classify-kernels-unparallelized-graphite.c |  41 +
 ...classify-kernels-unparallelized-parloops.c |  47 ++
 .../goacc/kernels-decompose-1-parloops.c  | 125 ++
 .../goacc/kernels-reduction-parloops.c|  36 
 .../c-c++-common/goacc/loop-auto-reductions.c |  22 +++
 ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++
 .../goacc/note-parallelism-kernels-loops-1.c  |  61 +++
 .../note-parallelism-kernels-loops-parloops.c |  53 ++
 ...assify-kernels-unparallelized-parloops.f95 |  44 +
 .../gfortran.dg/goacc/kernels-conversion.f95  |  52 ++
 .../goacc/kernels-decompose-1-parloops.f95| 121 ++
 .../goacc/kernels-decompose-parloops-2.f95| 154 ++
 .../goacc/kernels-loop-data-parloops-2.f95|  52 ++
 .../goacc/kernels-loop-parloops-2.f95 |  45 +
 .../goacc/kernels-loop-parloops.f95   |  39 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 .../parallel-loop-auto-reduction-2.f90|  98 +++
 17 files changed, 1155 insertions(+)
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90

diff --git 
a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
new file mode 100644
index ..77f4524907a9
--- /dev/null
+++ 
b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
@@ -0,0 +1,41 @@
+/* Check offloaded function's attributes and classification for unparallelized
+   OpenACC 'kernels' with Graphite kernels handling (default).  */
+
+/* { dg-additional-options "-O2" }
+   { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
+   { dg-additional-options "-fopt-info-optimized-omp" }
+   { dg-additional-options "-fopt-info-note-omp" }
+   { dg-additional-options "-fdump-tree-ompexp" }
+   { dg-additional-options "-fdump-tree-graphite-details" }
+   { dg-additional-options "-fdump-tree-oaccloops1" }
+  

[OG11][committed][PATCH 13/22] Add function for printing a single OMP_CLAUSE

2021-11-17 Thread Frederik Harwath
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs.  The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
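The difference between printing one clause and printing the whole
chain can be illustrated with a plain linked list.  This is only a
minimal sketch: "struct clause", "clause_to_str" and
"clause_chain_to_str" are hypothetical stand-ins, not GCC's tree or
pretty-printer API.  As with print_omp_clause_to_str, the caller owns
the returned string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an OMP clause chain: a singly linked list.  */
struct clause
{
  const char *text;
  struct clause *next;
};

/* Format only the clause at the head of the chain C.
   The caller must free the returned memory.  */
char *
clause_to_str (const struct clause *c)
{
  size_t len = strlen (c->text) + 1;
  char *s = malloc (len);
  if (s)
    memcpy (s, c->text, len);
  return s;
}

/* Format every clause in the chain C, separated by single spaces.
   The caller must free the returned memory.  */
char *
clause_chain_to_str (const struct clause *c)
{
  size_t len = 1;
  for (const struct clause *p = c; p; p = p->next)
    len += strlen (p->text) + 1;

  char *s = malloc (len);
  if (!s)
    return NULL;
  s[0] = '\0';
  for (const struct clause *p = c; p; p = p->next)
    {
      strcat (s, p->text);
      if (p->next)
        strcat (s, " ");
    }
  return s;
}
```

For a chain "copy(x)" -> "copyin(y)", the first function yields only
"copy(x)" while the second yields "copy(x) copyin(y)".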

gcc/ChangeLog:

* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d769cd8f07c5..2e0255176c76 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1402,6 +1402,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int 
spc, dump_flags_t flags)
 }
 }

+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}

 /* Dump chain of OMP clauses.

diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index cafe9aa95989..3368cb9f1544 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = 
TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
  bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
  enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[OG11][committed][PATCH 15/22] openacc: Add runtime alias checking for OpenACC kernels

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite.  The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls.  If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
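The idea of a runtime alias check can be sketched in plain C.  This is
a hand-written illustration under assumptions, not the code that
Graphite or the GOACC_LOOP expansion generates: "ranges_disjoint" and
"add_one" are hypothetical names, and both branches run on the host
here.  The point is only that a condition computed before the loop
selects between a version that may be partitioned and a sequential
fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if the N-element ranges starting at A and B do not
   overlap.  This plays the role of the "noalias" condition.  */
int
ranges_disjoint (const float *a, const float *b, size_t n)
{
  uintptr_t ua = (uintptr_t) a;
  uintptr_t ub = (uintptr_t) b;
  size_t bytes = n * sizeof (float);
  return ua + bytes <= ub || ub + bytes <= ua;
}

/* Compute dst[i] = src[i] + 1.  Return 1 if the no-alias version was
   selected and 0 if the sequential fallback was used.  */
int
add_one (float *dst, const float *src, size_t n)
{
  int noalias = ranges_disjoint (dst, src, n);
  if (noalias)
    {
      /* No overlap: the iterations are independent, so a compiler
         would be free to distribute them across parallel units.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
    }
  else
    {
      /* Possible overlap: keep the original sequential order.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
    }
  return noalias;
}
```

Calling add_one (buf + 1, buf, 4) takes the fallback, because each
iteration then reads the element written by the previous one.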

gcc/ChangeLog:

* graphite-isl-ast-to-gimple.c: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.c (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
---
 gcc/graphite-isl-ast-to-gimple.c  | 122 ++
 gcc/graphite-scop-detection.c |  18 +-
 gcc/omp-expand.c  |  37 +-
 gcc/omp-offload.c | 413 ++
 .../runtime-alias-check-1.c   |  79 
 .../runtime-alias-check-2.c   |  90 
 6 files changed, 550 insertions(+), 209 deletions(-)
 create mode 100644 
libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 
libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index c516170d9493..bdabe588c3d8 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "graphite.h"
 #include "graphite-oacc.h"
 #include "stdlib.h"
+#include "internal-fn.h"

 struct ast_build_info
 {
@@ -1698,6 +1699,127 @@ graphite_oacc_analyze_scop (scop_p scop)
   print_isl_schedule (dump_file, scop->original_schedule);
 }

+  if (flag_graphite_runtime_alias_checks
+  && scop->unhandled_alias_ddrs.length () > 0)
+{
+  sese_info_p region = scop->scop_info;
+
+  /* Usually there will be a chunking loop with the actual work loop
+inside it.  In some corner cases there may only be one loop.  */
+  loop_p top_loop = region->region.entry->dest->loop_father;
+  loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop;
+  tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, 
active_loop);
+
+  /* Walk back to GOACC_LOOP block.  */
+  basic_block goacc_loop_block = region->region.entry->src;
+
+  /* Find the GOACC_LOOP calls. If there aren't any then this is not an
+OpenACC kernels loop and will need different handling.  */
+  gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block);
+  while (!gsi_end_p (gsitop)
+&& (!is_gimple_call (gsi_stmt (gsitop))
+|| !gimple_call_internal_p (gsi_stmt (gsitop))
+|| (gimple_call_internal_fn (gsi_stmt (gsitop))
+!= IFN_GOACC_LOOP)))
+   gsi_next (&gsitop);
+
+  if (!gsi_end_p (gsitop))
+   {
+ /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted
+statements.  There ought not be any problematic dependencies 
because
+the chunk size and step are only computed for very specific 
purposes.
+They may not be at the very top of the block, but they should be
+found together (the asserts test this assumption). */
+ gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block);
+ gsi_move_after (&gsitop, &gsibottom);
+ gimple_stmt_iterator gsiinsert = gsibottom;
+ gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop))
+  && gimple_call_internal_p (gsi_stmt (gsitop))
+  && (gimple_call_internal_fn (gsi_stmt (gsitop))
+  == IFN_GOACC_LOOP));
+ gsi_move_after (&gsitop, &gsibottom);
+
+ /* Insert "noalias_p = COND" before the GOACC_LOOP statements.
+Note that these likely depend on some of the hoisted statements.  
*/
+ tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, 
NULL,
+   true, GSI_NEW_STMT);
+
+ /* Insert the cond_val into each GOACC_LOOP call in the region.  */
+ for (int n = -1; n < (int)region->bbs.length (); n++)
+   {
+ /* Cover the region plus goacc_loop_block.  */
+ basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n];
+
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+  !gsi_end_p (gsi);
+  gsi_next (&gsi))
+   {
+ gimpl

[OG11][committed][PATCH 14/22] openacc: Add data optimization pass

2021-11-17 Thread Frederik Harwath
From: Andrew Stubbs 

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private".  The pass is restricted to
"kernels" regions but could be extended to other types of regions.

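A source-level picture of the kind of case the pass targets might look
as follows.  This is a sketch under assumptions: the pass works on
GCC's internal representation, the function names are hypothetical,
and the exact conditions it checks are not spelled out here.  The
assumed reasoning is that a scalar which is always written before it
is read inside the region, and never read afterwards, needs no
transfer in either direction.

```c
#include <stddef.h>

/* As written by the user: 'tmp' is mapped with 'copy', i.e. it is
   transferred to the device and back again.  */
void
scale_copy (float *a, size_t n)
{
  float tmp;
#pragma acc kernels copy(tmp) copy(a[0:n])
  {
    for (size_t i = 0; i < n; i++)
      {
        tmp = a[i] * 2.0f;	/* Written before every read.  */
        a[i] = tmp;
      }
  }
  /* 'tmp' is not read after the region.  */
}

/* Hand-written equivalent of what the mapping could be weakened to:
   no transfer for 'tmp' at all, each iteration gets a private copy.  */
void
scale_private (float *a, size_t n)
{
  float tmp;
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop private(tmp)
    for (size_t i = 0; i < n; i++)
      {
        tmp = a[i] * 2.0f;
        a[i] = tmp;
      }
  }
}
```

Without -fopenacc the pragmas are ignored and both functions compute
the same result on the host, which is the property the optimization
has to preserve.
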
gcc/ChangeLog:

* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge 
---
 gcc/Makefile.in   |   1 +
 gcc/doc/gimple.texi   |   2 +
 gcc/gimple-walk.c |  15 +-
 gcc/gimple-walk.h |   6 +
 gcc/omp-data-optimize.cc  | 951 ++
 gcc/passes.def|   1 +
 .../goacc/note-parallelism-1-kernels-loops.c  |   7 +-
 ...note-parallelism-1-kernels-straight-line.c |   9 +-
 .../goacc/note-parallelism-kernels-loops.c|  10 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c  | 677 +
 .../c-c++-common/goacc/uninit-copy-clause.c   |   6 +
 .../g++.dg/goacc/omp_data_optimize-1.C| 169 
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++
 .../gfortran.dg/goacc/uninit-copy-clause.f95  |   2 +
 gcc/tree-pass.h   |   1 +
 .../kernels-decompose-1.c |   2 +
 .../libgomp.oacc-fortran/pr94358-1.f90|   4 +
 17 files changed, 2444 insertions(+), 7 deletions(-)
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4ebdcdbc5f8c..8c02b85d2a96 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1507,6 +1507,7 @@ OBJS = \
omp-low.o \
omp-oacc-kernels-decompose.o \
omp-simd-clone.o \
+   omp-data-optimize.o \
opt-problem.o \
optabs.o \
optabs-libfuncs.o \
diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 4b3d7d7452e3..a83e17f71a40 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one.  @code{WI} 
is as in
 @code{walk_gimple_stmt}.  If @code{walk_gimple_stmt} returns non-@code{NULL}, 
the walk
 is stopped and the value returned.  Otherwise, all the statements
 are walked and @code{NULL_TREE} returned.
+
+TODO update for forward vs. backward.
 @end deftypefn
diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index cd287860994e..66fd491844d7 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -32,6 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt
on each one.  WI is as in walk_gimple_stmt.

+   TODO update for forward vs. backward.
+
If walk_gimple_stmt returns non-NULL, the walk is stopped, and the
value is stored in WI->CALLBACK_RESULT.  Also, the statement that
produced the value is returned if this statement has not been
@@ -44,9 +46,10 @@ gimple *
 walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt,
 walk_tree_fn callback_op, struct walk_stmt_info *wi)
 {
-  gimple_stmt_iterator gsi;
+  bool forward = !(wi && wi->backward);

-  for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); )
+  gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq);
+  for (; !gsi_end_p (gsi); )
 {
   tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi);
   if (ret)
@@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn 
callback_stmt,
}

   if (!wi->removed_stmt)
-   gsi_next (&gsi);
+   {
+ if (forward)
+   gsi_next (&gs

[OG11][committed][PATCH 16/22] openacc: Warn about "independent" "kernels" loops with data-dependences

2021-11-17 Thread Frederik Harwath
This commit concerns loops in OpenACC "kernels" regions that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences.  A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences would be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
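A minimal example of the situation described above is a prefix sum
that is wrongly marked "independent".  This is a sketch; the function
name is hypothetical, and whether Graphite actually analyzes this
particular loop depends on the optimization level and on the rest of
the pipeline.

```c
#include <stddef.h>

/* In-place prefix sum.  Iteration i reads a[i - 1], which iteration
   i - 1 has just written, so the loop carries a dependence.  */
void
prefix_sum (int *a, size_t n)
{
#pragma acc kernels copy(a[0:n])
  {
    /* Incorrect user claim: the iterations are not independent.  */
#pragma acc loop independent
    for (size_t i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

With this patch the compiler still honours the clause, but if the
analysis finds the dependence it is expected to report that the loop
has an "independent" clause although data dependences were found.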

gcc/ChangeLog:

* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt|  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index aa695e56dc48..4c38ed5cf9ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -838,6 +838,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.

+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 94a975a88660..b806e36ef515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2043,6 +2043,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop 
*loop)
   return true;
 }

+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+return;
+
+  if (loop->routine)
+return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+{
+  if (dump_enabled_p ())
+   {
+ const dump_user_location_t loc
+   = dump_user_location_t::from_location_t (loop->loc);
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "'independent' loop in 'kernels' region has not been 
"
+  "analyzed (cf. 'graphite' "
+  "dumps for more information).\n");
+   }
+  return;
+}
+
+  if (!can_be_parallel)
+warning_at (loop->loc, 0,
+"loop has \"independent\" clause but data dependences were "
+"found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
programmer-specified partitionings.  OUTER_MASK is the partitioning
this loop is contained within.  Return mask of partitioning
@@ -2094,6 +2139,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned 
outer_mask)
}
}

+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
{
  loop->flags |= OLF_AUTO;
--
2.33.0



[OG11][committed][PATCH 17/22] openacc: Handle internal function calls in pass_lim

2021-11-17 Thread Frederik Harwath
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.  The
pass does not know how to handle the OpenACC internal function calls.
This was not necessary until recently, when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop.  This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process.  Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
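The effect of hoisting the invariant struct accesses can be shown with
a hand-written before/after pair.  This is a sketch: "struct omp_data"
is a hypothetical stand-in for the ".omp_data_i" struct of an outlined
function, and the second function is what the loop would look like
after the hoisting, not actual compiler output.

```c
#include <stddef.h>

/* Hypothetical stand-in for the ".omp_data_i" struct that outlining
   creates to pass variables into the offloaded function.  */
struct omp_data
{
  int *a;
  int n;
  int scale;
};

/* Before hoisting: the bound, the scale factor and the array pointer
   are reloaded through D in every iteration.  */
void
body_unhoisted (struct omp_data *d)
{
  for (int i = 0; i < d->n; i++)
    d->a[i] = d->a[i] * d->scale;
}

/* After hoisting: the invariant loads happen once, before the loop,
   so the loop bound and the scale factor are plain scalars again.  */
void
body_hoisted (struct omp_data *d)
{
  int *a = d->a;
  int n = d->n;
  int scale = d->scale;
  for (int i = 0; i < n; i++)
    a[i] = a[i] * scale;
}
```

The second form is only valid if the stores to the array cannot modify
the struct itself.  With a scalar loop bound, an analysis such as
scalar evolution has a much simpler loop to reason about.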

gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.c (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust 
declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call 
to
loop_invariant_motion_in_fun.
---
 gcc/gimple-loop-interchange.cc |  2 +-
 gcc/passes.def |  2 +-
 gcc/tree-ssa-loop-im.c | 58 --
 gcc/tree-ssa-loop-manip.h  |  2 +-
 4 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc
index 7b799eca805c..d617438910fd 100644
--- a/gcc/gimple-loop-interchange.cc
+++ b/gcc/gimple-loop-interchange.cc
@@ -2096,7 +2096,7 @@ pass_linterchange::execute (function *fun)
   if (changed_p)
 {
   unsigned todo = TODO_update_ssa_only_virtuals;
-  todo |= loop_invariant_motion_in_fun (cfun, false);
+  todo |= loop_invariant_motion_in_fun (cfun, false, false);
   scev_reset ();
   return todo;
 }
diff --git a/gcc/passes.def b/gcc/passes.def
index 48c9821011f0..d1dedbc287e2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,7 +247,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cse_sincos);
   NEXT_PASS (pass_optimize_bswap);
   NEXT_PASS (pass_laddress);
-  NEXT_PASS (pass_lim);
+  NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */);
   NEXT_PASS (pass_walloca, false);
   NEXT_PASS (pass_pre);
   NEXT_PASS (pass_sink_code);
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 7de47edbcb30..b392ae609aaf 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -47,6 +47,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-dfa.h"
 #include "dbgcnt.h"
+#include "graphite-oacc.h"
+#include "internal-fn.h"

 /* TODO:  Support for predicated code motion.  I.e.

@@ -320,11 +322,23 @@ enum move_pos
Otherwise return MOVE_IMPOSSIBLE.  */

 enum move_pos
-movement_possibility (gimple *stmt)
+movement_possibility (gimple *stmt, bool restrict_oacc_hoisting)
 {
   tree lhs;
   enum move_pos ret = MOVE_POSSIBLE;

+  if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl)
+  && gimple_code (stmt) == GIMPLE_ASSIGN)
+{
+  tree rhs = gimple_assign_rhs1 (stmt);
+
+  if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR)
+   rhs = TREE_OPERAND (rhs, 0);
+
+  if (TREE_CODE (rhs) == ARRAY_REF)
+ return MOVE_IMPOSSIBLE;
+}
+
   if (flag_unswitch_loops
   && gimple_code (stmt) == GIMPLE_COND)
 {
@@ -974,7 +988,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi)
statements.  */

 static void
-compute_invariantness (basic_block bb)
+compute_invariantness (basic_block bb, bool restrict_oacc_hoisting)
 {
   enum move_pos pos;
   gimple_stmt_iterator bsi;
@@ -1002,7 +1016,7 @@ compute_invariantness (basic_block bb)
   {
stmt = gsi_stmt (bsi);

-   pos = movement_possibility (stmt);
+   pos = movement_possibility (stmt, re

[OG11][committed][PATCH 18/22] openacc: Disable pass_pre on outlined functions analyzed by Graphite

2021-11-17 Thread Frederik Harwath
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
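For readers unfamiliar with the insertion step, the following
hand-written pair shows what partial redundancy elimination does to a
small function.  It is a textbook illustration with hypothetical
function names, not output of tree-ssa-pre.c, and it does not by
itself make Graphite fail.  It only shows where the inserted
computation and the extra temporary come from.

```c
/* Before PRE: "a + b" is computed twice on the path where C is
   nonzero, so the second computation is partially redundant.  */
int
before_pre (int a, int b, int c)
{
  int x = 0;
  if (c)
    x = a + b;
  return x + (a + b);
}

/* After PRE: a computation is inserted on the path that lacked one,
   and the final use reads the new temporary T instead.  */
int
after_pre (int a, int b, int c)
{
  int x = 0;
  int t;
  if (c)
    {
      t = a + b;
      x = t;
    }
  else
    t = a + b;	/* Inserted computation.  */
  return x + t;
}
```

The temporary T is now defined on both paths and used after the join.
When such insertions land inside a loop nest, they add definitions and
uses that the dependence analysis has to account for.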

gcc/ChangeLog:

* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 2aedc31e1d73..b904354e4c78 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graphite-oacc.h"

 /* Even though this file is called tree-ssa-pre.c, we actually
implement a bit more than just PRE here.  All of them piggy-back
@@ -3736,6 +3737,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+ can cause Graphite's dependence analysis to fail.  Without
+ special handling of those dependences in Graphite, it seems
+ better to skip this step if OpenACC loops that need to be handled
+ by Graphite are found.  Note that the full redundancy elimination
+ step of this pass is useful for the purpose of dependence
+ analysis, for instance, because it can remove definitions from
+ SCoPs that would otherwise prevent the creation of runtime alias
+ checks since those may only use definitions that are available
+ before the SCoP. */
+
+  if (oacc_function_p (cfun)
+  && ::graphite_analyze_oacc_function_p (cfun))
+return;
+
   basic_block bb;

   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0



[OG11][committed][PATCH 19/22] graphite: Tune parameters for OpenACC use

2021-11-17 Thread Frederik Harwath
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes.  Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions.  The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite.  Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.

gcc/ChangeLog:

* graphite-optimize-isl.c (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.

* graphite-scop-detection.c (build_scops): Likewise for
param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c   | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c   | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c   | 23 
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"


 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
   int old_err = isl_options_get_on_error (scop->isl_context);
   int old_max_operations = isl_ctx_get_max_operations (scop->isl_context);
   int max_operations = param_max_isl_operations;
+
+  /* The default value for param_max_isl_operations is easily exceeded
+ by "kernels" loops in existing OpenACC codes.  Raise the values
+ significantly since analyzing those loops is crucial. */
+  if (param_max_isl_operations == 35 /* default value */
+  && oacc_function_p (cfun))
+max_operations = 200;
+
   if (max_operations)
 isl_ctx_set_max_operations (scop->isl_context, max_operations);
   isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE);
@@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite)
  dump_user_location_t loc = find_loop_location
(scop->scop_info->region.entry->dest->loop_father);
  if (isl_ctx_last_error (scop->isl_context) == isl_error_quota)
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
-"loop nest not optimized, optimization timed out "
-"after %d operations [--param 
max-isl-operations]\n",
-max_operations);
- else
+   {
+  if (oacc_function_p (cfun))
+   {
+ /* Special casing for OpenACC to unify diagnostic messages
+here and in graphite-scop-detection.c. */
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+   "data-dependence analysis of OpenACC loop "
+   "nest "
+   "failed; try increasing the value of "
+   "--param="
+   "max-isl-operations=%d.\n",
+   max_operations);
+}
+  else
+dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+ "loop nest not optimized, optimization timed "
+ "out after %d operations [--param "
+ "max-isl-operations]\n",
+ max_operations);
+}
+  else
dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
 "loop nest not optimized, ISL signalled an error\n");
}
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 8b41044bce5e..afc955cc97eb 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -2056,6 +2056,9 @@ determine_openacc_reductions (scop_p scop)
   }
 }

+
+extern dump_user_location_t find_loop_location (class loop *);
+
 /* Find Static Control Parts (SCoP) in the current function and pushes
them to SCOPS.  */

@@ -2109,6 +2112,11 @@ build_scops (vec *scops)

[OG11][committed][PATCH 20/22] graphite: Adjust scop loop-nest choice

2021-11-17 Thread Frederik Harwath
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP.  The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region.  The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop.  This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops.  This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
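
The idea can be illustrated with a small stand-alone model.  This is
only a sketch: the toy_* types and functions below are hypothetical
stand-ins for GCC's loop and basic-block structures, not the real
data structures, and the model assumes that every loop is nested in a
single fake root loop of depth 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for GCC's loop and basic-block structures.  */
struct toy_loop
{
  struct toy_loop *outer;	/* Enclosing loop, NULL for the fake root.  */
  int depth;			/* 0 for the fake root loop.  */
};

struct toy_bb
{
  struct toy_loop *loop_father;	/* Innermost loop containing the block.  */
  struct toy_bb *single_pred;	/* Unique predecessor, or NULL.  */
  bool empty;			/* True if the block has no statements.  */
};

/* Return the innermost loop that contains both A and B.  Assumes that
   both loops hang off the same root loop.  */
struct toy_loop *
toy_common_loop (struct toy_loop *a, struct toy_loop *b)
{
  while (a->depth > b->depth)
    a = a->outer;
  while (b->depth > a->depth)
    b = b->outer;
  while (a != b)
    {
      a = a->outer;
      b = b->outer;
    }
  return a;
}

/* Walk back over empty exit blocks with a single predecessor before
   asking for the common loop, so that an empty block which belongs to
   the fake root loop does not hide a proper super-loop.  */
struct toy_loop *
toy_context_loop (struct toy_bb *entry_dest, struct toy_bb *exit_src)
{
  while (exit_src->empty && exit_src->single_pred)
    exit_src = exit_src->single_pred;
  return toy_common_loop (entry_dest->loop_father, exit_src->loop_father);
}
```

With an empty exit block that sits in the root loop and whose
predecessor is in the loop nest, toy_context_loop returns the nest
rather than the root.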

gcc/ChangeLog:

* graphite-scop-detection.c (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c| 21 ++---
 gcc/graphite.h   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bdabe588c3d8..ec055a358f39 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
 conditional if aliasing can be ruled out at runtime and the original
 version of the SCoP, otherwise. */

-  loop_p loop
-  = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-  scop->scop_info->region.exit->src->loop_father);
+  loop_p loop = scop_context_loop (scop);
   tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
   tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
   set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index afc955cc97eb..99e906a5d120 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }

+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {

@@ -1776,9 +1793,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;

-  struct loop *nest
-= find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-   scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);

   gcc_checking_assert (nest);

diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[OG11][committed][PATCH 21/22] graphite: Accept loops without data references

2021-11-17 Thread Frederik Harwath
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops.  Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions.  If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.

gcc/ChangeLog:

* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 99e906a5d120..9311a0e42a57 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -851,19 +851,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
  return true;
}

-  /* Check if all loop nests have at least one data reference.
-???  This check is expensive and loops premature at this point.
-If important to retain we can pre-compute this for all innermost
-loops and reject those when we build a SESE region for a loop
-during SESE discovery.  */
-  if (! loop->inner
- && ! loop_nest_has_data_refs (loop))
-   {
- DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-  << " does not have any data reference.\n");
- return true;
-   }
-
   DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num << " is harmless.\n");
 }

--
2.33.0



Re: [PATCH 15/40] graphite: Extend SCoP detection dump output

2022-05-18 Thread Harwath, Frederik
Hi Richard,

On Tue, 2022-05-17 at 08:21 +, Richard Biener wrote:
> On Mon, 16 May 2022, Tobias Burnus wrote:
>
> > As requested by Richard: Rediffed patch.
> >
> > Changes: s/.c/.cc/ + some whitespace changes.
> > (At least in my email reader, some  were lost. I also fixed
> > too-long line
> > issues.)
> >
> > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > (macro was removed late in GCC 12 development, in
> > r12-2605-ge41ba804ba5f5c)
> >
> > Otherwise, it should be identical to Frederik's patch, earlier in
> > this thread.
> >
> > On 15.12.21 16:54, Frederik Harwath wrote:
> > > Extend the dump output to make it easier (for GCC developers) to
> > > understand why Graphite refuses to include a loop in a SCoP.
> >
> > OK for mainline?
>
> +  if (printed)
> +fprintf (file, "\b\b");
>
> please find other means of omitting ", ", like by printing it
> _before_ the number but only for the second and following loop
> number.

Done.
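
For reference, the separator-first approach suggested above can be
sketched as follows.  This is an illustration only, not the code from
the revised patch; format_loop_numbers is a hypothetical helper that
writes to a buffer instead of a FILE so that the result is easy to
check.

```c
#include <stdio.h>
#include <stddef.h>

/* Write the numbers in NUMS to BUF as "n1, n2, n3".  The separator is
   emitted before every element except the first, so nothing has to be
   erased afterwards.  Returns the number of characters written
   (excluding the terminating NUL), or -1 if BUF is too small.  */
int
format_loop_numbers (char *buf, size_t size, const int *nums, size_t count)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  buf[0] = '\0';

  for (size_t i = 0; i < count; i++)
    {
      int n = snprintf (buf + used, size - used, "%s%d",
			i > 0 ? ", " : "", nums[i]);
      if (n < 0 || (size_t) n >= size - used)
	return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

Because the separator is tied to the element that follows it, an
empty list produces an empty string and no "\b\b" is needed.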

>
> I'll also note that
>
> +static void
> +print_sese_loop_numbers (FILE *file, sese_l sese)
> +{
> +  bool printed = false;
> +  for (auto loop : loops_list (cfun, 0))
> +{
> +  if (loop_in_sese_p (loop, sese))
> +   fprintf (file, "%d, ", loop->num);
> +  printed = true;
> +}
>
> is hardly optimal.  Please instead iterate over
> sese.entry->dest->loop_father and children instead which you can do
> by passing that as extra argument to loops_list.

Done.

This had to be extended a little bit, because a SCoP
can consist of consecutive loop-nests and iterating
only over "loops_list (cfun, LI_INCLUDE_ROOT,
sese.entry->dest->loop_father)" would output only the loops
from the first loop-nest in the SCoP (cf. the test file
scop-22a.c that I added).

>
> +
> +  if (dump_file && dump_flags & TDF_DETAILS)
> +{
> +  fprintf (dump_file, "Loops in SCoP: ");
> +  for (auto loop : loops_list (cfun, 0))
> +   if (loop_in_sese_p (loop, s))
> + fprintf (dump_file, "%d ", loop->num);
> +  fprintf (dump_file, "\n");
> +}
>
> you are duplicating functionality of the function you just added ...
>

Fixed.

> Otherwise looks OK to me.

Can I commit the revised patch?

Thanks for your review,
Frederik

From fb268a37704b1598a84051c735514ff38adad038 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 18 May 2022 07:59:42 +0200
Subject: [PATCH] graphite: Extend SCoP detection dump output

Extend the dump output to make it easier (for GCC developers) to
understand why Graphite refuses to include a loop in a SCoP.

gcc/ChangeLog:

	* graphite-scop-detection.cc (scop_detection::can_represent_loop):
	Output reason for failure to dump file.
	(scop_detection::harmful_loop_in_region): Likewise.
	(scop_detection::graphite_can_represent_expr): Likewise.
	(scop_detection::stmt_has_simple_data_refs_p): Likewise.
	(scop_detection::stmt_simple_for_scop_p): Likewise.
	(print_sese_loop_numbers): New function.
	(scop_detection::add_scop): Use from here.

gcc/testsuite/ChangeLog:

	* gcc.dg/graphite/scop-22a.c: New test.
---
 gcc/graphite-scop-detection.cc   | 184 ---
 gcc/testsuite/gcc.dg/graphite/scop-22a.c |  56 +++
 2 files changed, 219 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-22a.c

diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc
index 8c0ee9975579..9792d87ee0ae 100644
--- a/gcc/graphite-scop-detection.cc
+++ b/gcc/graphite-scop-detection.cc
@@ -69,12 +69,27 @@ public:
 fprintf (output.dump_file, "%d", i);
 return output;
   }
+
   friend debug_printer &
   operator<< (debug_printer &output, const char *s)
   {
 fprintf (output.dump_file, "%s", s);
 return output;
   }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, gimple* stmt)
+  {
+print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS);
+return output;
+  }
+
+  friend debug_printer &
+  operator<< (debug_printer &output, tree t)
+  {
+print_generic_expr (output.dump_file, t, TDF_SLIM);
+return output;
+  }
 } dp;
 
 #define DEBUG_PRINT(args) do \
@@ -506,6 +521,27 @@ scop_detection::merge_sese (sese_l first, sese_l second) const
   return combined;
 }
 
+/* Print the loop numbers of the loops contained in SESE to FILE. */
+
+static void
+p

Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-10-29 Thread Harwath, Frederik

On 24.10.19 16:31, Thomas Schwinge wrote:

Hi,
I have attached a revised patch.


> > [...] I was wondering if the way in which the patch
> > avoids issuing errors about operator switches more than once by
> > modifying the clauses (cf. the corresponding comment in omp-low.c)
> > could lead to problems [...]
>
> "Patching up" erroneous state or even completely removing OMP clauses is
> -- as far as I understand -- acceptable to avoid "issuing errors about
> operator switches more than once".  This doesn't affect code generation,
> because no code will be generated at all.
>
> (Does that answer your question?)



Yes, thank you.



> Regarding my suggestions to "demote error to warning diagnostics", I'd
> suggest that at this point we do *not* try to fix for the user any
> presumed wrong/missing 'reduction' clauses (difficult/impossible to do
> correctly in the general case), but really only diagnose them.


Ok, I have changed the errors into warnings and I have removed the
code for avoiding repeated messages.

> So just C/C++ testing, no Fortran at all.  This is not ideal, but
> probably (hopefully) acceptable given that this is working on the middle
> end representation shared between all front ends.


Thanks to Tobias, we now also have Fortran tests.


> To match the order in 'struct omp_context' (see above), move these new
> initializations before those of 'ctx->depth'.  (Even if that also just
> achieves "some local consistency".)  ;-)


Done.


> > @@ -1131,6 +1141,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >
> >   	case OMP_CLAUSE_REDUCTION:
> > case OMP_CLAUSE_IN_REDUCTION:
> > + if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
> > +   ctx->local_reduction_clauses
> > + = tree_cons (NULL, c, ctx->local_reduction_clauses);
> >   decl = OMP_CLAUSE_DECL (c);
> >   if (TREE_CODE (decl) == MEM_REF)
> > {
>
> I think this should really only apply to 'OMP_CLAUSE_REDUCTION' but not
> 'OMP_CLAUSE_IN_REDUCTION' (please verify)?


Right, I have moved the new code to the OMP_CLAUSE_REDUCTION case above.



> I'm usually the last one to complain about such things ;-) -- but here
> really the indentation of the new code seems to be off?  Please verify.
> Maybe you had set a tab-stop to four spaces instead of eight?


Oh, it should look better now.


> > --- /dev/null
> > +++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
>
> Rename to '*-warn.c', and instead of 'dg-error' use 'dg-warning'
> (possibly more than currently).


Ok.


> > --- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
> > +++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
> > @@ -16,17 +16,6 @@ int foo (int N)
> > }
> > }
> >
> > -  #pragma acc parallel
> > -  {
> > -#pragma acc loop reduction(+:b)
> > -for (int i = 0; i < N; i++)
> > -  {
> > -#pragma acc loop
> > -   for (int j = 0; j < N; j++)
> > - b += 1;
> > -  }
> > -  }
> > -
> > #pragma acc parallel
> > {
> >   #pragma acc loop reduction(+:c)
>
> That one stays in, but gets a 'dg-warning'.


What warning would you expect to see here? I do not get any warnings.

Best regards,
Frederik

From 22f45d4c2c11febce171272f9289c487aed4f9d7 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 29 Oct 2019 12:39:23 +0100
Subject: [PATCH] Warn about inconsistent OpenACC nested reduction clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.
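
As an illustration of the property (a sketch written for this
description, not one of the test cases added by the patch; the
function names are made up), the first loop nest below is consistent,
while the second one omits the reduction clause on the middle loop
and is therefore the kind of code the check is meant to diagnose.
The exact wording of the warning is not reproduced here.

```c
/* Consistent: every loop between the outermost and the innermost
   reduction on SUM carries a reduction clause with the same operator.  */
int
sum_consistent (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop reduction(+:sum)
      for (int j = 0; j < n; j++)
	{
#pragma acc loop reduction(+:sum)
	  for (int k = 0; k < n; k++)
	    sum += 1;
	}
    }
  return sum;
}

/* Inconsistent: the middle loop has no reduction clause for SUM although
   both the enclosing and the enclosed loop have one.  */
int
sum_missing_clause (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop
      for (int j = 0; j < n; j++)
	{
#pragma acc loop reduction(+:sum)
	  for (int k = 0; k < n; k++)
	    sum += 1;
	}
    }
  return sum;
}
```

When compiled without OpenACC support the pragmas are ignored and
both functions simply run sequentially; the difference only matters
once the loops are actually parallelized.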

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2019-10-29  Gergö Barany  
		Tobias Burnus  
		Frederik Harwath  
		Thomas Schwinge  

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-warn.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 * gfortran.dg/goacc/nested-reductions-warn.f90: New test.
	 * gfortran.dg/goacc/nested-reductions.f90: New test.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing red

Re: Add OpenACC 2.6 `acc_get_property' support

2019-11-05 Thread Harwath, Frederik
onsider (PGI?). The standard does 
not impose
any restrictions on the format of the string.


> > +default:
> > +  break;
>
> Should this 'GOMP_PLUGIN_error' or even 'GOMP_PLUGIN_fatal'?  (Similar
> then elsewhere.)

Yes, I chose GOMP_PLUGIN_error.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc-get-property.c
> > @@ -0,0 +1,37 @@
> > +/* Test the `acc_get_property' and '`acc_get_property_string' library
> > +   functions. */
> > +/* { dg-do run } */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +int main ()
> > +{
> > +  const char *s;
> > +  size_t v;
> > +  int r;
> > +
> > +  /* Verify that the vendor is a proper non-empty string.  */
> > +  s = acc_get_property_string (0, acc_device_default, acc_property_vendor);
> > +  r = !s || !strlen (s);
> > +  if (s)
> > +printf ("OpenACC vendor: %s\n", s);
>
> Should we check the actual string returned, as defined by OpenACC/our
> implementation, as applicable?  Use '#if defined ACC_DEVICE_TYPE_[...]'.
> (See 'libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c',
> for example.)

Yes.

> Isn't this the "Device vendor" instead of the "OpenACC vendor"?  Similar
> for all other 'printf's?

Yes.


> These tests only use 'acc_device_default', should they also check other
> valid as well as invalid values?

That would be better.

Frederik



Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-11-06 Thread Harwath, Frederik
Hi Thomas,

On 05.11.19 15:22, Thomas Schwinge wrote:

> For your convenience, I'm attaching an incremental patch, to be merged
> into yours.
> [...]
> With that addressed, OK for trunk.

Thank you. I have merged the patches and committed.

> A few more comments to address separately, later on.

I will look into your remaining questions.

Best regards,
Frederik



[PATCH] Add OpenACC 2.6 `serial' construct support

2019-11-07 Thread Frederik Harwath
Hi,
this patch implements the OpenACC 2.6 "serial" construct.
It has been tested by running the testsuite with nvptx-none
offloading on x86_64-pc-linux-gnu.

Best regards,
Frederik
 
 8< ---

The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
 num_workers(1) vector_length(1)' implied.
These clauses are therefore not supported with the `serial'
construct. All the remaining clauses accepted with `parallel' are also
accepted with `serial'.

The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.

Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends. In particular, this makes it possible to distinguish between
`parallel' and `serial' in warnings, error messages, dumps etc.
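
To make the stated equivalence concrete, here is a small sketch (not
taken from the patch or its tests; the function names and the data
clauses are chosen for illustration only).  Under the description
above, both functions should behave the same way on the device.

```c
/* Sum the N elements of A inside a `serial' region.  */
int
sum_serial (const int *a, int n)
{
  int sum = 0;
#pragma acc serial copyin(a[0:n]) copy(sum)
  {
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}

/* The same computation, written as a `parallel' region with the three
   dimension clauses that `serial' implies.  */
int
sum_parallel_1x1x1 (const int *a, int n)
{
  int sum = 0;
#pragma acc parallel num_gangs(1) num_workers(1) vector_length(1) \
  copyin(a[0:n]) copy(sum)
  {
    for (int i = 0; i < n; i++)
      sum += a[i];
  }
  return sum;
}
```

Without OpenACC support enabled the pragmas are ignored and both
functions degrade to the same sequential loop.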

    2019-11-07  Maciej W. Rozycki  
Tobias Burnus  
Frederik Harwath  

gcc/
* gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
enumeration constant.
(is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(is_gimple_omp_offloaded): Likewise.
* gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
constant.  Adjust the value of ORT_NONE accordingly.
(is_gimple_stmt): Handle OACC_SERIAL.
(oacc_default_clause): Handle ORT_ACC_SERIAL.
(gomp_needs_data_present): Likewise.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_workshare): Handle OACC_SERIAL.
(gimplify_expr): Likewise.
* omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter.
* omp-expand.c (expand_omp_target):
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(build_omp_regions_1, omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel): Rename function to...
(is_oacc_parallel_or_serial): ... this.
Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
(scan_sharing_clauses): Adjust accordingly.
(scan_omp_for): Likewise.
(lower_oacc_head_mark): Likewise.
(convert_from_firstprivate_int): Likewise.
(lower_omp_target): Likewise.
(check_omp_nesting_restrictions): Handle
GF_OMP_TARGET_KIND_OACC_SERIAL.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* tree.def (OACC_SERIAL): New tree code.
* tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.

* doc/generic.texi (OpenACC): Document OACC_SERIAL.

gcc/c-family/
* c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
constant.
* c-pragma.c (oacc_pragmas): Add "serial" entry.

gcc/c/
* c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(c_parser_oacc_kernels_parallel): Rename function to...
(c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(c_parser_omp_construct): Update accordingly.

gcc/cp/
* constexpr.c (potential_constant_expression_1): Handle
OACC_SERIAL.
* parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
(cp_parser_oacc_kernels_parallel): Rename function to...
(cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
(cp_parser_omp_construct): Update accordingly.
(cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
order.
* pt.c (tsubst_expr): Handle OACC_SERIAL.

gcc/fortran/
* gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
enumeration constants.
(gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
enumeration constants.
* match.h (gfc_match_oacc_serial): New prototype.
(gfc_match_oacc_serial_loop): Likewise.
* dump-parse-tree.c (show_omp_node, show_code_node): Handle
EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
* match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
* openmp.c (OACC_SERIAL_CLAUSES): New macro.
(gfc_match_oacc_serial_loop): New function.
(gfc_match_oacc_serial): Likewise.
(oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
(resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
(oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
EXEC_OACC_SERIAL_LOOP.
(gfc_resolve_oacc_directive): Likewise.
* parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
and "serial loop".
(next_statement): Handle ST_OACC_SERIAL

Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses

2019-11-07 Thread Harwath, Frederik
Hi Jakub,

On 06.11.19 14:00, Jakub Jelinek wrote:
> On Wed, Nov 06, 2019 at 01:41:47PM +0100, frede...@codesourcery.com wrote:
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -128,6 +128,12 @@ struct omp_context
>> [...]
>> +  /* A tree_list of the reduction clauses in this context.  */
>> +  tree local_reduction_clauses;
>> +
>> +  /* A tree_list of the reduction clauses in outer contexts.  */
>> +  tree outer_reduction_clauses;
> 
> Could there be acc in the name to make it clear it is OpenACC only?

Yes, will be added.


>> @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> [...]
>> @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> +  ctx->outer_reduction_clauses = NULL;
> 
> The = NULL assignments are unnecessary in all 3 cases, ctx is allocated with
> XCNEW.

Ok, will be removed.

>> @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>goto do_private;
>>  
>>  case OMP_CLAUSE_REDUCTION:
>> +  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +ctx->local_reduction_clauses
>> +  = tree_cons (NULL, c, ctx->local_reduction_clauses);
> 
> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
> more natural, wouldn't it.

Yes.

> Or, wouldn't it be better to do this checking in the gimplifier instead of
> omp-low.c?  There we have splay trees with GOVD_REDUCTION etc. for the
> variables, so it wouldn't be O(#reductions^2) compile time.
> It is true that the gimplifier doesn't record the reduction codes (after
> all, OpenMP has UDRs and so there can be fairly arbitrary reductions).


Right, I have considered moving the implementation somewhere else before.
I am going to look into this, but perhaps we will just keep it where it is
if otherwise the implementation becomes more complicated.

> Consider million reduction clauses on nested loops.
> If gimplifier is not the right spot, then use a splay tree + vector instead?
> splay tree for the outer ones, vector for the local ones, and put into both
> the clauses, so you can compare reduction code etc.

Sounds like a good idea. I am going to try that.
However, I have not seen the suboptimal data structure choices
of the original patch as a problem, since the case of a million reduction
clauses had not occurred to me.
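
To illustrate the complexity argument, here is a rough stand-alone
sketch.  It is not the omp-low.c code: red_clause, red_op and
count_operator_mismatches are hypothetical names, a sorted array with
bsearch stands in for the splay tree, and only the "same operator"
part of the check is modelled (missing clauses on intermediate loops
are not).

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical stand-ins for a reduction clause and its operator.  */
enum red_op { RED_PLUS, RED_MULT, RED_MAX };

struct red_clause
{
  int var;		/* Identifies the reduction variable.  */
  enum red_op op;	/* The reduction operator.  */
};

static int
compare_by_var (const void *a, const void *b)
{
  const struct red_clause *x = a;
  const struct red_clause *y = b;
  return (x->var > y->var) - (x->var < y->var);
}

/* Count the clauses in LOCAL whose variable also occurs in OUTER with a
   different operator.  OUTER must be sorted by VAR; each lookup is a
   binary search, so the total cost is O(n_local * log n_outer) instead
   of O(n_local * n_outer) for two nested list walks.  */
size_t
count_operator_mismatches (const struct red_clause *outer, size_t n_outer,
			   const struct red_clause *local, size_t n_local)
{
  size_t mismatches = 0;

  if (n_outer == 0)
    return 0;

  for (size_t i = 0; i < n_local; i++)
    {
      const struct red_clause *hit
	= bsearch (&local[i], outer, n_outer, sizeof *outer, compare_by_var);
      if (hit && hit->op != local[i].op)
	mismatches++;
    }
  return mismatches;
}
```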

Thank you for your feedback!

Best regards,
Frederik





Re: [PATCH 5/7] Remove last leftover usage of params* files.

2019-11-12 Thread Harwath, Frederik
Hi Martin,

On 06.11.19 13:40, Martin Liska wrote:

>   (finalize_options_struct): Remove.

This patch has been committed by now, but it seems that a single use of
finalize_options_struct has been overlooked in gcc/tree-streamer-in.c.

Best regards,
Frederik



Move pass_oacc_device_lower after pass_graphite

2020-11-03 Thread Frederik Harwath

Hi,

as a first step towards enabling the use of Graphite for optimizing
OpenACC loops this patch moves the OpenACC device lowering after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which ensures
that they run on OpenACC functions only. The choice of the new position
for pass_oacc_device_lower is further constrained by the need to
execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes if
optimizations are disabled is included inside of a new
pass_no_optimizations.

The patch has been bootstrapped on x86_64-linux-gnu and tested with the
GCC testsuite and with the libgomp testsuite with nvptx and gcn
offloading.

The patch should have no impact on non-OpenACC user code. However, the
new pass instances have changed the pass instance numbering and hence
the dump scanning commands in several tests had to be adjusted. I hope
that I found all that needed adjustment, but it is well possible that I
missed some tests that execute for particular targets or non-default
languages only. The resulting UNRESOLVED tests are usually easily fixed
by appending a pass number to the name of a pass that previously had no
number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass
number (e.g. "dce6" becomes "dce7") in a dump scanning command.

The patch leads to several new unresolved tests in the libgomp testsuite
which are caused by the combination of torture testing, missing cleanup
of the offload dump files, and the new pass numbering.  If a test that
uses, for instance, "-foffload=-fdump-tree-oaccdevlow" gets compiled with
"-O0" and afterwards with "-O2", each run of the test executes different
instances of pass_oacc_device_lower and produces dumps whose names
differ only in the pass instance number.  The dump scanning command in
the second run fails, because the dump files do not get removed after
the first run and the command consequently matches two different dump
files.  This seems to be a known issue.  I am going to submit a patch
that implements the cleanup of the offload dumps soon.

I have tried to rule out performance regressions by running different
benchmark suites with nvptx and gcn offloading. Nevertheless, I think
that it makes sense to keep an eye on OpenACC performance in the near
future and revisit the optimizations that run on the device-lowered
function if necessary.

Ok to include the patch in master?

Best regards,
Frederik


-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
From 93fb166876a0540416e19c9428316d1370dd1e1b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 3 Nov 2020 12:58:37 +0100
Subject: [PATCH] Move pass_oacc_device_lower after pass_graphite

As a first step towards enabling the use of Graphite for optimizing
OpenACC loops, the OpenACC device lowering must be moved after the
Graphite pass.  This means that the device lowering now takes place
after some crucial optimization passes. Thus new instances of those
passes are added inside of a new pass pass_oacc_functions which
ensures that they execute on OpenACC functions only. The choice of the
new position for pass_oacc_device_lower is further constrained by the
need to execute it before pass_vectorize.  This means that
pass_oacc_device_lower now runs inside of pass_tree_loop. A further
instance of the pass that handles functions without loops is added
inside of pass_tree_no_loop. Yet another pass instance that executes
if optimizations are disabled is included inside of a new
pass_no_optimizations.

2020-11-03  Frederik Harwath  
	Thomas Schwinge  

gcc/ChangeLog:

	* omp-general.c (oacc_get_fn_dim_size): Adapt.
	* omp-offload.c (pass_oacc_device_lower::clone): New method.
	* passes.c (class pass_no_optimizations): New pass.
	(make_pass_no_optimizations): New static function.
	* passes.def: Move pass_oacc_device_lower into pass_tree_loop
	and add further instances to pass_tree_no_loop and to new pass
	pass_no_optimizations. Add new instances of
	pass_lower_complex, pass_ccp, pass_sink_code,
	pass_complete_unrolli, pass_backprop, pass_phiprop,
	pass_forwprop, pass_vrp, pass_dce, pass_loop_done,
	pass_loop_init, pass_fix_loops supporting the
	pass_oacc_device_lower instance in pass_tree_loop.
	* tree-pass.h (make_pass_oacc_functions): New static function.
	(make_pass_oacc_functions): New static function.
	* tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New method.
	(pass_complete_unrolli::clone): New method.
	* tree-ssa-loop.c (pass

[PATCH] testsuite: Clean up lto and offload dump files

2020-11-04 Thread Frederik Harwath

Hi,

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects leftover
dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is used
for matching the "ltrans" files is also changed since the existing
pattern failed to remove some LTO ("ltrans0.ltrans.") dump files.
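
The way the extension part of the glob pattern is assembled can be
sketched outside of Tcl as follows.  This is a C rendering written
for illustration; build_extension_pattern is a hypothetical helper
that mirrors the list-and-join logic of the patched schedule-cleanups
procedure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Build the extension part of the cleanup glob pattern in BUF.
   SRC_EXT is the source file extension without the leading dot.  The
   "ltrans" and "mkoffload" alternatives are added on request, and the
   alternatives are wrapped in braces only if there is more than one.
   Returns 0 on success and -1 if BUF is too small.  */
int
build_extension_pattern (char *buf, size_t size, const char *src_ext,
			 bool ltrans, bool mkoffload)
{
  bool multiple = ltrans || mkoffload;
  int n = snprintf (buf, size, "%s%s%s%s%s",
		    multiple ? "{" : "",
		    src_ext,
		    ltrans ? ",ltrans[0-9]*.ltrans" : "",
		    mkoffload ? ",*.mkoffload" : "",
		    multiple ? "}" : "");
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

For a C test compiled with both "-flto" and "-foffload=..." this
yields "{c,ltrans[0-9]*.ltrans,*.mkoffload}", where the "*" in the
last alternative matches the name of the offloading target.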


This patch gets rid of at least one unresolved libgomp test result that
would otherwise be introduced by the patch discussed in this thread:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557889.html


diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {

[...]

-   lappend tfiles "$stem.{$basename_ext,exe}"

I do not understand why "exe" should be included here. I have removed it
and I did not notice any files matching the resulting pattern being left
behind by "make check-gcc".


Best regards,
Frederik

From 9eb5da60e8822e1f6fa90b32bff6123ed62c146c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 4 Nov 2020 14:09:46 +0100
Subject: [PATCH] testsuite: Clean up lto and offload dump files

Dump files produced from an offloading compiler through
"-foffload=-fdump-..." do not get removed by gcc-dg.exp and other
exp-files of the testsuite that use the cleanup code from this file
(e.g.  libgomp). This can lead to problems if scan-dump detects
leftover dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle
"-flto" and "-foffload" in a uniform way. The glob pattern that is
used for matching the "ltrans" files is also changed since the
existing pattern failed to match some dump files.

2020-11-04  Frederik Harwath  

gcc/testsuite/ChangeLog:

	* lib/gcc-dg.exp (proc schedule-cleanups): Adapt "-flto" handling,
	add "-foffload" handling.
---
 gcc/testsuite/lib/gcc-dg.exp | 50 
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {
 # stem.ext..
 # (tree)passes can have multiple instances, thus optional trailing *
 set ptn "\[0-9\]\[0-9\]\[0-9\]$ptn.*"
+set ltrans no
+set mkoffload no
+
 # Handle ltrans files around -flto
 if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] {
 	verbose "Cleanup -flto seen" 4
-	set ltrans "{ltrans\[0-9\]*.,}"
-} else {
-	set ltrans ""
+	set ltrans yes
+}
+
+if [regexp -- {(^|\s+)-foffload=} $opts] {
+	verbose "Cleanup -foffload seen" 4
+	set mkoffload yes
 }
-set ptn "$ltrans$ptn"
+
 verbose "Cleanup final ptn: $ptn" 4
 set tfiles {}
 foreach src $testcases {
-	set basename [file tail $src]
-	if { $ltrans != "" } {
-	# ??? should we use upvar 1 output_file instead of this (dup ?)
-	set stem [file rootname $basename]
-	set basename_ext [file extension $basename]
-	if {$basename_ext != ""} {
-		regsub -- {^.*\.} $basename_ext {} basename_ext
-	}
-	lappend tfiles "$stem.{$basename_ext,exe}"
-	unset basename_ext
-	} else {
-	lappend tfiles $basename
-	}
+set basename [file tail $src]
+set stem [file rootname $basename]
+set basename_ext [file extension $basename]
+if {$basename_ext != ""} {
+regsub -- {^.*\.} $basename_ext {} basename_ext
+}
+set extensions [list $basename_ext]
+
+if { $ltrans == yes } {
+lappend extensions "ltrans\[0-9\]*.ltrans"
+}
+if { $mkoffload == yes} {
+# The * matches the offloading target's name, e.g. "xnvptx-none".
+lappend extensions "*.mkoffload"
+}
+
+set extensions_ptn [join $extensions ","]
+if { [llength $extensions] > 1 } {
+set extensions_ptn "{$extensions_ptn}"
+}
+
+  	lappend tfiles "$stem.$extensions_ptn"
 }
+
 if { [llength $tfiles] > 1 } {
 	set tfiles [join $tfiles ","]
 	set tfiles "{$tfiles}"
-- 
2.17.1



Re: Move pass_oacc_device_lower after pass_graphite

2020-11-06 Thread Frederik Harwath

Hi Richard,

Richard Biener  writes:

> On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath

> What's on my TODO list (or on the list of things to explore) is to make
> the dump file names/suffixes explicit in passes.def like via
>
>   NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
>
> and we'd get a dump named .ccp_oacc or so.

That would be very helpful for avoiding the drudgery of adapting those
pass numbers!

> Now, what does oacc_device_lower actually do that you need to
> re-run complex lowering?  What does cunrolli do at this point that
> the complete_unroll pass later does not do?
>

Good spot, "cunrolli" seems to be unnecessary.  The complex lowering is
necessary to handle the code that gets created by the OpenACC reduction
lowering during oaccdevlow.  I have attached a test case (a reduced
version of
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
shows that the complex instructions are created by
pass_oacc_device_lower and which leads to an ICE if compiled without the
new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").
The problem is an unlowered addition. This is from a diff of the dump of
the pass following oaccdevlow1 (ccp4) with disabled and with enabled
tree-cplxlower2:

<   _91 = VIEW_CONVERT_EXPR(_1);
<   _92 = reduction_var_2 + _91;
---
>   _104 = REALPART_EXPR (_1)>;
>   _105 = IMAGPART_EXPR (_1)>;
>   _91 = COMPLEX_EXPR <_104, _105>;
>   _106 = reduction_var$real_100 + _104;
>   _107 = reduction_var$imag_101 + _105;
>   _92 = COMPLEX_EXPR <_106, _107>;
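
For readers less familiar with the complex lowering: the second half of the dump diff above computes the sum component by component. A rough C-level analogue (a sketch only; add_lowered and check_lowering are names invented for this illustration and have nothing to do with the pass itself) would be:

```c
#include <assert.h>
#include <complex.h>

/* Adds two complex numbers the way the lowered code does: extract the
   real and imaginary parts, add them separately and rebuild the complex
   value.  */
static float _Complex
add_lowered (float _Complex a, float _Complex b)
{
  float re = crealf (a) + crealf (b);
  float im = cimagf (a) + cimagf (b);
  return re + im * I;
}

/* For finite inputs the component-wise sum equals the plain complex
   addition, which corresponds to the unlowered statement in the dump.  */
static void
check_lowering (float _Complex a, float _Complex b)
{
  assert (add_lowered (a, b) == a + b);
}
```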

> What's special about oacc_device lower that doesn't also apply
> to omp_device_lower?

The passes do different things. The goal is to optimize OpenACC
loops using Graphite. The relevant lowering of the internal OpenACC
function calls happens in pass_oacc_device_lower.

> Is all this targeted at code compiled exclusively for the offload
> target?  Thus we're in lto1 here?

The OpenACC outlined functions also get compiled for the host.

> Does it make eventually more sense to have a completely custom pass
> pipeline for the  offload compilation?  Maybe even per offload target?
> See how we have a custom pipeline for -Og (pass_all_optimizations_g).

What would be the main benefits of a separate pipeline? Avoiding
(re-)running passes unnecessarily, fewer unwanted interactions
in the test suite (but your suggestion above regarding the fixed
pass names would also solve this)?

>> Ok to include the patch in master?

Best regards,
Frederik

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
new file mode 100644
index 000..6879e5aaf25
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-foffload=-fdump-tree-cplxlower2" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-do link } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } {""} } */
+
+#include <stdio.h>
+#if !defined(__hppa__) || !defined(__hpux__)
+#include <complex.h>
+#endif
+
+#define N 100
+
+static float _Complex __attribute__ ((noinline))
+sum (float _Complex ary[N])
+{
+  float _Complex reduction_var = 0;
+#pragma acc parallel loop gang reduction(+:reduction_var)
+  for (int ix = 0; ix < N; ix++)
+reduction_var += ary[ix];
+
+ return reduction_var;
+}
+
+int main (void)
+{
+  float _Complex ary[N];
+  float _Complex result;
+
+  for (int ix = 0; ix < N;  ix++)
+{
+  float frac = ix * (1.0f / 1024) + 1.0f;
+  ary[ix] = frac + frac * 2.0j - 1.0j;
+}
+
+  result = sum (ary);
+  printf("%.1f%+.1fi\n", creal(result), cimag(result));
+  return 0;
+}
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR" 1 "oaccdevlow1" } }
+
+ There is just one COMPLEX_EXPR right before oaccdevlow1 ...*/
+
+/* { dg-final { scan-offload-tree-dump-times "GOACC_REDUCTION .*?reduction_var.*?;" 4 "oaccdevlow1" } }
+
+  ... but several IFN_GOACC_REDUCTION calls for the reduction variable which are subsequently lowered ... */
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR " 4  "cplxlower2" } }
+
+ ... which introduces new COMPLEX_EXPRs. */


[PATCH 0/2] Use Graphite for OpenACC "kernels" regions

2020-11-12 Thread Frederik Harwath


Hi,
the two following patches implement a new handling of the loops in
OpenACC "kernels" regions which is based on Graphite and which is meant
to replace the current handling based on the "parloops" pass.  This
extends the class of OpenACC codes using "kernels" regions that can be
analysed by GCC's OpenACC implementation considerably.

We would like to incorporate this work into master soon, but further
work will be necessary in the next weeks to resolve some open questions,
clean up the code etc. In particular, the patches cannot be applied on
master currently because they rely on other patches which have not been
committed to master yet, e.g. the re-ordering of the OpenACC passes to
run device lowering after Graphite which has recently been submitted
(subject "Move pass_oacc_device_lower after pass_graphite") and the
transformation pass which converts OpenACC kernels regions to parallel
regions from OG10 (commit 809ea59722263eb6c2d48402e1eed80727134038).

Best regards,
Frederik


Frederik Harwath (2):
  [WIP] OpenACC: Add Graphite-based handling of "auto" loops
  OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

 gcc/c-family/c.opt|   5 +-
 gcc/common.opt|   8 +
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/passes.texi   |   6 +-
 gcc/flag-types.h  |   1 +
 gcc/gimple-pretty-print.c |   3 +
 gcc/gimple.h  |   9 +-
 gcc/gimplify.c|   1 +
 gcc/graphite-dependences.c|  12 +-
 gcc/graphite-isl-ast-to-gimple.c  |  77 +-
 gcc/graphite-oacc.h   |  90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c   |  26 +-
 gcc/graphite.c| 403 -
 gcc/graphite.h|  11 +-
 gcc/internal-fn.h |   7 +-
 gcc/omp-expand.c  |  89 +-
 gcc/omp-general.c |  19 +-
 gcc/omp-general.h |   1 +
 gcc/omp-low.c |  76 +-
 gcc/omp-oacc-kernels.c|  59 +-
 gcc/omp-offload.c | 223 -
 gcc/predict.c |   2 +-
 .../goacc/kernels-conversion-parloops.c   |  61 ++
 .../c-c++-common/goacc/kernels-conversion.c   |  12 +-
 .../graphite/alias-0-no-runtime-check.c   |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 gcc/tree-chrec-oacc.h |  45 +
 gcc/tree-chrec.c  |  16 +-
 gcc/tree-data-ref.c   | 112 ++-
 gcc/tree-data-ref.h   |   8 +-
 gcc/tree-loop-distribution.c  |  17 +-
 gcc/tree-parloops.c   |  16 +-
 gcc/tree-scalar-evolution.c   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c   |   9 +-
 gcc/tree-ssa-loop-niter.c |  13 +
 gcc/tree-ssa-loop.c   |  10 +
 39 files changed, 2265 insertions(+), 377 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/tree-chrec-oacc.h

--
2.17.1


[PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops

2020-11-12 Thread Frederik Harwath


This patch enables the use of Graphite for the analysis of OpenACC
"auto" loops. The goal is to decide if a loop may be parallelized
(i.e. converted to an "independent" loop) or not.  Graphite and the
functionality on which it relies (scalar evolution, data references) are
extended to interpret the internal representation of OpenACC loop
constructs that is encoded (e.g. through calls to OpenACC-specific
internal functions) in the OpenACC outlined functions (".omp_fn") and to
ignore some artifacts of the outlining process that are not relevant for
the analysis of the original loops (e.g. pointers introduced for the
purpose of offloading are irrelevant to the question whether the
original loops can be parallelized or not). This is done in a way that
does not impact code which does not use OpenACC.  Furthermore, Graphite
is extended by functionality that extends its applicability to
real-world code (e.g. runtime alias checking).  The OpenACC lowering is
extended to use the result of Graphite's analysis to assign
"independent" clauses to loops.
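
The idea behind a runtime alias check can be sketched in plain C as follows. This is not what Graphite generates; it only illustrates the principle of guarding a dependence-free loop version with an overlap test. The names ranges_disjoint and increment_copy are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the N-element float ranges starting at A and B do not
   overlap in memory.  */
static bool
ranges_disjoint (const float *a, const float *b, size_t n)
{
  uintptr_t a0 = (uintptr_t) a;
  uintptr_t b0 = (uintptr_t) b;
  size_t bytes = n * sizeof (float);

  return a0 + bytes <= b0 || b0 + bytes <= a0;
}

/* Computes dst[i] = src[i] + 1.  Returns true if the version without
   loop-carried dependences was taken, false for the sequential
   fallback.  */
static bool
increment_copy (float *dst, const float *src, size_t n)
{
  assert (dst != NULL && src != NULL);

  if (ranges_disjoint (dst, src, n))
    {
      /* No iteration reads a value written by another iteration, so
	 this version could be marked "independent".  */
      for (size_t i = 0; i < n; i++)
	dst[i] = src[i] + 1.0f;
      return true;
    }

  /* The ranges may overlap: keep the original sequential order.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1.0f;
  return false;
}
```

Calling increment_copy (c + 1, c, n) on a single array c takes the fallback, because each iteration then reads the element written by the previous one.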
---
 gcc/common.opt|   8 +
 gcc/graphite-dependences.c|  12 +-
 gcc/graphite-isl-ast-to-gimple.c  |  77 +-
 gcc/graphite-oacc.h   |  90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c   |  26 +-
 gcc/graphite.c| 403 -
 gcc/graphite.h|  11 +-
 gcc/internal-fn.h |   7 +-
 gcc/omp-expand.c  |  26 +-
 gcc/omp-offload.c | 173 +++-
 gcc/predict.c |   2 +-
 .../graphite/alias-0-no-runtime-check.c   |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
 gcc/tree-chrec-oacc.h |  45 +
 gcc/tree-chrec.c  |  16 +-
 gcc/tree-data-ref.c   | 112 ++-
 gcc/tree-data-ref.h   |   8 +-
 gcc/tree-loop-distribution.c  |  17 +-
 gcc/tree-scalar-evolution.c   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c   |   9 +-
 gcc/tree-ssa-loop-niter.c |  13 +
 23 files changed, 1870 insertions(+), 333 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/tree-chrec-oacc.h

diff --git a/gcc/common.opt b/gcc/common.opt
index dfed6ec76ba..caaeaa1aa6f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1600,6 +1600,14 @@ fgraphite-identity
 Common Report Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.

+fgraphite-non-affine-accesses
+Common Report Var(flag_graphite_non_affine_accesses) Init(0)
+Allow Graphite to handle non-affine data accesses.
+
+fgraphite-runtime-alias-checks
+Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loops if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 7078c949800..76ba027cdf3 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding read to depedence graph: ");
+   fprintf (dump_file, "Adding read to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
reads = isl_union_map_union (reads, um);
if (dump_file)
  {
-   fprintf (dump_file, "Reads depedence graph: ");
+   fprintf (dump_file, "Reads dependence graph: ");
print_isl_union_map (dump_file, reads);
  }
  }
@@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
  {
if (dump_file)
  {
-   fprintf (dump_file, "Adding must write to depedence graph: ");
+   fprintf (dump_file, "Adding must write to dependence graph: ");
print_pdr (dump_file, pdr);
  }
isl_union_map *um
@@ -106,7 +106,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
must_writes = isl_union_map_union (must_writes, um);
if (dump_file)
 

[PATCH 2/2] OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

2020-11-12 Thread Frederik Harwath


This patch changes the "kernels" conversion to route loops in OpenACC
"kernels" regions through Graphite. This is done by converting the loops
in "kernels" regions which are not yet known to be "independent" to
"auto" loops as in the current (OG10) "parloops" based "kernels"
handling. Afterwards, the "kernels" regions will now be treated
essentially like "parallel" regions. A new internal target kind, however,
still makes it possible to distinguish between the two types of regions,
which is useful for diagnostic messages.
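
As a small illustration of the kind of code this is about (a sketch; the function scale and its data clauses are made up for this example and are not taken from the test suite), consider a loop in a "kernels" region that carries no "independent" clause:

```c
#include <assert.h>
#include <stddef.h>

/* Writes factor * x[i] to y[i] for all i < n.  Without OpenACC support
   the pragmas are ignored and the loop simply runs on the host.  */
static void
scale (float *y, const float *x, int n, float factor)
{
  assert (y != NULL && x != NULL);

#pragma acc kernels copyin(x[0:n]) copyout(y[0:n])
  {
    /* The loop is not known to be "independent", so the conversion
       described above would treat it like an "auto" loop and leave the
       decision about parallelization to the data dependence analysis.  */
#pragma acc loop
    for (int i = 0; i < n; i++)
      y[i] = factor * x[i];
  }
}

/* Usage example.  */
static void
example_scale (void)
{
  float x[3] = { 1.0f, 2.0f, 3.0f };
  float y[3] = { 0.0f, 0.0f, 0.0f };

  scale (y, x, 3, 2.0f);
  assert (y[0] == 2.0f && y[2] == 6.0f);
}
```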

The old "parloops" based "kernels" handling will be deprecated, but is
still available through the command line options
"-fopenacc-kernels=split-parloops" and "-fopenacc-kernels=parloops".
---
 gcc/c-family/c.opt|  5 +-
 gcc/doc/invoke.texi   | 10 ++-
 gcc/doc/passes.texi   |  6 +-
 gcc/flag-types.h  |  1 +
 gcc/gimple-pretty-print.c |  3 +
 gcc/gimple.h  |  9 ++-
 gcc/gimplify.c|  1 +
 gcc/omp-expand.c  | 63 +--
 gcc/omp-general.c | 19 -
 gcc/omp-general.h |  1 +
 gcc/omp-low.c | 76 +++
 gcc/omp-oacc-kernels.c| 59 --
 gcc/omp-offload.c | 50 +++-
 .../goacc/kernels-conversion-parloops.c   | 61 +++
 .../c-c++-common/goacc/kernels-conversion.c   | 12 +--
 .../gfortran.dg/goacc/kernels-reductions.f90  | 37 +
 gcc/tree-parloops.c   | 16 +++-
 gcc/tree-ssa-loop.c   | 10 +++
 18 files changed, 395 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4ef7ea76aa1..255ff84ca4b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1747,7 +1747,7 @@ Specify default OpenACC compute dimensions.

 fopenacc-kernels=
 C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
--fopenacc-kernels=[split|parloops] Configure OpenACC 'kernels' constructs handling.
+-fopenacc-kernels=[split|split-parloops|parloops]  Configure OpenACC 'kernels' constructs handling.

 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -1755,6 +1755,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)

+EnumValue
+Enum(openacc_kernels) String(split-parloops) Value(OPENACC_KERNELS_SPLIT_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe04b4d8e6a..d713d6ae8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2266,12 +2266,20 @@ permitted.
 @opindex fopenacc-kernels
 @cindex OpenACC accelerator programming
 Configure OpenACC 'kernels' constructs handling.
+
 With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
 are split into a sequence of compute constructs, each then handled
-individually.
+individually. The data dependence analysis that is necessary to
+determine if loops can be parallelized is performed by the Graphite
+pass.
 This is the default.
+With @option{-fopenacc-kernels=split-parloops}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+This is deprecated.
 With @option{-fopenacc-kernels=parloops}, the whole OpenACC
 'kernels' constructs is handled by the @samp{parloops} pass.
+This is deprecated.

 @item -fopenmp
 @opindex fopenmp
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 7424690dac3..5dda056a2bb 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets.  It is located in

 This is a pass group for processing OpenACC kernels regions.  It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops.  It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops} based handling of
+kernels regions is used. It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.

 @item Target clone

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e2255a56745..058c4e214af 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -376,6 +376,7 @@ enum cf_protection_level
 enum openacc_kernels
 {
   OPENACC_KERNELS_SPLIT,
+  OPENACC_KERNELS_SPLIT_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 54a6d318dc5..b4a2

Re: [PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops

2020-11-16 Thread Frederik Harwath


Hi Richard,

Richard Biener  writes:

> On Thu, Nov 12, 2020 at 11:11 AM Frederik Harwath
>  wrote:
>>
>> This patch enables the use of Graphite for the analysis of OpenACC
>> "auto" loops. [...]
>> Furthermore, Graphite is extended by functionality that extends
>> its applicability to real-world code (e.g. runtime alias checking).
>
> I wonder if this can be split into a refactoring of graphite and adding
> runtime alias capability and a part doing the OpenACC pieces.
>

Yes, I did not remove the runtime alias checking from this WIP-patch,
but I planned to submit it separately. I am going to do this soon.

Frederik


> Richard.
>
>> ---
>>  gcc/common.opt|   8 +
>>  gcc/graphite-dependences.c|  12 +-
>>  gcc/graphite-isl-ast-to-gimple.c  |  77 +-
>>  gcc/graphite-oacc.h   |  90 ++
>>  gcc/graphite-scop-detection.c | 828 ++
>>  gcc/graphite-sese-to-poly.c   |  26 +-
>>  gcc/graphite.c| 403 -
>>  gcc/graphite.h|  11 +-
>>  gcc/internal-fn.h |   7 +-
>>  gcc/omp-expand.c  |  26 +-
>>  gcc/omp-offload.c | 173 +++-
>>  gcc/predict.c |   2 +-
>>  .../graphite/alias-0-no-runtime-check.c   |  20 +
>>  .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
>>  gcc/testsuite/gcc.dg/graphite/alias-1.c   |  22 +
>>  gcc/tree-chrec-oacc.h |  45 +
>>  gcc/tree-chrec.c  |  16 +-
>>  gcc/tree-data-ref.c   | 112 ++-
>>  gcc/tree-data-ref.h   |   8 +-
>>  gcc/tree-loop-distribution.c  |  17 +-
>>  gcc/tree-scalar-evolution.c   | 257 +-
>>  gcc/tree-ssa-loop-ivcanon.c   |   9 +-
>>  gcc/tree-ssa-loop-niter.c |  13 +
>>  23 files changed, 1870 insertions(+), 333 deletions(-)
>>  create mode 100644 gcc/graphite-oacc.h
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
>>  create mode 100644 gcc/tree-chrec-oacc.h
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index dfed6ec76ba..caaeaa1aa6f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1600,6 +1600,14 @@ fgraphite-identity
>>  Common Report Var(flag_graphite_identity) Optimization
>>  Enable Graphite Identity transformation.
>>
>> +fgraphite-non-affine-accesses
>> +Common Report Var(flag_graphite_non_affine_accesses) Init(0)
>> +Allow Graphite to handle non-affine data accesses.
>> +
>> +fgraphite-runtime-alias-checks
>> +Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
>> +Allow Graphite to add runtime alias checks to loops if aliasing cannot be 
>> resolved statically.
>> +
>>  fhoist-adjacent-loads
>>  Common Report Var(flag_hoist_adjacent_loads) Optimization
>>  Enable hoisting adjacent loads to encourage generating conditional move
>> diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
>> index 7078c949800..76ba027cdf3 100644
>> --- a/gcc/graphite-dependences.c
>> +++ b/gcc/graphite-dependences.c
>> @@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map 
>> *&reads,
>>   {
>> if (dump_file)
>>   {
>> -   fprintf (dump_file, "Adding read to depedence graph: ");
>> +   fprintf (dump_file, "Adding read to dependence graph: ");
>> print_pdr (dump_file, pdr);
>>   }
>> isl_union_map *um
>> @@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map 
>> *&reads,
>> reads = isl_union_map_union (reads, um);
>> if (dump_file)
>>   {
>> -   fprintf (dump_file, "Reads depedence graph: ");
>> +   fprintf (dump_file, "Reads dependence graph: ");
>> print_isl_union_map (dump_file, reads);
>>   }
>>   }
>> @@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map 
>> *&reads,
>>   {
>> if (dump_file)
>>   {
>>

[PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-19 Thread Harwath, Frederik
Hi,
this patch implements a runtime ISA check for amdgcn offloading.
The check verifies that the ISA of the GPU to which we try to offload matches
the ISA for which the code to be offloaded has been compiled. If it detects
a mismatch, it emits an error message which contains a hint at the correct
compilation parameters for the GPU. For instance:

  "libgomp: GCN fatal error: GCN code object ISA 'gfx906' does not match GPU ISA 'gfx900'.
   Try to recompile with '-foffload=-march=gfx900'."
or
  "libgomp: GCN fatal error: GCN code object ISA 'gfx900' does not match agent ISA 'gfx803'.
   Try to recompile with '-foffload=-march=fiji'."
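
The check itself boils down to comparing the machine field of the ELF header flags with the ISA of the GPU. The following is a minimal sketch of that idea, not the code from the patch: all sketch_* names and the table layout are assumptions made for this illustration, while the machine codes, the mask and the gfx803/"fiji" naming are the ones that appear in the patch and in the messages above. The gfx801 entry is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sketch_isa
{
  int code;              /* Machine value from the ELF header flags.  */
  const char *hsa_name;  /* Name reported by the HSA runtime.  */
  const char *gcc_name;  /* Name accepted by -march.  */
};

static const struct sketch_isa sketch_isas[] = {
  { 0x02a, "gfx803", "fiji" },
  { 0x02c, "gfx900", "gfx900" },
  { 0x02f, "gfx906", "gfx906" },
};

static const unsigned sketch_mach_mask = 0xff;
static const size_t sketch_isa_count
  = sizeof sketch_isas / sizeof sketch_isas[0];

/* Returns the table entry for machine code CODE, or NULL if unknown.  */
static const struct sketch_isa *
sketch_isa_by_code (int code)
{
  for (size_t i = 0; i < sketch_isa_count; i++)
    if (sketch_isas[i].code == code)
      return &sketch_isas[i];
  return NULL;
}

/* Returns the table entry for the HSA name NAME, or NULL if unknown.  */
static const struct sketch_isa *
sketch_isa_by_name (const char *name)
{
  for (size_t i = 0; i < sketch_isa_count; i++)
    if (strcmp (sketch_isas[i].hsa_name, name) == 0)
      return &sketch_isas[i];
  return NULL;
}

/* Compares the ISA recorded in E_FLAGS with the ISA of the GPU whose HSA
   name is DEVICE_NAME.  Returns true on a match; otherwise writes an
   explanation into MSG and returns false.  */
static bool
sketch_check_isa (unsigned e_flags, const char *device_name,
		  char *msg, size_t size)
{
  const struct sketch_isa *object
    = sketch_isa_by_code ((int) (e_flags & sketch_mach_mask));
  const struct sketch_isa *device = sketch_isa_by_name (device_name);

  assert (msg != NULL && size > 0);
  msg[0] = '\0';

  if (object == NULL || device == NULL)
    {
      snprintf (msg, size, "Unknown GCN ISA.");
      return false;
    }
  if (object == device)
    return true;

  snprintf (msg, size,
	    "GCN code object ISA '%s' does not match GPU ISA '%s'. "
	    "Try to recompile with '-foffload=-march=%s'.",
	    object->hsa_name, device->hsa_name, device->gcc_name);
  return false;
}
```

For example, a code object compiled for gfx900 (machine code 0x02c) that is loaded on a gfx803 device produces a message ending in '-foffload=-march=fiji'.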

(By the way, the names that we use for the ISAs are a bit inconsistent.
 Perhaps we should just use the gfx-names for all ISAs everywhere?)

Without this patch, the user only gets a confusing error message from the
HSA runtime, which fails to load the GCN object code.

I have checked that the code does not lead to any regressions when running
the test suite correctly, i.e. with the "-foffload=-march=..." option
given to the compiler matching the architecture of the GPU.
It seems difficult to implement an automated test that triggers an ISA mismatch.
I have tested manually (for different combinations of the compilation flags
and offloading GPU ISAs) that the runtime ISA check produces the expected
error messages.

Is it ok to commit this patch to the master branch?

Frederik



From 27981f9c93d1efed6d943dae4ea0c52147c02d5b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must
match the ISA for which the code has been compiled.  So far, the libgomp amdgcn
plugin did not attempt to verify this.  In case of a mismatch, the user is
confronted with an unhelpful error message produced by the HSA runtime.

This commit implements a runtime ISA check. In the case of an ISA mismatch, the
execution is aborted with a clear error message and a hint at the correct
compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	(EF_AMDGPU_MACH_MASK): New constant.
	(gcn_isa): New typedef.
	(gcn_gfx801_s): New constant.
	(gcn_gfx803_s): New constant.
	(gcn_gfx900_s): New constant.
	(gcn_gfx906_s): New constant.
	(gcn_isa_name_len): New constant.
	(elf_gcn_isa_field): New function.
	(isa_hsa_name): New function.
	(isa_gcc_name): New function.
	(isa_code): New function.
	(struct agent_info): Add field "device_isa" ...
	(GOMP_OFFLOAD_init_device): ... and init from here,
	failing if device has unknown ISA; adapt init of "gfx900_p"
	to use new constants.
	(isa_matches_agent): New function ...
	(create_and_finalize_hsa_program): ... used from here to check
	that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 127 +++-
 1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..14f4a707a7c 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,88 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
 
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
+const static char* gcn_gfx801_s = "gfx801";
+const static char* gcn_gfx803_s = "gfx803";
+const static char* gcn_gfx900_s = "gfx900";
+const static char* gcn_gfx906_s = "gfx906";
+const static int gcn_isa_name_len = 6;
+
+static int
+elf_gcn_isa_field (Elf64_Ehdr *image)
+{
+  return image->e_flags & EF_AMDGPU_MACH_MASK;
+}
+
+/* Returns the name that the HSA runtime uses for the ISA or NULL if we do not
+   support the ISA. */
+
+static const char*
+isa_hsa_name (int isa) {
+  switch(isa)
+{
+case EF_AMDGPU_MACH_AMDGCN_GFX801:
+  return gcn_gfx801_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX803:
+  return gcn_gfx803_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX900:
+  return gcn_gfx900_s;
+case EF_AMDGPU_MACH_AMDGCN_GFX906:
+  return gcn_gfx906_s;
+}
+  return NULL;
+}
+
+/* Returns the user-facing name that GCC uses to identify the architecture (e.g.
+   with -march) or NULL if we do not support the ISA.
+   Keep in sync with /gcc/config/gcn/gcn.{c,opt}.  */
+
+static const char*
+isa_gcc_name (int isa) {
+  switch(isa)

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2020-01-20 Thread Harwath, Frederik
Hi Thomas,
I have attached a patch containing the changes that you suggested.

On 16.01.20 17:00, Thomas Schwinge wrote:

> On 2019-12-20T17:46:57+0100, "Harwath, Frederik" wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c
> 
> I suggest to rename this one to 'acc_get_property-nvptx.c'
> [...]
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c

> I suggest to rename this one to 'acc_get_property-host.c'.

I renamed both.

> This assumes that the 'cuda*' interfaces and OpenACC/libgomp interfaces
> handle/order device numbers in the same way -- which it seems they do,
> but just noting this in case this becomes an issue at some point.

Correct, I have added a corresponding comment to acc_get_property-nvptx.c.

> Aside from improper data types being used for storing/printing the memory
> information, we have to expect 'acc_property_free_memory' to change
> between two invocations.  ;-)

Right! I have removed the assertion and changed it into ...
> 
> Better to just verify that 'free_mem >= 0' (by means of 'size_t' data
> type, I suppose), and 'free_mem <= total_mem'?

... this.

> 
> (..., and for avoidance of doubt: I think there's no point in
> special-casing this one for 'acc_device_host' where we know that
> 'free_mem' is always zero -- this may change in the future.)

Sure! But since we assert total_mem == 0 for the host device and the new
assertion requires free_mem <= total_mem (with free_mem >= 0), the test
effectively also asserts free_mem == 0 right now ;-).
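
The weakened expectation can be sketched as follows. This is only an illustration with invented names (memory_properties_plausible is not a function from the test suite); in the actual test the two values come from calls to acc_get_property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns true if FREE_MEM is a plausible value for a device with
   TOTAL_MEM bytes of memory.  Both values are of type size_t, so
   free_mem >= 0 holds by construction and only the upper bound needs
   to be checked.  */
static bool
memory_properties_plausible (size_t total_mem, size_t free_mem)
{
  if (free_mem > total_mem)
    {
      fprintf (stderr, "Expected free memory (%zu) to be at most the "
	       "total memory (%zu).\n", free_mem, total_mem);
      return false;
    }
  return true;
}

/* Usage example: with total_mem == 0, as for the host device, the only
   accepted value for free_mem is 0.  */
static void
example_host_case (void)
{
  assert (memory_properties_plausible (0, 0));
  assert (!memory_properties_plausible (0, 1));
}
```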


Ok to push the commit to master?

Best regards,
Frederik
From ef5a959bedc3214e86d6a683a02b693d82847ecd Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 14:07:03 +0100
Subject: [PATCH] Fix expectation and types in acc_get_property tests

* Weaken expectation concerning acc_property_free_memory.
  Do not expect the value returned by CUDA since that value might have
  changed in the meantime.
* Use correct type for the results of calls to acc_get_property in tests.

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Remove "expected_free_mem" argument,
	change "expected_total_mem" argument type to size_t;
	change types of acc_get_property results to size_t.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this.

Reviewed-by: Thomas Schwinge  
---
 .../acc_get_property-aux.c| 28 +--
 ...t_property-3.c => acc_get_property-host.c} |  7 ++---
 ..._property-2.c => acc_get_property-nvptx.c} |  9 +++---
 3 files changed, 22 insertions(+), 22 deletions(-)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
index 952bdbf6aea..76c29501839 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
@@ -8,9 +8,8 @@
 
 void expect_device_properties
 (acc_device_t dev_type, int dev_num,
- int expected_total_mem, int expected_free_mem,
- const char* expected_vendor, const char* expected_name,
- const char* expected_driver)
+ size_t expected_memory, const char* expected_vendor,
+ const char* expected_name, const char* expected_driver)
 {
   const char *vendor = acc_get_property_string (dev_num, dev_type,
 		acc_property_vendor);
@@ -21,22 +20,23 @@ void expect_device_properties
   abort ();
 }
 
-  int total_mem = acc_get_property (dev_num, dev_type,
-acc_property_memory);
-  if (total_mem != expected_total_mem)
+  size_t total_mem = acc_get_property (dev_num, dev_type,
+   acc_property_memory);
+  if (total_mem != expected_memory)
 {
-  fprintf (stderr, "Expected acc_property_memory to equal %d, "
-	   "but was %d.\n", expected_total_mem, total_mem);
+  fprintf (stderr, "Expected acc_property_memory to equal %zd, "
+	   "but was %zd.\n", expected_memory, total_mem);
   abort ();
 
 }
 
-  int free_mem = acc_get_property (dev_num, dev_type,
+  size_t free_mem = acc_get_property (dev_num, dev_type,
    acc_property_free_memory);
-  if (free_mem != expected_free_mem)
+  if (free_mem > total_me

Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading

2020-01-20 Thread Harwath, Frederik
Hi Andrew,
Thanks for the review! I have attached a revised patch containing the changes 
that you suggested.

On 20.01.20 11:00, Andrew Stubbs wrote:

> On 20/01/2020 06:57, Harwath, Frederik wrote:
>> Is it ok to commit this patch to the master branch?
> 
> I can't see anything significantly wrong with the code of the patch, however 
> I have some minor issues I'd like fixed in the text.
> 
> [...] Please move the functions down into the "Utility functions" group. The 
> const static variables should probably go with them.

Done.

>> @@ -3294,7 +3415,11 @@ GOMP_OFFLOAD_init_device (int n)
>>    &buf);
>>    if (status != HSA_STATUS_SUCCESS)
>>  return hsa_error ("Error querying the name of the agent", status);
>> -  agent->gfx900_p = (strncmp (buf, "gfx900", 6) == 0);
>> +  agent->gfx900_p = (strncmp (buf, gcn_gfx900_s, gcn_isa_name_len) == 0);
>> +
>> +  agent->device_isa = isa_code (buf);
>> +  if (agent->device_isa < 0)
>> +    return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
> 
> Can device_isa not just replace gfx900_p? I think it's only tested in one 
> place, and that would be easily substituted.
> 

Yes, I have changed that one place to use agent->device_isa.
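
To illustrate the point for readers of the thread, here is a minimal sketch of how one ISA code can stand in for the old "gfx900_p" flag. The enum values are the ones from the patch; the lookup table and the bodies of isa_code_sketch and is_gfx900 are assumptions made for illustration and are not the code in plugin-gcn.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as in the patch's EF_AMDGPU_MACH enum.  */
typedef enum {
  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
} EF_AMDGPU_MACH;

/* Hypothetical lookup table; the real plugin may be organized
   differently.  */
static const struct { const char *name; int code; } isa_table[] = {
  { "gfx801", EF_AMDGPU_MACH_AMDGCN_GFX801 },
  { "gfx803", EF_AMDGPU_MACH_AMDGCN_GFX803 },
  { "gfx900", EF_AMDGPU_MACH_AMDGCN_GFX900 },
  { "gfx906", EF_AMDGPU_MACH_AMDGCN_GFX906 },
};

/* Map an agent name to its ISA code, or -1 if the name is unknown.  */
static int
isa_code_sketch (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof isa_table / sizeof isa_table[0]; i++)
    if (strncmp (name, isa_table[i].name, strlen (isa_table[i].name)) == 0)
      return isa_table[i].code;
  return -1;
}

/* What a former "gfx900_p" test turns into once the ISA code is stored.  */
static int
is_gfx900 (int device_isa)
{
  return device_isa == EF_AMDGPU_MACH_AMDGCN_GFX900;
}
```

Storing the code rather than a boolean keeps the "unknown architecture" case (the negative return value) visible to the caller.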

I would commit the patch then if nobody objects :-). The other approaches (fat
binaries etc.) that have been discussed in this thread seem to be long-term
projects, and until something like this gets implemented, the early error
checking implemented by this patch seems to be better than nothing.

Frederik
From 470892454bf0d67ea71c2399f5819713592e46a0 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must
match the ISA for which the code has been compiled.  So far, the libgomp amdgcn
plugin did not attempt to verify this.  In case of a mismatch, the user is
confronted with an unhelpful error message produced by the HSA runtime.

This commit implements a runtime ISA check. In the case of an ISA mismatch, the
execution is aborted with a clear error message and a hint at the correct
compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	* (EF_AMDGPU_MACH_MASK): New constant.
	* (gcn_isa): New typedef.
	* (gcn_gfx801_s): New constant.
	* (gcn_gfx803_s): New constant.
	* (gcn_gfx900_s): New constant.
	* (gcn_gfx906_s): New constant.
	* (gcn_isa_name_len): New constant.
	* (elf_gcn_isa_field): New function.
	* (isa_hsa_name): New function.
	* (isa_gcc_name): New function.
	* (isa_code): New function.
	* (struct agent_info): Add field "device_isa" and remove field
	"gfx900_p".
	* (GOMP_OFFLOAD_init_device): Adapt agent init to "agent_info"
	field changes, fail if device has unknown ISA.
	* (parse_target_attributes): Replace "gfx900_p" by "device_isa".
	* (isa_matches_agent): New function ...
	* (create_and_finalize_hsa_program): ... used from here to check
	that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 131 ++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..de470a3dd33 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,20 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
 
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
 /* Description of an HSA GPU agent (device) and the program associated with
it.  */
 
@@ -408,8 +422,9 @@ struct agent_info
   /* Whether the agent has been initialized.  The fields below are usable only
  if it has been.  */
   bool initialized;
-  /* Precomputed check for problem architectures.  */
-  bool gfx900_p;
+
+  /* The instruction set architecture of the device. */
+  gcn_isa device_isa;
 
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
@@ -1232,7 +1247,8 @@ parse_target_attributes (void **input,
 
   if (gcn_dims_found)
 {
-  if (agent->gfx900_p && gcn_threads == 0 && override_z_dim == 0)
+  if (agent->device_isa == EF_AMDGPU_MACH_AMDGCN_GFX900
+	  && gcn_threads == 0 && override_z_dim == 0)

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2020-01-24 Thread Harwath, Frederik
Hi Thomas,

On 23.01.20 15:32, Thomas Schwinge wrote:

> On 2020-01-20T15:01:01+0100, "Harwath, Frederik"  
> wrote:
>> On 16.01.20 17:00, Thomas Schwinge wrote:
>>> On 2019-12-20T17:46:57+0100, "Harwath, Frederik" 
>>>  wrote:
>> Ok to push the commit to master?
> 
> Thanks, OK.  Reviewed-by: Thomas Schwinge 

Thank you. Committed as 4bd03ed69bd789278a0286017b692f49052ffe5c, including the
changes to the size_t formatting.
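
As a small illustration of the formatting point raised in the review quoted in this mail: size_t is an unsigned type, so the matching printf conversion is %zu. The helper name format_size is hypothetical and only serves this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write BYTES as a decimal string into BUF (LEN bytes) and return
   snprintf's result, i.e. the length of the complete string.  */
static int
format_size (char *buf, size_t len, size_t bytes)
{
  assert (buf != NULL && len > 0);
  /* size_t is unsigned: use %zu rather than %zd or %d.  */
  return snprintf (buf, len, "%zu", bytes);
}
```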

Best regards,
Frederik

> 
> 
> As a low-priority follow-up, please look into:
> 
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:
>  In function 'expect_device_properties':
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:74:24:
>  warning: format '%d' expects argument of type 'int', but argument 3 has type 
> 'const char *' [-Wformat=]
>74 |   fprintf (stderr, "Expected value of unknown string property 
> to be NULL, "
>   |
> ^~~~
>75 | "but was %d.\n", s);
>   |  ~
>   |  |
>   |  const char *
> 
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:75:19:
>  note: format string is defined here
>75 | "but was %d.\n", s);
>   |  ~^
>   |   |
>   |   int
>   |  %s
> 
> ..., and (random example):
> 
>>int unknown_property = 16058;
>> -  int v = acc_get_property (dev_num, dev_type, 
>> (acc_device_property_t)unknown_property);
>> +  size_t v = acc_get_property (dev_num, dev_type, 
>> (acc_device_property_t)unknown_property);
>>if (v != 0)
>>  {
>>fprintf (stderr, "Expected value of unknown numeric property to equal 
>> 0, "
>> -   "but was %d.\n", v);
>> +   "but was %zd.\n", v);
>>abort ();
>>  }
> 
> ..., shouldn't that be '%zu' given that 'size_t' is 'unsigned'?
> 
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:  fprintf (stderr, 
> "Expected acc_property_memory to equal %zd, "
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was 
> %zd.\n", expected_memory, total_mem);
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:", but free 
> memory was %zd and total memory was %zd.\n",
> libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was 
> %zd.\n", v);
> libgomp.oacc-c-c++-common/acc_get_property.c:  printf ("Total 
> memory: %zd\n", v);
> libgomp.oacc-c-c++-common/acc_get_property.c:  printf ("Free 
> memory: %zd\n", v);
> 
> 
> Grüße
>  Thomas
> 

From 4bd03ed69bd789278a0286017b692f49052ffe5c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jan 2020 14:07:03 +0100
Subject: [PATCH 1/2] Fix expectation and types in acc_get_property tests

* Weaken expectation concerning acc_property_free_memory.
  Do not expect the value returned by CUDA since that value might have
  changed in the meantime.
* Use correct type for the results of calls to acc_get_property in tests.
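
A minimal sketch of the weakened expectation described above, assuming a helper named memory_properties_plausible (hypothetical, not part of the test suite): total memory is compared exactly, while free memory is only required not to exceed the total.

```c
#include <assert.h>  /* for the usage shown in the note below */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the reported memory values are
   plausible.  */
static bool
memory_properties_plausible (size_t total_mem, size_t expected_total,
			     size_t free_mem)
{
  /* Total memory is stable, so it can be compared exactly.  */
  if (total_mem != expected_total)
    return false;
  /* Free memory may change between two queries; only require that it
     does not exceed the total.  */
  return free_mem <= total_mem;
}
```

A caller could then write "assert (memory_properties_plausible (total, expected, free));" instead of comparing the free memory against a value queried earlier.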

libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Remove "expected_free_mem" argument,
	change "expected_total_mem" argument type to size_t;
	change types of acc_get_property results to size_t,
	adapt format strings.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Use %zu instead of %zd to print size_t values.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and
	rename to ...
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this.

Reviewed-by: Thomas Schwinge  
---
 .../acc_get_property-aux.c| 30 +--
 ...t_property-3.c => acc_get_property-host.c} |  7 ++---
 ..._property-2.c => acc_get_property-nvptx.c} |  9 +++---
 .../acc_get_property.c|  4 +--
 4 files changed, 25 insertions(+), 25 deletions(-)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%)
 rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%)

diff --git a/libg

[PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-28 Thread Harwath, Frederik
Hi,
this patch adds full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin. This replaces
the existing stub in libgomp/plugin/plugin-gcn.c.

Andrew: The value returned for acc_property_memory ("size of device memory in
bytes" according to the spec) is the HSA_REGION_INFO_SIZE of the agent's
data_region. This has been adapted from a previous incomplete implementation
that we had on the OG9 branch. Does that sound reasonable?

I have tested the patch with amdgcn and nvptx offloading.

Ok to commit this to the main branch?


Best regards,
Frederik

From 6f1855281c38993a088f9b4af020a786f8e05fe9 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 28 Jan 2020 08:01:00 +0100
Subject: [PATCH] Add OpenACC acc_get_property support for AMD GCN

Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.

libgomp/
	* plugin-gcn.c (struct agent_info): Add fields "name" and
	"vendor_name" ...
	(GOMP_OFFLOAD_init_device): ... and init from here.
	(struct hsa_context_info): Add field "driver_version_s" ...
	(init_hsa_context): ... and init from here.
	(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
	implementation.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Enable test execution for amdgcn and host offloading targets.
	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Split function into ...
	(expect_device_string_properties): ... this new function ...
	(expect_device_memory): ... and this new function.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
	Add test.
---
 libgomp/plugin/plugin-gcn.c   |  63 +++--
 .../acc_get_property-aux.c|  60 +---
 .../acc_get_property-gcn.c| 132 ++
 .../acc_get_property.c|   5 +-
 .../libgomp.oacc-fortran/acc_get_property.f90 |   2 -
 5 files changed, 224 insertions(+), 38 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7854c142f05..0a09daaa0a4 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -425,7 +425,10 @@ struct agent_info
 
   /* The instruction set architecture of the device. */
   gcn_isa device_isa;
-
+  /* Name of the agent. */
+  char name[64];
+  /* Name of the vendor of the agent. */
+  char vendor_name[64];
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
   struct goacc_asyncqueue *async_queues, *omp_async_queue;
@@ -544,6 +547,8 @@ struct hsa_context_info
   int agent_count;
   /* Array of agent_info structures describing the individual HSA agents.  */
   struct agent_info *agents;
+  /* Driver version string. */
+  char driver_version_s[30];
 };
 
 /* Format of the on-device heap.
@@ -1513,6 +1518,15 @@ init_hsa_context (void)
 	GOMP_PLUGIN_error ("Failed to list all HSA runtime agents");
 }
 
+  uint16_t minor, major;
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, &minor);
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version");
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, &major);
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version");
+  sprintf (hsa_context.driver_version_s, "HSA Runtime %d.%d", major, minor);
+
   hsa_context.initialized = true;
   return true;
 }
@@ -3410,15 +3424,19 @@ GOMP_OFFLOAD_init_device (int n)
 return hsa_error ("Error requesting maximum queue size of the GCN agent",
 		  status);
 
-  char buf[64];
   status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
-	  &buf);
+	  &agent->name);
   if (status != HSA_STATUS_SUCCESS)
 return hsa_error ("Error querying the name of the agent", status);
 
-  agent->device_isa = isa_code (buf);
+  agent->device_isa = isa_code (agent->name);
   if (agent->device_isa < 0)
-return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
+return hsa_error ("Unknown GCN agent architecture", HSA_STATUS_ERROR);
+
+  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_VENDOR_NAME,
+	  &agent->vendor_name);
+  if (status != HSA_STATUS_SUCCESS)
+return hsa_error ("Error querying the vendor name of the agent", status);
 
   status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size,
 	HSA_QUEUE_TYPE_MULTI,
@@ -4115,12 +4133,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src,
 union goacc_property_value
 GOMP_OF

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Andrew,

On 28.01.20 16:42, Andrew Stubbs wrote:
> On 28/01/2020 14:55, Harwath, Frederik wrote:
> 
> If we're going to use a fixed-size buffer then we should use snprintf and 
> emit GCN_WARNING if the return value is greater than 
> "sizeof(driver_version_s)", even though that is unlikely. Do the same in the 
> testcase, but use a bigger buffer so that truncation causes a mismatch and 
> test failure.

Ok.


> I realise that an existing function in this testcase uses this layout, but 
> the code style does not normally have the parameter list on the next line, 
> and certainly not in column 1.

Ok. I have also adjusted the formatting in the other acc_get_property tests to 
the code style. I have turned this into a separate trivial patch.

Ok to commit the revised patch?

Best regards,
Frederik

From fb15cb9058feeda8891d6454d32f43fda885b789 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 29 Jan 2020 10:19:50 +0100
Subject: [PATCH 1/2] Add OpenACC acc_get_property support for AMD GCN

Add full support for the OpenACC 2.6 acc_get_property and
acc_get_property_string functions to the libgomp GCN plugin.

libgomp/
	* plugin-gcn.c (struct agent_info): Add fields "name" and
	"vendor_name" ...
	(GOMP_OFFLOAD_init_device): ... and init from here.
	(struct hsa_context_info): Add field "driver_version_s" ...
	(init_hsa_context): ... and init from here.
	(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper
	implementation.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c:
	Enable test execution for amdgcn and host offloading targets.
	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
	(expect_device_properties): Split function into ...
	(expect_device_string_properties): ... this new function ...
	(expect_device_memory): ... and this new function.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c:
	Add test.
---
 libgomp/plugin/plugin-gcn.c   |  71 --
 .../acc_get_property-aux.c|  79 ++-
 .../acc_get_property-gcn.c| 132 ++
 .../acc_get_property.c|   5 +-
 .../libgomp.oacc-fortran/acc_get_property.f90 |   2 -
 5 files changed, 242 insertions(+), 47 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 7854c142f05..45c625495b9 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -425,7 +425,10 @@ struct agent_info
 
   /* The instruction set architecture of the device. */
   gcn_isa device_isa;
-
+  /* Name of the agent. */
+  char name[64];
+  /* Name of the vendor of the agent. */
+  char vendor_name[64];
   /* Command queues of the agent.  */
   hsa_queue_t *sync_queue;
   struct goacc_asyncqueue *async_queues, *omp_async_queue;
@@ -544,6 +547,8 @@ struct hsa_context_info
   int agent_count;
   /* Array of agent_info structures describing the individual HSA agents.  */
   struct agent_info *agents;
+  /* Driver version string. */
+  char driver_version_s[30];
 };
 
 /* Format of the on-device heap.
@@ -1513,6 +1518,23 @@ init_hsa_context (void)
 	GOMP_PLUGIN_error ("Failed to list all HSA runtime agents");
 }
 
+  uint16_t minor, major;
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, &minor);
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version");
+  status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, &major);
+  if (status != HSA_STATUS_SUCCESS)
+GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version");
+
+  size_t len = sizeof hsa_context.driver_version_s;
+  int printed = snprintf (hsa_context.driver_version_s, len,
+			  "HSA Runtime %hu.%hu", (unsigned short int)major,
+			  (unsigned short int)minor);
+  if (printed >= len)
+GCN_WARNING ("HSA runtime version string was truncated."
+		 "Version %hu.%hu is too long.", (unsigned short int)major,
+		 (unsigned short int)minor);
+
   hsa_context.initialized = true;
   return true;
 }
@@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n)
 return hsa_error ("Error requesting maximum queue size of the GCN agent",
 		  status);
 
-  char buf[64];
   status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
-	  &buf);
+	  &agent->name);
   if (status != HSA_STATUS_SUCCESS)
 return hsa_error ("Error querying the name of the agent", status);
 
-  agent->device_isa = isa_code (buf);
+  agent->device_isa = isa_code (agent->name);
   if (agent->device_isa < 0)
-return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR)

Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Andrew,

On 29.01.20 11:38, Andrew Stubbs wrote:
> On 29/01/2020 09:52, Harwath, Frederik wrote:

> 
> Patch 1 is OK with the formatting fixed.
> Patch 2 is OK.
> 
> Thanks very much,
> 

Committed as 2e5ea57959183bd5bd0356739bb5167417401a31 and 
87c3fcfa6bbb5c372d4e275276d21f601d0b62b0.

Thank you for the review,
Frederik



[PATCH][OpenACC] Add acc_device_radeon to name_of_acc_device_t function

2020-01-29 Thread Harwath, Frederik
Hi,
we should handle acc_device_radeon in the name_of_acc_device_t function
which is used in libgomp/oacc-init.c to display the name of devices
in several error messages.

Ok to commit this patch to master?

Best regards,
Frederik

From 6aacba3e8123ce5e0961857802fd7d8a103aa96b Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 27 Jan 2020 15:41:26 +0100
Subject: [PATCH] Add acc_device_radeon to name_of_acc_device_t function

libgomp/
	* oacc-init.c (name_of_acc_device_t): Handle acc_device_radeon.
---
 libgomp/oacc-init.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 89a30b3e716..ef12b4c16d0 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -115,6 +115,7 @@ name_of_acc_device_t (enum acc_device_t type)
 case acc_device_host: return "host";
 case acc_device_not_host: return "not_host";
 case acc_device_nvidia: return "nvidia";
+case acc_device_radeon: return "radeon";
 default: unknown_device_type_error (type);
 }
   __builtin_unreachable ();
-- 
2.17.1



Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-29 Thread Harwath, Frederik
Hi Thomas,

On 29.01.20 18:44, Thomas Schwinge wrote:

>> +  size_t len = sizeof hsa_context.driver_version_s;
>> +  int printed = snprintf (hsa_context.driver_version_s, len,
>> +  "HSA Runtime %hu.%hu", (unsigned short int)major,
>> +  (unsigned short int)minor);
>> +  if (printed >= len)
>> +GCN_WARNING ("HSA runtime version string was truncated."
>> + "Version %hu.%hu is too long.", (unsigned short int)major,
>> + (unsigned short int)minor);
> 
> (Can it actually happen that 'snprintf' returns 'printed > len' --
> meaning that it's written into random memory?  I thought 'snprintf' has a
> hard stop at 'len'?  Or does this indicate the amount of memory it
> would've written?  I should re-read the manpage at some point...)  ;-)
> 

Yes, "printed > len" can happen. Seems that I have chosen a bad variable name.
"actual_len" (of the formatted string that should have been written -
excluding the terminating '\0') would have been more appropriate.


> For 'printed = len' does or doesn't 'snprintf' store the terminating
> 'NUL' character, or do we manually have to set:
> 
> hsa_context.driver_version_s[len - 1] = '\0';
> 
> ... in that case?

No, in this case, the printed string is missing the last character, but the
terminating '\0' has been written. Consider:

#include <stdio.h>

int main () {
  char s[] = "foo";
  char buf[3];

  // buf is too short to hold "foo" plus the terminating '\0'
  int actual_len = snprintf (buf, 3, "%s", s);
  printf ("buf: %s\n", buf);
  printf ("actual_len: %d\n", actual_len);
}

Output:


buf: fo
actual_len: 3
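
The same behaviour can be wrapped in a small helper. This is only a sketch: the name format_runtime_version is hypothetical, and the explicit sign check before the cast is one possible way to avoid comparing an int with a size_t, not necessarily what the plugin does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format the version string into BUF (LEN bytes).  Returns true if the
   complete string, including the terminating '\0', fit into BUF.  */
static bool
format_runtime_version (char *buf, size_t len, unsigned short major,
			unsigned short minor)
{
  assert (buf != NULL && len > 0);
  /* snprintf returns the length the complete string would have had,
     excluding the terminating '\0'; it never writes more than LEN
     bytes.  */
  int needed = snprintf (buf, len, "HSA Runtime %hu.%hu", major, minor);
  /* A negative value signals an output error; needed >= len means the
     output was truncated.  Cast only after the sign check.  */
  return needed >= 0 && (size_t) needed < len;
}
```

With an 8-byte buffer and version 1.1 the helper returns false and the buffer holds the NUL-terminated prefix "HSA Run".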

> 
>> @@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n)
> 
>> -  char buf[64];
>>status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME,
>> -  &buf);
>> +  &agent->name);
>>if (status != HSA_STATUS_SUCCESS)
>>  return hsa_error ("Error querying the name of the agent", status);
> 
> (That's of course pre-existing, but) this looks like a dangerous API,
> given that 'hsa_agent_get_info_fn' doesn't know 'sizeof agent->name' (or
> 'sizeof buf' before)...

The API documentation
(cf. 
https://rocm-documentation.readthedocs.io/en/latest/ROCm_API_References/ROCr-API.html)
states that "the type of this attribute is a NUL-terminated char[64]".
But, right, should this ever change, we might not notice it.

Best regards,
Frederik




Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN

2020-01-31 Thread Harwath, Frederik
Hi Thomas,

On 30.01.20 17:08, Thomas Schwinge wrote:

> I understand correctly that the only reason for:
> 
> On 2020-01-29T10:52:57+0100, "Harwath, Frederik"  
> wrote:
>>  * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c
>>  (expect_device_properties): Split function into ...
>>  (expect_device_string_properties): ... this new function ...
>>  (expect_device_memory): ... and this new function.
> 
> ... this split is that we can't test 'expect_device_memory' here:

> [...]

> ..., because that one doesn't (re-)implement the 'acc_property_memory'
> interface?

Correct. But why "re-"? It has not been implemented before.

>> --- a/libgomp/plugin/plugin-gcn.c
>> +++ b/libgomp/plugin/plugin-gcn.c
> 
>> @@ -4115,12 +4141,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, 
>> void *dst, const void *src,
>>  union goacc_property_value
>>  GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop)
>>  {
>> [...]
>> +  switch (prop)
>> +{
>> +case GOACC_PROPERTY_FREE_MEMORY:
>> +  /* Not supported. */
>> +  break;
> 
> (OK, can be added later when somebody feels like doing that.)

Well, "not supported" means that there seems to be no (reasonable) way to obtain
the necessary information from the runtime - in contrast to the nvptx plugin
where it can be obtained easily through the CUDA API.

> 
>> +case GOACC_PROPERTY_MEMORY:
>> +  {
>> +size_t size;
>> +hsa_region_t region = agent->data_region;
>> +hsa_status_t status =
>> +  hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SIZE, &size);
>> +if (status == HSA_STATUS_SUCCESS)
>> +  propval.val = size;
>> +break;
>> +  }
>> [...]
>>  }
> 
> Here we got 'acc_property_memory' implemented, but not here:
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c

Yes, there seems to be no straightforward way to determine the expected value
through the runtime API. We might of course try to replicate the logic that is
used in plugin-gcn.c.
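
For orientation, the property dispatch quoted above can be mocked up without the HSA runtime. Everything in this sketch is a stand-in: the enum, union and struct names are hypothetical, and data_region_size replaces the HSA_REGION_INFO_SIZE query. It assumes that zero is the answer for an unsupported property, which is what the tests in this thread expect for unknown numeric properties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins that loosely follow the quoted patch.  */
enum property_sketch { PROP_MEMORY, PROP_FREE_MEMORY, PROP_NAME };

union property_value_sketch
{
  const char *ptr;
  size_t val;
};

struct agent_sketch
{
  char name[64];
  size_t data_region_size;	/* stands in for the HSA region query */
};

static union property_value_sketch
get_property_sketch (const struct agent_sketch *agent,
		     enum property_sketch prop)
{
  /* Zero is returned for anything that is not supported.  */
  union property_value_sketch propval = { .val = 0 };
  assert (agent != NULL);
  switch (prop)
    {
    case PROP_FREE_MEMORY:
      /* Not supported: keep the zero value.  */
      break;
    case PROP_MEMORY:
      propval.val = agent->data_region_size;
      break;
    case PROP_NAME:
      propval.ptr = agent->name;
      break;
    }
  return propval;
}
```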

Best regards,
Frederik



Re: Make OpenACC 'acc_get_property' with 'acc_device_current' work (was: [PATCH] Add OpenACC 2.6 `acc_get_property' support)

2020-02-03 Thread Harwath, Frederik
Hi Thomas,

On 30.01.20 16:54, Thomas Schwinge wrote:
> 
> [...] the 'acc_device_current' interface should work already now.
> 
> [...] Please review
> the attached (Tobias the Fortran test cases, please), and test with AMD
> GCN offloading.  If approving this patch, please respond with

I have tested the patch with AMD GCN offloading and I have observed no 
regressions.
The new tests pass as expected and print the correct output.
Great that you have extended the Fortran tests!

> diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
> index ef12b4c16d01..c28c0f689ba2 100644
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -796,7 +796,9 @@ get_property_any (int ord, acc_device_t d, 
> acc_device_property_t prop)
> size_t
> acc_get_property (int ord, acc_device_t d, acc_device_property_t prop)
> {
> -  if (!known_device_type_p (d))
> +  if (d == acc_device_current)
> +; /* Allowed only for 'acc_get_property', 'acc_get_property_string'.  */
> +  else if (!known_device_type_p (d))
> unknown_device_type_error(d);

I don't like the empty if branch very much. Introducing a variable
(for instance, "bool allowed_device_type = d == acc_device_current
|| known_device_type_p (d);") would also provide a place for your comment.
You could also extract a function to avoid duplicating the explanation
in acc_get_property_string.
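
A sketch of the extracted predicate, using a made-up device enum so that it compiles on its own (the real constants live in openacc.h, and all names ending in "sketch" are hypothetical):

```c
#include <assert.h>  /* for callers that assert on the predicate */
#include <stdbool.h>

/* Hypothetical device enum for this sketch only.  */
typedef enum {
  sketch_device_current = -1,
  sketch_device_none = 0,
  sketch_device_host = 1,
  sketch_device_limit = 2
} sketch_device_t;

/* Stand-in for known_device_type_p.  */
static bool
known_device_type_sketch_p (sketch_device_t d)
{
  return d >= 0 && d < sketch_device_limit;
}

/* The "current" device is allowed only for the property queries, so
   the explanation lives in one place and both query functions can
   share it.  */
static bool
property_device_type_allowed_p (sketch_device_t d)
{
  return d == sketch_device_current || known_device_type_sketch_p (d);
}
```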

The patch looks good to me.

Reviewed-by: Frederik Harwath  

Best regards,
Frederik



[PATCH] xfail and improve some failing libgomp tests

2020-02-07 Thread Harwath, Frederik
Hi,
the libgomp testsuite contains some test cases (all in
/libgomp/testsuite/libgomp.c/) which fail with nvptx offloading because of
some long-standing issues:

* {target-32.c, thread-limit-2.c}:
no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690

* target-{33,34}.c:
no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
https://gcc.gnu.org/PR81688

* target-link-1.c:
omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689


All these issues have been known, at least, since 2016:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00972.html

As suggested in this mail:
 "Short term, it should be possible to implement something like -foffload=^nvptx
to skip PTX (and only PTX) offloading on those tests."

Well, we can now skip/xfail tests for nvptx offloading using the effective
target "offload_target_nvptx" and the present patch uses this to xfail the
tests for which no short-term solution is in sight, i.e. the
GOMP_OFFLOAD_async_run and the "target link" related failures.

Regarding the "usleep" issue, I have decided to follow Jakub's suggestion
(cf. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01026.html) to
replace usleep by busy waiting. As noted by Tobias
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690#c4), this involves creating
separate test files for the cases with and without usleep. This solution is a
bit cumbersome but I think we can live with it, in particular, since the actual
test case implementations do not get duplicated (they have been moved into
auxiliary header files which are shared by both variants of the corresponding
tests).
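
To make "busy waiting" concrete, here is a host-side sketch. It is not the code from the patch: the function name busy_wait_usec is hypothetical, and using clock() as the time source is an assumption that works on a host but may not be available on an offload target.

```c
#include <assert.h>
#include <time.h>

/* Spin until roughly USEC microseconds of processor time have elapsed,
   without calling any sleep function.  */
static void
busy_wait_usec (unsigned long usec)
{
  clock_t start = clock ();
  assert (start != (clock_t) -1);
  /* Convert the requested delay to clock ticks.  */
  clock_t ticks = (clock_t) ((double) usec * CLOCKS_PER_SEC / 1e6);
  while (clock () - start < ticks)
    ;	/* keep polling the clock */
}
```

Since the loop itself consumes processor time, the clock() reading advances while the function spins.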

Since the "usleep" issue also concerns amdgcn, I have introduced an effective
target "offload_target_amdgcn" to add xfails for this offloading target, too.
This behaves like "offload_target_nvptx" but for amdgcn. Note that the existing
amdgcn effective targets cannot be used for our purpose since they are
OpenACC-specific.

The new thread-limit-2-nosleep.c should now pass for both nvptx and amdgcn
offloading whereas thread-limit-2.c should xfail. The new target-32-nosleep.c
passes with amdgcn offloading, but xfails with nvptx offloading, because it
also needs the unimplemented GOMP_OFFLOAD_async_run.

With the patch, the detailed test summary now looks as follows for me:

nvptx offloading:

// Expected execution failures due to missing usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
XFAIL: libgomp.c/target-32-nosleep.c execution test  // missing GOMP_OFFLOAD_async_run
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)
UNRESOLVED: libgomp.c/thread-limit-2.c compilation failed to produce executable

// Expected execution failures due to missing GOMP_OFFLOAD_async_run
PASS: libgomp.c/target-33.c (test for excess errors)
XFAIL: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
XFAIL: libgomp.c/target-34.c execution test

// Expected compilation failures due to missing target link
XFAIL: libgomp.c/target-link-1.c (test for excess errors)
UNRESOLVED: libgomp.c/target-link-1.c compilation failed to produce executable


amdgcn offloading:

// Tests using usleep
PASS: libgomp.c/target-32-nosleep.c (test for excess errors)
PASS: libgomp.c/target-32-nosleep.c execution test
XFAIL: libgomp.c/target-32.c 7 blank line(s) in output
XFAIL: libgomp.c/target-32.c (test for excess errors)
UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable

PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors)
PASS: libgomp.c/thread-limit-2-nosleep.c execution test
XFAIL: libgomp.c/thread-limit-2.c 1 blank line(s) in output
XFAIL: libgomp.c/thread-limit-2.c (test for excess errors)

// No failures since GOMP_OFFLOAD_async_run works on amdgcn
PASS: libgomp.c/target-33.c (test for excess errors)
PASS: libgomp.c/target-33.c execution test
PASS: libgomp.c/target-34.c (test for excess errors)
PASS: libgomp.c/target-34.c execution test

// No xfail here
PASS: libgomp.c/target-link-1.c (test for excess errors)
FAIL: libgomp.c/target-link-1.c execution test

Note that target-link-1.c execution does also fail on amdgcn.
Since - in contrast to nvptx - it seems that the cause of this failure
has not yet been investigated and discussed, I have not added an xfail
for amdgcn to this test.

All testing has been done with an x86_64-linux-gnu host and target.

Ok to commit this patch?

Best regards,
Frederik





From 6e5e2d45f02235a0f72e6130dcd8d52f88f7b126 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 7 Feb 2020 08:03:00 +0100
Subject: [PATCH] xfail and improve some failing libgomp

Re: [PATCH] xfail and improve some failing libgomp tests

2020-02-09 Thread Harwath, Frederik
Hi Jakub,

On 07.02.20 16:29, Jakub Jelinek wrote:
> On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote:
>> * {target-32.c, thread-limit-2.c}:
>> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690
> 
> Please don't, I want to deal with that using declare variant, just didn't
> get yet around to finishing the last patch needed for that.  Will try next 
> week.

Ok, great! Looking forward to seeing a better solution.

>> * target-{33,34}.c:
>> no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. 
>> https://gcc.gnu.org/PR81688
>>
>> * target-link-1.c:
>> omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689
> 
> I guess this is ok, though of course the right thing would be to implement
> both
Ok, this means that I can commit the attached patch which contains only the
changes to target-{33,34}.c and target-link-1.c? Of course, I agree that those
features should be implemented.

> There has been even in some PR a suggestion that instead of failing
> in nvptx async_run we should just ignore the nowait clause if the plugin
> doesn't implement it properly.

This must be https://gcc.gnu.org/PR93481.
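
For context, the kind of user code affected by this discussion looks roughly
like the following sketch. It is not taken from target-33.c or target-34.c;
the function name sum_with_nowait is made up for illustration. The "nowait"
clause permits the target task to be deferred, and the "taskwait" makes the
host wait for it before the result is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function (not from the libgomp testsuite).
   Sums N integers inside an OpenMP target region marked "nowait".
   When compiled without -fopenmp, the pragmas are ignored and the
   loop simply runs on the host.  */
int
sum_with_nowait (const int *a, size_t n)
{
  int sum = 0;

  assert (a != NULL || n == 0);

  /* "nowait" allows, but does not require, deferred execution of the
     target task.  */
#pragma omp target map(to: a[0:n]) map(tofrom: sum) nowait
  for (size_t i = 0; i < n; i++)
    sum += a[i];

  /* Wait for the possibly deferred target task before reading SUM.  */
#pragma omp taskwait

  return sum;
}
```

Whether the region runs asynchronously or not, the value returned after the
taskwait should be the same; only the scheduling differs.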

Best regards,
Frederik


From e5165ccb143022614920dbd208f6f368b84b4382 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 10 Feb 2020 08:08:00 +0100
Subject: [PATCH] Add xfails to libgomp tests target-{33,34}.c, target-link-1.c

Add xfails for nvptx offloading because
"no GOMP_OFFLOAD_async_run implemented in plugin-nvptx.c"
(https://gcc.gnu.org/PR81688) and because
"omp target link not implemented for nvptx"
(https://gcc.gnu.org/PR81689).

libgomp/
	* testsuite/libgomp.c/target-33.c: Add xfail for execution on
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81688.
	* testsuite/libgomp.c/target-34.c: Likewise.
	* testsuite/libgomp.c/target-link-1.c: Add xfail for
	offload_target_nvptx, cf. https://gcc.gnu.org/PR81689.
---
 libgomp/testsuite/libgomp.c/target-33.c | 3 +++
 libgomp/testsuite/libgomp.c/target-34.c | 3 +++
 libgomp/testsuite/libgomp.c/target-link-1.c | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/target-33.c b/libgomp/testsuite/libgomp.c/target-33.c
index 1bed4b6bc67..15d2d7e38ab 100644
--- a/libgomp/testsuite/libgomp.c/target-33.c
+++ b/libgomp/testsuite/libgomp.c/target-33.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81688.  */
+
 extern void abort (void);
 
 int
diff --git a/libgomp/testsuite/libgomp.c/target-34.c b/libgomp/testsuite/libgomp.c/target-34.c
index 66d9f54202b..5a3596424d8 100644
--- a/libgomp/testsuite/libgomp.c/target-34.c
+++ b/libgomp/testsuite/libgomp.c/target-34.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81688.  */
+
 extern void abort (void);
 
 int
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
index 681677cc2aa..99ce33bc9b4 100644
--- a/libgomp/testsuite/libgomp.c/target-link-1.c
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -1,3 +1,6 @@
+/* { dg-xfail-if "#pragma omp target link not implemented" { offload_target_nvptx } }
+   Cf. https://gcc.gnu.org/PR81689.  */
+
 struct S { int s, t; };
 
 int a = 1, b = 1;
-- 
2.17.1



[PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]

2020-02-13 Thread Harwath, Frederik
Hi Jakub,

On 10.02.20 08:49, Harwath, Frederik wrote:

>> There has been even in some PR a suggestion that instead of failing
>> in nvptx async_run we should just ignore the nowait clause if the plugin
>> doesn't implement it properly.
> 
> This must be https://gcc.gnu.org/PR93481.

The attached patch implements the behavior that has been suggested in the PR.
It makes GOMP_OFFLOAD_async_run optional, removes the stub which produces
the error described in the PR from the nvptx plugin, and changes the nowait
handling to ignore the clause if GOMP_OFFLOAD_async_run is not available for the
executing device's plugin. I have tested the patch by running the full libgomp
testsuite with nvptx-none offloading on x86_64-linux-gnu. I have observed no
regressions.
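
As a self-contained illustration of the idea (a model only, not the libgomp
code): the struct mock_device, the MOCK_FLAG_* constants and all function
names below are hypothetical stand-ins. The sketch assumes that a missing
asynchronous entry point is represented by a NULL function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the real values live in gomp-constants.h.  */
#define MOCK_FLAG_NOWAIT (1u << 0)
#define MOCK_FLAG_OTHER  (1u << 1)

/* Hypothetical, much reduced device descriptor.  */
struct mock_device
{
  /* NULL if the plugin provides no asynchronous run hook.  */
  void (*async_run_func) (void *fn, void *async_data);
};

/* Dummy asynchronous hook for devices that support it.  */
void
mock_async_run (void *fn, void *async_data)
{
  (void) fn;
  (void) async_data;
}

/* Drop the nowait bit if the device cannot run asynchronously.  */
unsigned int
mock_clear_unsupported_flags (const struct mock_device *dev,
			      unsigned int flags)
{
  if (dev != NULL && dev->async_run_func == NULL)
    flags &= ~MOCK_FLAG_NOWAIT;
  return flags;
}

/* Return 1 if a launch with FLAGS would take the asynchronous path.  */
int
mock_runs_async (const struct mock_device *dev, unsigned int flags)
{
  flags = mock_clear_unsupported_flags (dev, flags);
  if (flags & MOCK_FLAG_NOWAIT)
    {
      /* After filtering, the hook must exist on this path.  */
      assert (dev == NULL || dev->async_run_func != NULL);
      return 1;
    }
  return 0;
}
```

Filtering the flags once at the entry point keeps the later code unchanged:
everything after the filter can keep testing the nowait bit as before.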

Ok to push the commit to master?

For the record: Someone should implement GOMP_OFFLOAD_async_run properly
in the nvptx plugin.

Best regards,
Frederik

From 1258f713be317870e9171281e3f7c3a174773aa1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 13 Feb 2020 07:30:16 +0100
Subject: [PATCH] openmp: ignore nowait if async execution is unsupported
 [PR93481]

An OpenMP "nowait" clause on a target construct currently leads to
a call to GOMP_OFFLOAD_async_run in the plugin that is used for
offloading at execution time. The nvptx plugin contains only a stub
of this function that always produces a fatal error if called.

This commit changes the "nowait" implementation to ignore the clause
if the executing device's plugin does not implement GOMP_OFFLOAD_async_run.
The stub in the nvptx plugin is removed which effectively means that
programs containing "nowait" can now be executed with nvptx offloading
as if the clause had not been used.
This behavior is consistent with the OpenMP specification which says that
"[...] execution of the target task *may* be deferred" (emphasis added),
cf. OpenMP 5.0, page 172.

libgomp/

	* plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub.
	* target.c (gomp_load_plugin_for_device): Make "async_run" loading
	optional.
	(gomp_target_task_fn): Assert "devicep->async_run_func".
	(clear_unsupported_flags): New function to remove unsupported flags
	(right now only GOMP_TARGET_FLAG_NOWAIT) that can be ignored.
	(GOMP_target_ext): Apply clear_unsupported_flags to flags.
	(GOMP_target_update_ext): Likewise.
	(GOMP_target_enter_exit_data): Likewise.
	* testsuite/libgomp.c/target-33.c:
	Remove xfail for offload_target_nvptx.
	* testsuite/libgomp.c/target-34.c: Likewise.
---
 libgomp/plugin/plugin-nvptx.c   |  7 +--
 libgomp/target.c| 19 ++-
 libgomp/testsuite/libgomp.c/target-33.c |  3 ---
 libgomp/testsuite/libgomp.c/target-34.c |  3 ---
 4 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 6033c71a9db..ec103a2f40b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   nvptx_stacks_free (stacks, teams * threads);
 }
 
-void
-GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args,
-			void *async_data)
-{
-  GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented");
-}
+/* TODO: Implement GOMP_OFFLOAD_async_run. */
diff --git a/libgomp/target.c b/libgomp/target.c
index 3df007283f4..4fbf963f305 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+static unsigned int
+clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
+{
+  /* If we cannot run asynchronously, simply ignore nowait.  */
+  if (devicep != NULL && devicep->async_run_func == NULL)
+    flags &= ~GOMP_TARGET_FLAG_NOWAIT;
+
+  return flags;
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
and several arguments have been added:
FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size = 0;
   bool fpc_done = false;
 
+  flags = clear_unsupported_flags (devicep, flags);
+
   if (flags & GOMP_TARGET_FLAG_NOWAIT)
 {
   struct gomp_thread *thr = gomp_thread ();
@@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
+  flags = clear_unsupported_flags (devicep, flags);
+
   /* If there are depend clauses, but nowait is not present,
  block the parent task until the dependencies are resolved
  and then just continue with the rest of the function as if it
@@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void *

Re: [PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]

2020-02-13 Thread Harwath, Frederik
Hi Jakub,

On 13.02.20 09:30, Jakub Jelinek wrote:
> On Thu, Feb 13, 2020 at 09:04:36AM +0100, Harwath, Frederik wrote:
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>>gomp_unmap_vars (tgt_vars, true);
>>  }
>>  
>> +static unsigned int
> 
> Add inline?
> 

Added.

>> @@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
>>  {
>>struct gomp_device_descr *devicep = resolve_device (device);
>>  
>> +  flags = clear_unsupported_flags (devicep, flags);

>> @@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
>>  {
>>struct gomp_device_descr *devicep = resolve_device (device);
>>  
>> +  flags = clear_unsupported_flags (devicep, flags);

> I don't see why you need to do the above two.  GOMP_TARGET_TASK_DATA
> is done on the host side, async_run callback isn't called in that case
> and while we create a task, all we do is wait for the (host) dependencies
> in there and then perform the data transfer we need.
> I think it is perfectly fine to ignore nowait on target but honor it
> on target update or target {enter,exit} data.

I see. Removed.


> Otherwise LGTM.

Thanks for the review! I have committed the patch with those changes. I forgot
to include the ChangeLog entry, which I had to add in a separate commit. Sorry
for that! It seems that I have to adapt my workflow, perhaps with some pre-push
hook ;-).

Best regards,
Frederik

From 001ab12e620c6f117b2e93c77d188bd62fe7ba03 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 13 Feb 2020 07:30:16 +0100
Subject: [PATCH 1/2] openmp: ignore nowait if async execution is unsupported
 [PR93481]

An OpenMP "nowait" clause on a target construct currently leads to
a call to GOMP_OFFLOAD_async_run in the plugin that is used for
offloading at execution time. The nvptx plugin contains only a stub
of this function that always produces a fatal error if called.

This commit changes the "nowait" implementation to ignore the clause
if the executing device's plugin does not implement GOMP_OFFLOAD_async_run.
The stub in the nvptx plugin is removed which effectively means that
programs containing "nowait" can now be executed with nvptx offloading
as if the clause had not been used.
This behavior is consistent with the OpenMP specification which says that
"[...] execution of the target task *may* be deferred" (emphasis added),
cf. OpenMP 5.0, page 172.

libgomp/

	* plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub.
	* target.c (gomp_load_plugin_for_device): Make "async_run" loading
	optional.
	(gomp_target_task_fn): Assert "devicep->async_run_func".
	(clear_unsupported_flags): New function to remove unsupported flags
	(right now only GOMP_TARGET_FLAG_NOWAIT) that can be ignored.
	(GOMP_target_ext): Apply clear_unsupported_flags to flags.
	* testsuite/libgomp.c/target-33.c:
	Remove xfail for offload_target_nvptx.
	* testsuite/libgomp.c/target-34.c: Likewise.
---
 libgomp/plugin/plugin-nvptx.c   |  7 +--
 libgomp/target.c| 15 ++-
 libgomp/testsuite/libgomp.c/target-33.c |  3 ---
 libgomp/testsuite/libgomp.c/target-34.c |  3 ---
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 6033c71a9db..ec103a2f40b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args)
   nvptx_stacks_free (stacks, teams * threads);
 }
 
-void
-GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args,
-			void *async_data)
-{
-  GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented");
-}
+/* TODO: Implement GOMP_OFFLOAD_async_run. */
diff --git a/libgomp/target.c b/libgomp/target.c
index 3df007283f4..0ff727de47d 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
   gomp_unmap_vars (tgt_vars, true);
 }
 
+static inline unsigned int
+clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags)
+{
+  /* If we cannot run asynchronously, simply ignore nowait.  */
+  if (devicep != NULL && devicep->async_run_func == NULL)
+    flags &= ~GOMP_TARGET_FLAG_NOWAIT;
+
+  return flags;
+}
+
 /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present,
and several arguments have been added:
FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h.
@@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
   size_t tgt_align = 0, tgt_size

Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic

2019-12-04 Thread Harwath, Frederik
Hi Tobias,

On 04.12.19 14:37, Tobias Burnus wrote:
> As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN 
> == 0 to the middle end. The reason for that is how parsing works – gfortran 
> reads the input line by line.
> 
> For internal error diagnostic (fortran/error.c), the column location was 
> corrected –  but not for locations passed to the middle end. Hence, the 
> diagnostic there wasn't optimal.

I am not sure if those changes have any impact on existing diagnostics -
probably not, or you would have needed to change some tests in your patch. Thus,
I want to confirm that this fixes the problems that I had when trying to emit
warnings that referenced the location of OpenACC reduction clauses from
pass_lower_omp when compiling Fortran code.
Where previously

inform (OMP_CLAUSE_LOCATION (some_omp_clause), "Some message.");

would produce

[...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:0: note: Some message.

I now get the expected result:

[...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:27: note: Some message.

(Well, not completely as expected. In this case where the clause is an OpenACC
reduction clause, the location of the clause is a bit off because it points to
the reduction variable and not to the beginning of the clause, but that's
another issue which is not related to this patch ;-) )

The existing translation of the reduction clauses has another bug. It uses the
location of the first clause from the reduction list for all clauses. This
could be fixed by changing the patch as follows:

> @@ -1854,7 +1854,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
>   tree t = gfc_trans_omp_variable (namelist->sym, false);
>   if (t != error_mark_node)
> {
> - tree node = build_omp_clause (where.lb->location,
> + tree node = build_omp_clause (gfc_get_location (&where),
> OMP_CLAUSE_REDUCTION);
>   OMP_CLAUSE_DECL (node) = t;
>   if (mark_addressable)

Here "&where" should be "&namelist->where" to use the location of the current
clause. I have verified that this yields the correct locations for all clauses
using the nested-reductions-warn.f90 test.
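
The difference between the two variants can be modelled in a few lines of C.
This is only a sketch: struct mock_clause and both functions are hypothetical
and merely mimic the shape of a clause list in which every entry carries its
own source column.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified clause list entry.  */
struct mock_clause
{
  const char *var;		/* Name of the reduction variable.  */
  int column;			/* Column at which the entry was parsed.  */
  struct mock_clause *next;
};

/* Buggy variant: every generated clause gets the location of the
   list head.  Stores at most MAX columns in OUT, returns the count.  */
size_t
columns_from_head (const struct mock_clause *list, int *out, size_t max)
{
  size_t n = 0;
  int where = list != NULL ? list->column : 0;

  assert (out != NULL);
  for (const struct mock_clause *c = list; c != NULL && n < max; c = c->next)
    out[n++] = where;
  return n;
}

/* Fixed variant: every generated clause gets its own location.  */
size_t
columns_per_clause (const struct mock_clause *list, int *out, size_t max)
{
  size_t n = 0;

  assert (out != NULL);
  for (const struct mock_clause *c = list; c != NULL && n < max; c = c->next)
    out[n++] = c->column;
  return n;
}
```

With two entries on one line, the first function reports the same column
twice, while the second reports a distinct column per entry.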


Thank you for fixing this!

Best regards,
Frederik



[PATCH][AMDGCN] Skip test gcc/testsuite/gcc.dg/asm-4.c

2019-12-04 Thread Harwath, Frederik
Hi,
the inline assembly "p" modifier ("An operand that is a valid memory address is
allowed", cf.
https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints)
is not supported on AMD GCN. This causes an ICE during the compilation of
gcc.dg/asm-4.c.
We should skip the test for the amdgcn-*-* target.

Can I merge the patch below into trunk?

Best regards,
Frederik


2019-12-05  Frederik Harwath  

gcc/testsuite/
* gcc.dg/asm-4.c: Skip on target amdgcn-*-*.

Index: gcc/testsuite/gcc.dg/asm-4.c
===
--- gcc/testsuite/gcc.dg/asm-4.c	(revision 278932)
+++ gcc/testsuite/gcc.dg/asm-4.c	(working copy)
@@ -3,6 +3,7 @@

 /* "p" modifier can't be used to generate a valid memory address with ILP32.  */
 /* { dg-skip-if "" { aarch64*-*-* && ilp32 } } */
+/* { dg-skip-if "'p' is not supported for GCN" { amdgcn-*-* } } */

 int main()
 {



[PATCH] Fix column information for omp_clauses in Fortran code

2019-12-09 Thread Harwath, Frederik
Hi,
Tobias has recently fixed a problem with the column information in gfortran
locations ("PR 92793 - fix column used for error diagnostic"). Diagnostic
messages for OpenMP/OpenACC clauses do not contain the right column information
yet. The reason is that the location information of the first clause is used
for all clauses on a line and hence the columns are wrong for all but the first
clause. The attached patch fixes this problem.

I have tested the patch manually by adapting the validity check for nested
OpenACC reductions (see omp-low.c) to include the location of clauses in
warnings instead of the location of the loop to which the clause belongs.
I can add a regression test based on this later on after adapting the code in
omp-low.c.

Is it ok to include the patch in trunk?

Best regards,
Frederik


On 04.12.19 14:37, Tobias Burnus wrote:
> As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN 
> == 0 to the middle end. The reason for that is how parsing works – gfortran 
> reads the input line by line.
> 
> For internal error diagnostic (fortran/error.c), the column location was 
> corrected –  but not for locations passed to the middle end. Hence, the 
> diagnostic there wasn't optimal.
> 
> Fixed by introducing a new function; now one only needs to make sure that no 
> new code will re-introduce "lb->location" :-)
> 
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk?
> 
> Tobias

From af3a63b64f38d522b0091a123a919d1f20f5a8b1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 9 Dec 2019 15:07:53 +0100
Subject: [PATCH] Fix column information for omp_clauses in Fortran code

The location of all OpenMP/OpenACC clauses on any given line in Fortran code
always points to the first clause on that line. Hence, the column information
is wrong for all clauses but the first one.

Use the correct location for each clause instead.

2019-12-09  Frederik Harwath  

/gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_reduction_list): Pass correct location for each
	clause to build_omp_clause.
---
 gcc/fortran/trans-openmp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d07ff86fc0b..356fd04e6c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1982,7 +1982,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list,
 	tree t = gfc_trans_omp_variable (namelist->sym, false);
 	if (t != error_mark_node)
 	  {
-	tree node = build_omp_clause (gfc_get_location (&where),
+	tree node = build_omp_clause (gfc_get_location (&namelist->where),
 	  OMP_CLAUSE_REDUCTION);
 	OMP_CLAUSE_DECL (node) = t;
 	if (mark_addressable)
-- 
2.17.1



[PATCH 1/2] Use clause locations in OpenACC nested reduction warnings

2019-12-10 Thread Frederik Harwath
Since the Fortran front-end now sets the clause locations correctly, we can
emit warnings with more precise locations if we encounter conflicting
operations for a variable in reduction clauses.

2019-12-10  Frederik Harwath  

gcc/
* omp-low.c (scan_omp_for): Use clause location in warning.
---
 gcc/omp-low.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ad26f7918a5..d422c205836 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2473,7 +2473,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
  tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause);
  if (outer_var == local_var && outer_op != local_op)
{
- warning_at (gimple_location (stmt), 0,
+ warning_at (OMP_CLAUSE_LOCATION (local_clause), 0,
  "conflicting reduction operations for %qE",
  local_var);
  inform (OMP_CLAUSE_LOCATION (outer_clause),
-- 
2.17.1



[PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Frederik Harwath
Hi,

On 09.12.19 16:58, Harwath, Frederik wrote:
> Tobias has recently fixed a problem with the column information in gfortran 
> locations
> [...]
> I have tested the patch manually by adapting the validity check for nested 
> OpenACC reductions (see omp-low.c)
> to include the location of clauses in warnings instead of the location of the 
> loop to which the clause belongs.
> I can add a regression test based on this later on after adapting the code in 
> omp-low.c.

Here are patches adding the promised test for Fortran and a corresponding test
for C.

Is it ok to include them in trunk?

Best regards,
Frederik

Frederik Harwath (2):
  Use clause locations in OpenACC nested reduction warnings
  Add tests to verify OpenACC clause locations

 gcc/omp-low.c  |  2 +-
 gcc/testsuite/gcc.dg/goacc/clause-locations.c  | 17 +
 .../gfortran.dg/goacc/clause-locations.f90 | 18 ++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90

-- 
2.17.1



[PATCH 2/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Frederik Harwath
Check that the column information for OpenACC clauses is communicated correctly
to the middle-end, in particular by the Fortran front-end (cf. PR 92793).

2019-12-10  Frederik Harwath  

gcc/testsuite/
* gcc.dg/goacc/clause-locations.c: New test.
* gfortran.dg/goacc/clause-locations.f90: New test.
---
 gcc/testsuite/gcc.dg/goacc/clause-locations.c  | 17 +
 .../gfortran.dg/goacc/clause-locations.f90 | 18 ++
 2 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90

diff --git a/gcc/testsuite/gcc.dg/goacc/clause-locations.c b/gcc/testsuite/gcc.dg/goacc/clause-locations.c
new file mode 100644
index 000..51184e3517b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/goacc/clause-locations.c
@@ -0,0 +1,17 @@
+/* Verify that the location information for clauses is correct. */
+
+void
+check_clause_columns() {
+  int i, j, sum, diff;
+
+  #pragma acc parallel
+  {
+#pragma acc loop reduction(+:sum)
+for (i = 1; i <= 10; i++)
+  {
+#pragma acc loop reduction(-:diff) reduction(-:sum)  /* { dg-warning "53: conflicting reduction operations for .sum." } */
+   for (j = 1; j <= 10; j++)
+ sum = 1;
+  }
+  }
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90
new file mode 100644
index 000..29798d31542
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90
@@ -0,0 +1,18 @@
+! Verify that the location information for clauses is correct.
+! See also PR 92793.
+
+subroutine check_clause_columns ()
+  implicit none (type, external)
+  integer :: i, j, sum, diff
+
+  !$acc parallel
+!$acc loop reduction(+:sum)
+do i = 1, 10
+  !$acc loop reduction(-:diff) reduction(-:sum)  ! { dg-warning "47: conflicting reduction operations for .sum." }
+  do j = 1, 10
+sum = 1
+  end do
+end do
+  !$acc end parallel
+end subroutine check_clause_columns
+
-- 
2.17.1



Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

>> Frederik Harwath (2):
>>   Use clause locations in OpenACC nested reduction warnings
>>   Add tests to verify OpenACC clause locations
> 
> I won't insist, but suggest (common practice) to merge that into one
> patch: bug fix plus test cases, using the summary line of your first
> patch.
> [...]
> It's of course always OK to add new test cases, but wouldn't the same
> test coverage be reached by just adding such checking to the existing
> test cases in 'c-c++-common/goacc/nested-reductions-warn.c',
> 'gfortran.dg/goacc/nested-reductions-warn.f90'?

Sure, we could have everything in one patch and one test. The rationale
for splitting the patches and for splitting the tests is that the tests do
not try to verify the nested reductions validation code. They try to verify
that the language front-ends set the correct locations for clauses.
Without a possibility to do proper unit testing, I just had to find some
way to check the clauses. I had no immediate success triggering one of the
very few other warnings that use the location of omp_clauses from both Fortran
and C code and hence I went with the nested reductions code.

Thanks for your review!

Best regards,
Frederik




Re: [PATCH 0/2] Add tests to verify OpenACC clause locations

2019-12-10 Thread Harwath, Frederik
Hi Thomas,

On 10.12.19 15:44, Thomas Schwinge wrote:

> Thanks, yes, with my following remarks considered, and acted on per your
> preference.  To record the review effort, please include "Reviewed-by:
> Thomas Schwinge " in the commit log, see
> <https://gcc.gnu.org/wiki/Reviewed-by>.

Committed as r279168 and r279169.

Frederik




[PATCH, committed] Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c

2019-12-11 Thread Harwath, Frederik
Hi,
I have committed the attached trivial patch to trunk as r279215. The columns of
the clause locations are reported differently by the C and C++ front ends and
hence we need different test expectations for both languages.

Best regards,
Frederik

r279215 | frederik | 2019-12-11 09:26:18 +0100 (Mi, 11 Dez 2019) | 12 lines

Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c 

The columns of the clause locations that are reported for C and C++ are
different and hence we need separate test expectations for both languages.

2019-12-11  Frederik Harwath  

	PR other/92901
	/gcc/testsuite/
	* c-c++-common/goacc/clause-locations.c: Adjust test expectation for C++.




Index: gcc/testsuite/c-c++-common/goacc/clause-locations.c
===
--- gcc/testsuite/c-c++-common/goacc/clause-locations.c	(revision 279214)
+++ gcc/testsuite/c-c++-common/goacc/clause-locations.c	(working copy)
@@ -9,7 +9,9 @@
 #pragma acc loop reduction(+:sum)
 for (i = 1; i <= 10; i++)
   {
-#pragma acc loop reduction(-:diff) reduction(-:sum)  /* { dg-warning "53: conflicting reduction operations for .sum." } */
+#pragma acc loop reduction(-:diff) reduction(-:sum)
+	/* { dg-warning "53: conflicting reduction operations for .sum." "" { target c } .-1 } */
+	/* { dg-warning "56: conflicting reduction operations for .sum." "" { target c++ } .-2 } */
 	for (j = 1; j <= 10; j++)
 	  sum = 1;
   }


Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-20 Thread Harwath, Frederik
Hi Thomas,
thanks for the review! I have attached a revised patch.

> > There is no AMD GCN support yet. This will be added later on.
>
> ACK, just to note that there now is a 'libgomp/plugin/plugin-gcn.c' that
> at least needs to get a stub implementation (can mostly copy from
> 'libgomp/plugin/plugin-hsa.c'?) as otherwise the build will fail.

Yes, I have added a stub. A full implementation will follow soon.
The implementation in the OG9 branch that Andrew mentioned will need a
bit of polishing.

> Tobias has generally reviewed the Fortran bits, correct?

Yes, he has done that internally.

> | Before Frederik starts working on integrating this into GCC trunk, do you
> | (Jakub) agree with the libgomp plugin interface changes as implemented by
> | Maciej?  For example, top-level 'GOMP_OFFLOAD_get_property' function in
> | 'struct gomp_device_descr' instead of stuffing this into its
> | 'acc_dispatch_t openacc'.  (I never understood why the OpenACC functions
> | need to be segregated like they are.)
>
> Jakub didn't answer, but I now myself decided that we should group this
> with the other OpenACC libgomp-plugin functions, as this interface is
> defined in terms of OpenACC-specific stuff such as 'acc_device_t'.
> Frederik, please work on that, also try to move function definitions etc.
> into appropriate places in case they aren't; ask if you need help.
> That needs to be updated.

Is it ok to do this in a separate follow-up patch?


> >  .../acc-get-property-2.c  |  68 +
> >  .../acc-get-property-3.c  |  19 +++
> >  .../acc-get-property-aux.c|  60 
> >  .../acc-get-property.c|  75 ++
> >  .../libgomp.oacc-fortran/acc-get-property.f90 |  80 ++
>
> Please name all these 'acc_get_property*', which is the name of the
> interface tested.

Ok.


> > --- a/include/gomp-constants.h
> > +++ b/include/gomp-constants.h
> > @@ -178,6 +178,20 @@ enum gomp_map_kind
> >  
> >  #define GOMP_DEVICE_ICV            -1
> >  #define GOMP_DEVICE_HOST_FALLBACK  -2
> > +#define GOMP_DEVICE_CURRENT        -3
> [...]
>
> Not sure if this should be grouped with 'GOMP_DEVICE_ICV',
> 'GOMP_DEVICE_HOST_FALLBACK', for it is not related to there.
>
> [...]
>
> Should this actually get value '-1' instead of '-3'?  Or, is the OpenACC
> 'acc_device_t' code already paying special attention to negative values
> '-1', '-2'?  (I don't think so.)
> | Also, 'acc_device_current' is a libgomp-internal thing (doesn't interface
> | with the compiler proper), so strictly speaking 'GOMP_DEVICE_CURRENT'
> | isn't needed in 'include/gomp-constants.h'.  But probably still a good
> | idea to list it there, in this canonical place, to keep the several lists
> | of device types coherent.
> still wonder about that...  ;-)

I have removed GOMP_DEVICE_CURRENT from gomp-constants.h.
Changing the value of GOMP_DEVICE_ICV violates the following static asserts in
oacc-parallel.c:

 /* In the ABI, the GOACC_FLAGs are encoded as an inverted bitmask, so that we
   continue to support the following two legacy values.  */
_Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_ICV) == 0,
"legacy GOMP_DEVICE_ICV broken");
_Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_HOST_FALLBACK)
== GOACC_FLAG_HOST_FALLBACK,
"legacy GOMP_DEVICE_HOST_FALLBACK broken");

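To see why those asserts pin the two legacy values, here is a small model.
It assumes that the unmarshalling step is a plain bitwise NOT and that the
host-fallback flag is bit 0; both are assumptions for this sketch, and all
MOCK_* names and mock_decode_flags are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real definitions are in GCC's headers.  */
#define MOCK_DEVICE_ICV            (-1)
#define MOCK_DEVICE_HOST_FALLBACK  (-2)
#define MOCK_FLAG_HOST_FALLBACK    (1 << 0)
/* Assumption: flags are encoded as an inverted bitmask.  */
#define MOCK_FLAGS_UNMARSHAL(X)    (~(X))

/* ~(-1) == 0: the legacy "ICV" value decodes to "no flags set".  */
_Static_assert (MOCK_FLAGS_UNMARSHAL (MOCK_DEVICE_ICV) == 0,
		"legacy ICV value must decode to no flags");
/* ~(-2) == 1: the legacy fallback value decodes to the fallback flag.  */
_Static_assert (MOCK_FLAGS_UNMARSHAL (MOCK_DEVICE_HOST_FALLBACK)
		== MOCK_FLAG_HOST_FALLBACK,
		"legacy fallback value must decode to the fallback flag");

/* Decode an encoded (negative) value into a flag bitmask.  */
int
mock_decode_flags (int encoded)
{
  /* Inverting a non-negative bitmask always yields a negative value.  */
  assert (encoded < 0);
  return MOCK_FLAGS_UNMARSHAL (encoded);
}
```

Under these assumptions, moving the ICV constant to any value other than -1
would make the first assertion fail at compile time, since only ~(-1) is 0.
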
> > +/* Device property codes.  Keep in sync with
> > +   libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_device_property_t
>
> | Same thing, libgomp-internal, not sure whether to list these here?
>
> > +   as well as libgomp/libgomp-plugin.h.  */
>
> (Not sure why 'libgomp/libgomp-plugin.h' is relevant here?)

It does not seem to be relevant. Right now, openacc_lib.h is also not relevant.
I have removed both file names from the comment.

> > +#define GOMP_DEVICE_PROPERTY_MEMORY        1
> > +#define GOMP_DEVICE_PROPERTY_FREE_MEMORY   2
> > +#define GOMP_DEVICE_PROPERTY_NAME          0x10001
> > +#define GOMP_DEVICE_PROPERTY_VENDOR        0x10002
> > +#define GOMP_DEVICE_PROPERTY_DRIVER        0x10003
> > +
> > +/* Internal property mask to tell numeric and string values apart.  */
> > +#define GOMP_DEVICE_PROPERTY_STRING_MASK   0x1
>
> (Maybe should use an 'enum'?)
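
For illustration, an enum variant of the quoted property codes might look like
the sketch below. The mock_* names are hypothetical. The mask value 0x10000 is
an assumption made here because it is the one bit shared by the three string
codes and absent from the two numeric ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum mirroring the quoted property codes.  */
enum mock_device_property
{
  MOCK_PROPERTY_MEMORY = 1,
  MOCK_PROPERTY_FREE_MEMORY = 2,
  MOCK_PROPERTY_NAME = 0x10001,
  MOCK_PROPERTY_VENDOR = 0x10002,
  MOCK_PROPERTY_DRIVER = 0x10003,
  /* Assumed mask value, see the note above.  */
  MOCK_PROPERTY_STRING_MASK = 0x10000
};

/* Compile-time checks that the mask separates the two groups.  */
static_assert ((MOCK_PROPERTY_NAME & MOCK_PROPERTY_STRING_MASK) != 0,
	       "string properties carry the mask bit");
static_assert ((MOCK_PROPERTY_MEMORY & MOCK_PROPERTY_STRING_MASK) == 0,
	       "numeric properties do not carry the mask bit");

/* Return true if PROP denotes a string-valued property.  */
bool
mock_property_is_string (enum mock_device_property prop)
{
  return (prop & MOCK_PROPERTY_STRING_MASK) != 0;
}
```

A caller could use such a predicate to decide between a numeric and a string
query before dispatching to the plugin.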

I have changed this to an enum. However, this does not improve the code much,
since we cannot use the enum for the function argume

Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-12-22 Thread Harwath, Frederik
Hi Thomas,

>> Is it ok to commit the patch to trunk?
> 
> OK, thanks.  And then some follow-up/clean-up next year, also including
> some of the open questions that I've snipped off here.

Right, thanks for the review! I have committed the patch as r279710 with a
minor change: I have disabled the new acc_get_property.{c,f90} tests for
the amdgcn offload target for now.

Best regards,
Frederik



*ping* - Re: [Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)

2020-01-08 Thread Harwath, Frederik
PING

Hi Jakub,
I have attached a version of the patch that has been rebased on the current 
trunk.

Frederik

On 03.12.19 12:16, Harwath, Frederik wrote:
> On 08.11.19 07:41, Harwath, Frederik wrote:
>> On 06.11.19 14:00, Jakub Jelinek wrote:
>> [...]
>>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>>> more natural, wouldn't it.
>> [...]
>>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>>> splay tree for the outer ones, vector for the local ones, and put into both
>>> the clauses, so you can compare reduction code etc.
>>
>> Sounds like a good idea. I am going to try that.
> 
> Below you can find a patch that reimplements the nested reductions check using
> more appropriate data structures. [...]


From b08855328c52e36143770e442e50ba87f25c14b3 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 8 Jan 2020 14:00:44 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track of
the information about the reduction clauses.
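
To make the second drawback concrete, here is a minimal sketch of the +, -, + case, assuming loops shaped like those in nested-reductions-warn.c. The function name nested_sum is hypothetical and the exact warning text is not reproduced here. Without -fopenacc the pragmas are ignored and the loops simply run sequentially.

```c
#include <assert.h>

/* Three nested OpenACC loops that all reduce into the same variable.  */
int
nested_sum (int n)
{
  assert (n >= 0);
  int sum = 0;
#pragma acc parallel
  {
#pragma acc loop reduction(+:sum)
    for (int i = 0; i < n; i++)
      {
        /* The operator switches from + to -: this is the inconsistency
           that should be reported.  */
#pragma acc loop reduction(-:sum)
        for (int j = 0; j < n; j++)
          {
            /* Back to +, which agrees with the outermost clause.  */
#pragma acc loop reduction(+:sum)
            for (int k = 0; k < n; k++)
              sum += 1;
          }
      }
  }
  return sum;
}
```

With the rework, only the switch from + to - would be expected to draw a warning, since the innermost + is consistent with the first reduction operator.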

2020-01-08  Frederik Harwath  

	gcc/
	* omp-low.c (omp_context): Removed fields local_reduction_clauses,
	outer_reduction_clauses; added fields oacc_reduction_clauses,
	oacc_reductions_stack.
	(oacc_reduction_clause_location): New struct.
	(oacc_reduction_var_occ): New struct.
	(new_omp_context): Adjust omp_context initialization to new fields.
	(delete_omp_context): Adjust omp_context deletion to new fields.
	(rewind_oacc_reductions_stack): New function.
	(check_oacc_reduction_clause): New function.
	(check_oacc_reduction_clauses): New function.
	(scan_sharing_clauses): Call check_oacc_reduction_clause for
	reduction clauses (this handles clauses on compute regions)
	if a new optional flag is enabled.
	(scan_omp_for): Remove old nested reduction check, call
	 check_oacc_reduction_clauses instead.
	(scan_omp_target): Adapt call to scan_sharing_clauses to enable the new
	flag.

   	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to
	 ignore warnings that are not relevant to the test.
	(acc_parallel): Stop expecting pruned warnings, adjust expected
	warnings to changes in omp-low.c, add checks for info messages about the
	location of clauses.
	(acc_parallel_loop): Likewise.
	(acc_parallel_reduction): Likewise.
	(acc_parallel_loop_reduction): Likewise.
	(acc_routine): Likewise.
	(acc_kernels): Likewise.

	* gfortran.dg/goacc/nested-reductions-warn.f90: Likewise.
---
 gcc/omp-low.c | 306 --
 .../goacc/nested-reductions-warn.c|  81 ++---
 .../goacc/nested-reductions-warn.f90  |  83 ++---
 3 files changed, 271 insertions(+), 199 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index e692a53a3de..6026b7aff89 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.  If not see
scanned for regions which are then moved to a new
function, to be invoked by the thread library, or offloaded.  */
 
+
+struct oacc_reduction_var_occ;
+
 /* Context structure.  Used to store information about each parallel
directive in the code.  */
 
@@ -128,12 +131,6 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
-  tree local_reduction_clauses;
-
-  /* A tree_list of the reduction clauses in outer contexts.  */
-  tree outer_reduction_clauses;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -163,8 +160,52 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* A mapping that maps a variable to information about the last OpenACC
+ reduction clause that used the variable above the current context.
+ This information is used for checking the nesting restrictions for
+ reduction clauses by the function 

Re: [PATCH] OpenMP: warn about iteration var modifications in loop body

2024-03-06 Thread Frederik Harwath

Ping.


The Linaro CI has kindly pointed me to two test regressions that I had
missed. I have adjusted the test expectations in the updated patch which I
have attached.

Frederik


On 28.02.24 8:32 PM, Frederik Harwath wrote:

Hi,

this patch implements a warning about (some simple cases of direct)
modifications of iteration variables in OpenMP loops which are
forbidden according to the OpenMP specification. I think this can be
helpful, especially for new OpenMP users. I have implemented this
after I observed some confusion concerning this topic recently.
The check is implemented during gimplification. It reuses the
"loop_iter_var" vector in the "gimplify_omp_ctx" which was previously
only used for "doacross" handling to identify the loop iteration
variables during the gimplification of MODIFY_EXPRs in omp_for bodies.
I have only added a common C/C++ test because I don't see any special
C++ constructs for which a warning *should* be emitted and Fortran
rejects modifications of iteration variables in do loops in general.
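
For illustration, here is a small sketch of the cases I have in mind. The function names and the SHOW_NONCONFORMING macro are hypothetical and only serve to keep the non-conforming variants out of a default build. Which of them a given compiler version actually diagnoses is an assumption based on the description above, not something this snippet demonstrates.

```c
#include <assert.h>

/* Conforming: the iteration variable i is only read in the loop body.  */
long
sum_even_ok (const int *a, int n)
{
  assert (n >= 0);
  long s = 0;
#pragma omp parallel for reduction(+:s)
  for (int i = 0; i < n; i += 2)
    s += a[i];
  return s;
}

#ifdef SHOW_NONCONFORMING  /* hypothetical macro, off by default */
/* Non-conforming: i is assigned directly in the body.  This is the kind
   of simple case the warning is meant to catch.  */
long
sum_even_direct (const int *a, int n)
{
  long s = 0;
#pragma omp parallel for reduction(+:s)
  for (int i = 0; i < n; i++)
    {
      s += a[i];
      i = i + 1;
    }
  return s;
}

/* Also non-conforming, but the store goes through a pointer, so a check
   that only looks at the assignment target would not be expected to
   recognize it.  */
long
sum_even_indirect (const int *a, int n)
{
  long s = 0;
#pragma omp parallel for reduction(+:s)
  for (int i = 0; i < n; i++)
    {
      int *p = &i;
      s += a[i];
      *p = *p + 1;
    }
  return s;
}
#endif
```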

I have run "make check" on x86_64-linux-gnu and not observed any
regressions.

Is it ok to commit this?

Best regards,
Frederik
From d4fb1710bfa1d5b66979db1f0aea2d5c68ab2264 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 27 Feb 2024 21:07:00 +
Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

OpenMP loop iteration variables may not be changed by user code in the
loop body according to the OpenMP specification.  In general, the
compiler cannot enforce this, but nevertheless simple cases in which
the user modifies the iteration variable directly in the loop body
(in contrast to, e.g., modifications through a pointer) can be recognized. A
warning should be useful, for instance, to new users of OpenMP.

This commit implements a warning about forbidden iteration var modifications
during gimplification. It reuses the "loop_iter_var" vector in the
"gimplify_omp_ctx" which was previously only used for "doacross" handling to
identify the loop iteration variables during the gimplification of MODIFY_EXPRs
in omp_for bodies.

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to
	recognize the gimplification state during which the new warning should
	be emitted. Add field "is_doacross" to distinguish the original use of
	"loop_iter_var" from its new use.
	(new_omp_context): Initialize new gimplify_omp_ctx fields.
	(gimplify_modify_expr): Emit warning if iter var is modified.
	(gimplify_omp_for): Make initialization and filling of loop_iter_var
	vector unconditional and adjust new gimplify_omp_ctx fields before
	gimplifying the omp_for body.
	(gimplify_omp_ordered): Check for is_doacross field in addition to
	emptiness check on loop_iter_var vector since the vector is now always
	being filled.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/pr92347.c: Adjust.
	* gcc.target/aarch64/sve/pr96195.c: Adjust.
	* c-c++-common/gomp/iter-var-modification.c: New test.

Signed-off-by: Frederik Harwath  
---
 gcc/gimplify.cc   |  54 +++---
 .../c-c++-common/gomp/iter-var-modification.c | 100 ++
 gcc/testsuite/gcc.dg/vect/pr92347.c   |   2 +-
 .../gcc.target/aarch64/sve/pr96195.c  |   2 +-
 4 files changed, 140 insertions(+), 18 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..a74ad987cf7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -235,6 +235,8 @@ struct gimplify_omp_ctx
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool in_omp_for_body;
+  bool is_doacross;
   int defaultmap[5];
 };
 
@@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
   c->privatized_types = new hash_set;
   c->location = input_location;
   c->region_type = region_type;
+  c->loop_iter_var.create (0);
+  c->in_omp_for_body = false;
+  c->is_doacross = false;
+
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
 
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
+{
+  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
+  for (size_t i = 0; i < num_vars; i++)
+	{
+	  if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
+	warning_at (input_location, OPT_Wopenmp,
+			"forbidden modification of iteration variable %qE in "
+			"OpenMP loop", *to_p);
+	}
+}
+
   /* Trying to simplify a clobber using normal logic doesn't work,
  so handle it here.  */
   if (TREE_CLOBBER_P (*from_p))
@@ -15334

[PATCH] OpenMP: warn about iteration var modifications in loop body

2024-02-28 Thread Frederik Harwath

Hi,

this patch implements a warning about (some simple cases of direct)
modifications of iteration variables in OpenMP loops which are forbidden
according to the OpenMP specification. I think this can be helpful,
especially for new OpenMP users. I have implemented this after I
observed some confusion concerning this topic recently.
The check is implemented during gimplification. It reuses the
"loop_iter_var" vector in the "gimplify_omp_ctx" which was previously
only used for "doacross" handling to identify the loop iteration
variables during the gimplification of MODIFY_EXPRs in omp_for bodies.
I have only added a common C/C++ test because I don't see any special
C++ constructs for which a warning *should* be emitted and Fortran
rejects modifications of iteration variables in do loops in general.

I have run "make check" on x86_64-linux-gnu and not observed any
regressions.

Is it ok to commit this?

Best regards,
Frederik
From 4944a9f94bcda9907e0118e71137ee7e192657c2 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 27 Feb 2024 21:07:00 +
Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body

OpenMP loop iteration variables may not be changed by user code in the
loop body according to the OpenMP specification.  In general, the
compiler cannot enforce this, but nevertheless simple cases in which
the user modifies the iteration variable directly in the loop body
(in contrast to, e.g., modifications through a pointer) can be recognized. A
warning should be useful, for instance, to new users of OpenMP.

This commit implements a warning about forbidden iteration var modifications
during gimplification. It reuses the "loop_iter_var" vector in the
"gimplify_omp_ctx" which was previously only used for "doacross" handling to
identify the loop iteration variables during the gimplification of MODIFY_EXPRs
in omp_for bodies.
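
The lookup itself can be modelled outside of GCC. The following is a simplified stand-alone sketch, not the code from gimplify.cc: decl_t and is_iteration_var are hypothetical names, and the only thing taken from the patch is the layout assumption that the vector holds two entries per loop and that the entry at index 2 * i + 1 is compared against the assignment target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GCC "tree": any unique pointer will do.  */
typedef const void *decl_t;

/* Return true if TARGET equals the second entry of any pair in VARS.
   VARS holds LEN entries laid out as consecutive pairs, mirroring the
   "2 * i + 1" indexing used in the patch.  */
bool
is_iteration_var (const decl_t *vars, size_t len, decl_t target)
{
  assert (len % 2 == 0);
  size_t num_vars = len / 2;
  for (size_t i = 0; i < num_vars; i++)
    if (vars[2 * i + 1] == target)
      return true;
  return false;
}
```

A pointer comparison is enough in this model because each declaration is represented by exactly one object, which is also how the check in the patch compares *to_p against the stored entries.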

gcc/ChangeLog:

	* gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to
	recognize the gimplification state during which the new warning should
	be emitted. Add field "is_doacross" to distinguish the original use of
	"loop_iter_var" from its new use.
	(new_omp_context): Initialize new gimplify_omp_ctx fields.
	(gimplify_modify_expr): Emit warning if iter var is modified.
	(gimplify_omp_for): Make initialization and filling of loop_iter_var
	vector unconditional and adjust new gimplify_omp_ctx fields before
	gimplifying the omp_for body.
	(gimplify_omp_ordered): Check for is_doacross field in addition to
	emptiness check on loop_iter_var vector since the vector is now always
	being filled.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/iter-var-modification.c: New test.

Signed-off-by: Frederik Harwath  
---
 gcc/gimplify.cc   |  54 +++---
 .../c-c++-common/gomp/iter-var-modification.c | 100 ++
 2 files changed, 138 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 7f79b3cc7e6..a74ad987cf7 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -235,6 +235,8 @@ struct gimplify_omp_ctx
   bool order_concurrent;
   bool has_depend;
   bool in_for_exprs;
+  bool in_omp_for_body;
+  bool is_doacross;
   int defaultmap[5];
 };
 
@@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type)
   c->privatized_types = new hash_set;
   c->location = input_location;
   c->region_type = region_type;
+  c->loop_iter_var.create (0);
+  c->in_omp_for_body = false;
+  c->is_doacross = false;
+
   if ((region_type & ORT_TASK) == 0)
 c->default_kind = OMP_CLAUSE_DEFAULT_SHARED;
   else
@@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR
 	  || TREE_CODE (*expr_p) == INIT_EXPR);
 
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body)
+{
+  size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2;
+  for (size_t i = 0; i < num_vars; i++)
+	{
+	  if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1])
+	warning_at (input_location, OPT_Wopenmp,
+			"forbidden modification of iteration variable %qE in "
+			"OpenMP loop", *to_p);
+	}
+}
+
   /* Trying to simplify a clobber using normal logic doesn't work,
  so handle it here.  */
   if (TREE_CLOBBER_P (*from_p))
@@ -15334,6 +15352,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
 	  == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt)));
   gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt))
 	  == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt)));
+  int len = TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt));
+  gimplify_omp_ctxp->loop_iter_var.create (len * 2);
 
   tree c = omp_find_clause (OMP_FOR_CLAUSES (for_stm

[PATCH] amdgcn: Add gfx90c target

2024-04-25 Thread Frederik Harwath

Hi Andrew,
this patch adds support for gfx90c GCN5 APU integrated graphics devices.
The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html)
lists those devices as unsupported by rocm-amdhsa.
As we have discussed elsewhere, I have tested the patch on my AMD Ryzen
5 5500U (also with different xnack settings) and it passes most libgomp
offloading tests.
Although those APUs are very constrained compared to dGPUs, I think
they might be interesting for learning, experimentation, and testing.


Can I commit the patch to the master branch?

Best regards,
Frederik
From 809e2a0248e6fad1e8336b4a883a729017cc62e5 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 24 Apr 2024 20:29:14 +0200
Subject: [PATCH] amdgcn: Add gfx90c target

Add support for gfx90c GCN5 APU integrated graphics devices.
The LLVM AMDGPU documentation does not list those devices as supported
by rocm-amdhsa, but it passes most libgomp offloading tests.
Although they are constrained compared to dGPUs, they might be
interesting for learning, experimentation, and testing.

gcc/ChangeLog:

	* config.gcc: Add gfx90c.
	* config/gcn/gcn-hsa.h (NO_SRAM_ECC): Likewise.
	* config/gcn/gcn-opts.h (enum processor_type): Likewise.
	(TARGET_GFX90c): New macro.
	* config/gcn/gcn.cc (gcn_option_override): Handle gfx90c.
	(gcn_omp_device_kind_arch_isa): Likewise.
	(output_file_start): Likewise.
	* config/gcn/gcn.h: Add gfx90c.
	* config/gcn/gcn.opt: Likewise.
	* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90c): New macro.
	(get_arch): Handle gfx90c.
	(main): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c
	* config/gcn/t-omp-device: Add gfx90c.
	* doc/install.texi: Likewise.
	* doc/invoke.texi: Likewise.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (isa_hsa_name): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c.
	(isa_code): Handle gfx90c.
	(max_isa_vgprs): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c.

Signed-off-by: Frederik Harwath 
---
 gcc/config.gcc  | 4 ++--
 gcc/config/gcn/gcn-hsa.h| 2 +-
 gcc/config/gcn/gcn-opts.h   | 2 ++
 gcc/config/gcn/gcn.cc   | 8 
 gcc/config/gcn/gcn.h| 2 ++
 gcc/config/gcn/gcn.opt  | 3 +++
 gcc/config/gcn/mkoffload.cc | 9 +
 gcc/config/gcn/t-omp-device | 2 +-
 gcc/doc/install.texi| 4 ++--
 gcc/doc/invoke.texi | 3 +++
 libgomp/plugin/plugin-gcn.c | 9 +
 11 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5df3c52f8e9..1bf07b6eece 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4569,7 +4569,7 @@ case "${target}" in
 		for which in arch tune; do
 			eval "val=\$with_$which"
 			case ${val} in
-			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1036 | gfx1100 | gfx1103)
+			"" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | gfx1030 | gfx1036 | gfx1100 | gfx1103)
 # OK
 ;;
 			*)
@@ -4585,7 +4585,7 @@ case "${target}" in
 			TM_MULTILIB_CONFIG=
 			;;
 		xdefault | xyes)
-			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
+			TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"`
 			;;
 		*)
 			TM_MULTILIB_CONFIG="${with_multilib_list}"
diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 7d6e3141cea..4611bc55392 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -93,7 +93,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
 #define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1036:;march=gfx1100:;march=gfx1103:;" \
 /* These match the defaults set in gcn.cc.  */ \
 "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};"
-#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;"
+#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;march=gfx90c:;"
 
 /* In HSACOv4 no attribute setting means the binary supports "any" hardware
configuration.  The name of the attribute also changed.  */
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index 49099bad7e7..1091035a69a 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -25,6 +25,7 @@ enum processor_type
   PROCESSOR_VEGA20,  // gfx906
   PROCESSOR_GFX908,
   PROCESSOR_GFX90a,
+  PROCESSOR_GFX90c,
   PROCESSOR_GFX1030,
   PROCESSOR_GFX1036,
   PROCESSOR_GFX1100,
@@ -36,6 +37,7 @@ enum processor_type
 #define TARGET_VEGA20 (gcn_arch == PROCESSOR_VEGA20)
 #define TARGET_GFX908 (gcn_arch == PROCESSOR_GFX908)
 #define TARGET_GFX90a (gcn_arch == PROCESSOR_GFX90a)
+#define TARGET_GFX90c (gcn_arch == PROCESSOR_GFX90c)
 #define TARGET_GFX1030 (gcn_arch == PROCESSOR_GFX1030)
 #define TARGET_GFX1036 (gcn_arch == PROCESSOR_GFX1036)
 #define TARGET_GFX1100 (gcn_arch == PROCESSOR_GFX1

[PATCH, committed][OpenACC] Adapt libgomp acc_get_property.f90 test

2020-02-21 Thread Harwath, Frederik
Hi,
The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

This obvious patch fixes that problem. Committed as 
r10-6782-g83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a.

Best regards,
Frederik
From 83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 21 Feb 2020 15:26:02 +0100
Subject: [PATCH] Adapt libgomp acc_get_property.f90 test

The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed
the name of the type that is used for the return value of the Fortran
acc_get_property function without adapting the test acc_get_property.f90.

2020-02-21  Frederik Harwath  

	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
	integer(acc_device_property) for the type of the return value of
	acc_get_property.
---
 libgomp/ChangeLog  | 7 +++
 .../testsuite/libgomp.oacc-fortran/acc_get_property.f90| 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 3c640c7350b..bff3ae58c9a 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,10 @@
+2020-02-21  Frederik Harwath  
+
+	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to
+	changes from 2020-02-19, i.e. use integer(c_size_t) instead of
+	integer(acc_device_property) for the type of the return value of
+	acc_get_property.
+
 2020-02-19  Tobias Burnus  
 
 	* .gitattributes: New; whitespace handling for Fortran's openacc_lib.h.
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
index 80ae292f41f..1af7cc3b988 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90
@@ -26,13 +26,14 @@ end program test
 ! and do basic device independent validation.
 subroutine print_device_properties (device_type)
   use openacc
+  use iso_c_binding, only: c_size_t
   implicit none
 
   integer, intent(in) :: device_type
 
   integer :: device_count
   integer :: device
-  integer(acc_device_property) :: v
+  integer(c_size_t) :: v
   character*256 :: s
 
   device_count = acc_get_num_devices(device_type)
-- 
2.17.1



Re: [C/C++, OpenACC] Reject vars of different scope in acc declare (PR94120)

2020-03-12 Thread Frederik Harwath
Tobias Burnus  writes:

Hi Tobias,

> Fortran patch: https://gcc.gnu.org/pipermail/gcc-patches/current/541774.html
>
> "A declare directive must be in the same scope
>   as the declaration of any var that appears in
>   the data clauses of the directive."
>
> ("A declare directive is used […] following a variable
>declaration in C or C++".)
>
> NOTE for C++: This patch assumes that variables in a namespace
> are handled in the same way as those which are at
> global (namespace) scope; however, the OpenACC specification's
> wording currently is "In C or C++ global scope, only …".
> Hence, one can argue about this part of the patch; but as
> it fixes an ICE and is a very sensible extension – the other
> option is to reject it – I believe it is fine.
> (On the OpenACC side, this is now Issue 288.)

Sounds reasonable to me.
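
For the C side, my reading of the quoted rule is roughly the following sketch. The names fill_local, mismatch and the SHOW_SCOPE_MISMATCH macro are hypothetical, and that the guarded directive gets rejected is my assumption about the patch, not a quote of its diagnostics. Without -fopenacc the pragmas are ignored.

```c
#include <assert.h>

/* Global variable, directive at the same (global) scope.  */
int b[8];
#pragma acc declare create (b)

/* A second global that is deliberately left without a directive here.  */
int c[8];

int
fill_local (void)
{
  /* Local variable, directive in the same block scope.  */
  int a[8];
#pragma acc declare create (a)
  for (int i = 0; i < 8; i++)
    a[i] = b[i] = i;
  return a[7] + b[7];
}

#ifdef SHOW_SCOPE_MISMATCH  /* hypothetical macro, off by default */
void
mismatch (void)
{
  /* c was declared at global scope, not in this block, so under the
     rule quoted above this directive is in a different scope than the
     declaration of its variable.  */
#pragma acc declare create (c)
}
#endif
```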

> +bool
> +c_check_oacc_same_scope (tree decl)
> +{
> +  struct c_binding *b = I_SYMBOL_BINDING (DECL_NAME (decl));
> +  return b != NULL && B_IN_CURRENT_SCOPE (b);
> +}

Is the function really specific to OpenACC? If not, then "_oacc"
could be dropped from its name. How about "c_check_current_scope"?

> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 24f71671469..8f09eb0d375 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> [...]
> -   if (global_bindings_p ())
> +   if (current_binding_level->kind == sk_namespace)
> [...]
> -  if (error || global_bindings_p ())
> +  if (error || current_binding_level->kind == sk_namespace)
>  return NULL_TREE;

So - just to be sure - the new namespace condition subsumes the old
"global_bindings_p" condition because the global scope is also a namespace,
right? Yes, now I see that you have a test case that demonstrates that
the declare directive still works for global variables with those changes.

> diff --git a/gcc/testsuite/g++.dg/declare-pr94120.C 
> b/gcc/testsuite/g++.dg/declare-pr94120.C
> new file mode 100644
> index 000..8515c4ff875
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/declare-pr94120.C
> @@ -0,0 +1,30 @@
> +/* { dg-do compile }  */
> +
> +/* PR middle-end/94120  */
> +
> +int b[8];
> +#pragma acc declare create (b)

Looks good to me.

Frederik
-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter


[PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-07 Thread Frederik Harwath

Hi,
This patch fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs and which hence are not orphaned.

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  Remember that a loop is "orphaned" if it is
not lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

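As a sketch of the distinction, assuming test code along the lines of orphan-reductions-2.c: the first loop below is lexically inside a "kernels" construct and hence not orphaned, while the second one sits in a "routine" function without any enclosing compute construct. The function names and the SHOW_ORPHAN_ERROR macro are hypothetical; the guard only keeps the rejected variant out of a default build.

```c
#include <assert.h>

/* Not orphaned: the gang loop is contained in a "kernels" construct.  */
int
sum_in_kernels (int n)
{
  assert (n >= 0);
  int s = 0;
#pragma acc kernels
  {
#pragma acc loop gang reduction(+:s)
    for (int i = 0; i < n; i++)
      s = s + 2;
  }
  return s;
}

#ifdef SHOW_ORPHAN_ERROR  /* hypothetical macro, off by default */
/* Orphaned: no enclosing "parallel", "serial" or "kernels" construct.
   With -fopenacc this is where "gang reduction on an orphan loop"
   would be expected.  */
#pragma acc routine gang
int
sum_orphaned (int n)
{
  int s = 0;
#pragma acc loop gang reduction(+:s)
  for (int i = 0; i < n; i++)
    s = s + 2;
  return s;
}
#endif
```
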
The patch has been tested by running the GCC and libgomp testsuites.
The latter tests ran with offloading to nvptx although that should not
be important here unless there was some very subtle reason for
forbidding the gang reductions on kernels loops. As expected, there seems
to be no such reason, i.e. I observed no regressions with the patch.

Can I include the patch in OG10?

Best regards,
Frederik

From 7320635211fff3a773beb0de1914dbfcc317ab37 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 7 Jul 2020 10:41:21 +0200
Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan
 loop" error message

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.

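The corrected check can be modelled with a small stand-alone sketch. This is not the gfortran code: the enum, struct ctx and gang_reduction_is_orphaned are hypothetical names, and the model only assumes what the diff shows, namely that enclosing loop contexts are skipped and the first non-loop context must be a "parallel", "serial" or "kernels" construct. A chain consisting of a loop nested in a kernels context yields false, whereas an empty chain or a chain of loops only yields true.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical construct kinds for the model.  */
enum oacc_kind { KIND_LOOP, KIND_PARALLEL, KIND_SERIAL, KIND_KERNELS, KIND_DATA };

/* Hypothetical context chain; PREVIOUS points to the enclosing context.  */
struct ctx
{
  enum oacc_kind kind;
  const struct ctx *previous;
};

static bool
is_compute_construct (enum oacc_kind kind)
{
  return kind == KIND_PARALLEL || kind == KIND_SERIAL || kind == KIND_KERNELS;
}

/* C is the innermost context enclosing a gang reduction loop, or NULL.
   Skip enclosing loops, then require a compute construct.  */
bool
gang_reduction_is_orphaned (const struct ctx *c)
{
  while (c != NULL && c->kind == KIND_LOOP)
    c = c->previous;
  return c == NULL || !is_compute_construct (c->kind);
}
```
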
2020-07-07  Frederik Harwath  

gcc/fortran/

	* openmp.c (oacc_is_parallel_or_serial): Removed function.
	(oacc_is_kernels): New function.
	(oacc_is_compute_construct): New function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.

gcc/testsuite/

	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
	verifying that the error message is not emitted for
	non-orphaned loops.

	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
---
 gcc/fortran/openmp.c  | 13 +++-
 .../c-c++-common/goacc/orphan-reductions-2.c  | 69 +++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 | 58 
 3 files changed, 137 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 28408c4c99a..83c498112a8 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5926,9 +5926,16 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_kernels (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
+
+static bool
+oacc_is_compute_construct (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code)
+|| oacc_is_kernels (code);
 }
 
 static gfc_statement
@@ -6222,7 +6229,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !oacc_is_parallel_or_serial (c->code))
+  if (c == NULL || !oacc_is_compute_construct (c->code))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
new file mode 100644
index 000..2b651fd2b9f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
@@ -0,0 +1,69 @@
+/* Verify that the error message for gang reduction on orphaned OpenACC loops
+   is not reported for non-orphaned loops. */
+
+#include 
+
+int
+kernels (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc kernels
+  {
+#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+  }
+

Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-07 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

> (CC  added, for everything touching gfortran.)

Thanks!

> On 2020-07-07T10:52:08+0200, Frederik Harwath  
> wrote:
>> This patch fixes the check for reductions on orphaned gang loops
>
> This is the "Make OpenACC orphan gang reductions errors" functionality
> originally added in gomp-4_0-branch r247461.
>
>> the Fortran frontend which (in contrast to the C, C++ frontends)
>> erroneously rejects reductions on gang loops that are contained in
>> "kernels" constructs and which hence are not orphaned.
>>
>> According to the OpenACC standard version 2.5 and later, reductions on
>> orphaned gang loops are explicitly disallowed (cf.  section "Changes
>> from Version 2.0 to 2.5").  Remember that a loop is "orphaned" if it is
>> not lexically contained in a compute construct (cf. section "Loop
>> construct" of the OpenACC standard), i.e. in either a "parallel", a
>> "serial", or a "kernels" construct.
>
> Or the other way round: a 'loop' construct is orphaned if it appears
> inside a 'routine' region, right?

The "not lexically contained in a compute construct" definition is
from the standard. Assuming that the frontend's parser rejects "loop"
directives if they do not occur inside of either the "serial",
"parallel", "kernels" compute constructs or in a function with a
"routine" directive, both definitions should be indeed equivalent ;-).

> Unless Julian/Kwok speak up soon: OK, thanks.
>
> Reviewed-by: Thomas Schwinge 
>
> May want to remove "libgomp" from the first line of the commit log --
> this commit doesn't relate to libgomp specifically.

Right.

> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
> 'parallel', but we can add that later.  I anyway have a WIP patch
> waiting, adding more 'serial' construct testing, for a different reason,
> so I'll include it there.)

I had left this out intentionally, because having the gang reduction in
the serial construct leads to a "region contains gang partitioned code
but is not gang partitioned" error. Of course, we might still add a test
case with that expectation.

Thanks for the review!

Frederik


Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message

2020-07-20 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

>> Can I include the patch in OG10?
>
> Unless Julian/Kwok speak up soon: OK, thanks.

This has been delayed a bit by my vacation, but I have now committed
the patch.

> May want to remove "libgomp" from the first line of the commit log --
> this commit doesn't relate to libgomp specifically.
>
> (Ideally, we'd also test 'serial' construct in addition to 'kernels',
> 'parallel', but we can add that later.  I anyway have a WIP patch
> waiting, adding more 'serial' construct testing, for a different reason,
> so I'll include it there.)

I forgot to remove "libgomp" from the commit message, sorry, but
I have included the test cases for the "serial construct".

Best regards,
Frederik

From 7c10ae450b95495dda362cb66770bb78b546592e Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 20 Jul 2020 11:24:21 +0200
Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan
 loop" error message

According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf.  section "Changes
from Version 2.0 to 2.5").  A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.

This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.

2020-07-20  Frederik Harwath  

gcc/fortran/

	* openmp.c (oacc_is_parallel_or_serial): Removed function.
	(oacc_is_kernels): New function.
	(oacc_is_compute_construct): New function.
	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
	instead of "oacc_is_parallel_or_serial" for checking that a
	loop is not orphaned.

gcc/testsuite/

	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
	verifying that the "gang reduction on an orphan loop" error message
	is not emitted for non-orphaned loops.

	* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
---
 gcc/fortran/ChangeLog |   9 ++
 gcc/fortran/openmp.c  |  13 ++-
 gcc/testsuite/ChangeLog   |   7 ++
 .../c-c++-common/goacc/orphan-reductions-2.c  | 103 ++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 |  87 +++
 5 files changed, 216 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index e86279cb647..5a1f81c286e 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,12 @@
+2020-07-20  Frederik Harwath  
+
+	* openmp.c (oacc_is_parallel_or_serial): Removed function.
+	(oacc_is_kernels): New function.
+	(oacc_is_compute_construct): New function.
+	(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
+	instead of "oacc_is_parallel_or_serial" for checking that a
+	loop is not orphaned.
+
 2020-07-08  Harald Anlauf  
 
 	Backported from master:
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ab68e9f2173..706933c869a 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5927,9 +5927,16 @@ oacc_is_serial (gfc_code *code)
 }
 
 static bool
-oacc_is_parallel_or_serial (gfc_code *code)
+oacc_is_kernels (gfc_code *code)
 {
-  return oacc_is_parallel (code) || oacc_is_serial (code);
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
+
+static bool
+oacc_is_compute_construct (gfc_code *code)
+{
+  return oacc_is_parallel (code) || oacc_is_serial (code)
+|| oacc_is_kernels (code);
 }
 
 static gfc_statement
@@ -6223,7 +6230,7 @@ resolve_oacc_loop_blocks (gfc_code *code)
   for (c = omp_current_ctx; c; c = c->previous)
 	if (!oacc_is_loop (c->code))
 	  break;
-  if (c == NULL || !oacc_is_parallel_or_serial (c->code))
+  if (c == NULL || !oacc_is_compute_construct (c->code))
 	gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
 }
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 59e6c93b07a..fa1937a4ea2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2020-07-20  Frederik Harwath  
+
+	* gfortran.dg/goacc/orphan-reductions-2.f90: New test
+	verifying that the "gang reduction on an orphan loop"

Re: [og9] Really fix og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof"

2020-03-27 Thread Frederik Harwath


Hi Thomas,

Thomas Schwinge  writes:

> On 2020-03-25T18:09:25+0100, I wrote:
>> On 2018-02-22T12:23:25+0100, Tom de Vries  wrote:
>>> when using cuda 9 nvprof with an openacc executable, the executable hangs.
>
>> What Frederik has discovered today in the hard way... [...]
>> -- the hang was back. [...]
> ..., and now the attached patch to devel/omp/gcc-9 in commit
> 775f1686a3df68bd20370f1fabc6273883e2c5d2 'Really fix og9 "Fix hang when
> running oacc exec with CUDA 9.0 nvprof"'.

Thanks for fixing this issue! I can confirm that nvprof now works on
code compiled from devel/omp/gcc-9. I have used nvprof 9.1.85 on Ubuntu
18.04 for testing.

Best regards,
Frederik


Re: [og8] Report errors on missing OpenACC reduction clauses in nested reductions

2020-04-21 Thread Frederik Harwath
Thomas Schwinge  writes:

Hi Thomas,

> Via <https://gcc.gnu.org/PR94629> "10 issues located by the PVS-studio
> static analyzer" (so please reference that one on any patch submission),
> on <https://habr.com/en/company/pvs-studio/blog/497640/> in "Fragment N3,
> Assigning a variable to itself", we find this latter assignment qualified
> as "very strange to assign a variable to itself".
>
> Probably that should've been 'outer_ctx' instead of 'ctx'?

I agree that the original intention must have been to assign the
outer_ctx's "outer_reduction_clauses" to the corresponding field of the
inner "ctx". This would make sense, semantically. But this field is
meant to be used by the function "scan_omp_for" only and ...

> then does the current algorith still work despite this error?

... this function never requires the struct field to be initialized in
that way.  Before the field is used, it always copies the clauses from
the outer context's outer_reduction_clauses to ctx->outer_reduction_clauses:

>> +  if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL)
>> +ctx->outer_reduction_clauses
>> +  = chainon (unshare_expr (ctx->outer->local_reduction_clauses),
>> + ctx->outer->outer_reduction_clauses);

Hence I found it preferable to remove the assignment to the
"outer_reduction_clauses" field and the "local_reduction_clauses" field
from "new_omp_context" completely. (The fields are still zero-initialized
by the allocation of the struct which uses XCNEW.) That way the whole
logic regarding the fields is now contained in "scan_omp_for".
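
To illustrate why the removed assignments were never needed, here is a
minimal Python model of the lazy initialization quoted above. It is only a
sketch: the class "Ctx", the function "outer_clauses" and the use of plain
string lists instead of tree lists are assumptions made for this example
and are not part of omp-low.c. "None" plays the role of the NULL that XCNEW
leaves in the field, and the model assumes that outer contexts are scanned
before inner ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ctx:
    """Hypothetical stand-in for struct omp_context (illustration only)."""
    outer: Optional["Ctx"] = None
    local_reduction_clauses: List[str] = field(default_factory=list)
    # None models the zero-initialized (NULL) field.
    outer_reduction_clauses: Optional[List[str]] = None


def outer_clauses(ctx: Ctx) -> List[str]:
    """Compute the field on first use, as the quoted hunk does."""
    if ctx.outer_reduction_clauses is None and ctx.outer is not None:
        # Mirrors chainon (local clauses of the outer context,
        #                  outer clauses of the outer context).
        ctx.outer_reduction_clauses = (
            list(ctx.outer.local_reduction_clauses)
            + (ctx.outer.outer_reduction_clauses or [])
        )
    return ctx.outer_reduction_clauses or []


# Scan from the outermost context inwards.
a = Ctx(local_reduction_clauses=["+:sum"])
b = Ctx(outer=a, local_reduction_clauses=["*:prod"])
c = Ctx(outer=b)
for ctx in (a, b, c):
    outer_clauses(ctx)
```

Since the field is filled in at the point of use, nothing that the
constructor could have stored there would ever be read.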

I have executed "make check" (on x86_64-linux-gnu) to verify that the
change causes no regressions. Ok to push the commit to master?

Best regards,
Frederik

From 2d60b374a44b212ff97c8b1fd6f8c39e478dc70f Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 21 Apr 2020 12:36:14 +0200
Subject: [PATCH] Remove fishy self-assignment in omp-low.c [PR94629]

The PR noticed that omp-low.c contains a self-assignment in the 
function new_omp_context:

if (outer_ctx) {
...
ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;

This is obviously useless.  The original intention might have been
to copy the field from the outer_ctx to ctx.  Since this is done
(properly) in the only function where this field is actually used
(in function scan_omp_for) and the field is being initialized to zero
during the struct allocation, there is no need to attempt to do
anything to this field in new_omp_context. Thus this commit
removes any assignment to the field from new_omp_context.

2020-04-21  Frederik Harwath  

	PR other/94629
	* gcc/omp-low.c (new_omp_context): Remove assignments to
	ctx->outer_reduction_clauses and ctx->local_reduction_clauses.
---
 gcc/omp-low.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 67565d61400..88f23e60d34 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -128,10 +128,16 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
+  /* A tree_list of the reduction clauses in this context. This is
+only used for checking the consistency of OpenACC reduction
+clauses in scan_omp_for and is not guaranteed to contain a valid
+value outside of this function. */
   tree local_reduction_clauses;
 
-  /* A tree_list of the reduction clauses in outer contexts.  */
+  /* A tree_list of the reduction clauses in outer contexts. This is
+only used for checking the consistency of OpenACC reduction
+clauses in scan_omp_for and is not guaranteed to contain a valid
+value outside of this function. */
   tree outer_reduction_clauses;
 
   /* Nesting depth of this context.  Used to beautify error messages re
@@ -931,8 +937,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->outer = outer_ctx;
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
-  ctx->local_reduction_clauses = NULL;
-  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
   ctx->depth = outer_ctx->depth + 1;
 }
   else
@@ -948,8 +952,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
-  ctx->local_reduction_clauses = NULL;
-  ctx->outer_reduction_clauses = NULL;
   ctx->depth = 1;
 }
 
-- 
2.17.1



[PATCH] libgomp_g.h: Include stdint.h instead of gstdint.h

2019-09-30 Thread Frederik Harwath
Hi,
I am a new member of Mentor's Sourcery Tools Services group and this is the
first patch that I am submitting here. I do not have write access to the svn
repository yet, hence someone would have to merge this patch for me if it gets
accepted. But I intend to apply for an account soon.

The patch changes libgomp/libgomp_g.h to include stdint.h instead of the
internal gstdint.h. The inclusion of gstdint.h was introduced by GCC trunk
r265930, presumably because that revision introduced uses of uintptr_t. Since
gstdint.h is not part of GCC's installation, several libgomp test cases fail
to compile when running the tests with the installed GCC.

I have tested the patch with "make check" on x86_64 GNU/Linux.

Best regards,
Frederik

libgomp/ChangeLog:

2019-09-25  Kwok Cheung Yeung  

 * libgomp_g.h: Include stdint.h instead of gstdint.h.

diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index 32a9d8a..dfb55fb 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -31,7 +31,7 @@
 
 #include 
 #include 
-#include "gstdint.h"
+#include <stdint.h>
 
 /* barrier.c */
 


Re: [PATCH] libgomp_g.h: Include stdint.h instead of gstdint.h

2019-09-30 Thread Harwath, Frederik


Hi Jakub,

On 30.09.2019 at 09:25, Jakub Jelinek wrote:
> On Mon, Sep 30, 2019 at 12:03:00AM -0700, Frederik Harwath wrote:
>> The patch changes libgomp/libgomp_g.h to include stdint.h instead of the 
>> internal gstdint.h. [...]
> 
> That looks wrong, will make libgomp less portable. [...]
>   Jakub

We have discussed this issue with Joseph Myers. Let me quote what Joseph
wrote:

"I think including <stdint.h> is appropriate (and, more generally,
removing the special configure support for GCC_HEADER_STDINT for
anything built only for the target - note that libgcc/gstdint.h has a
comment saying it's about libdecnumber portability to *hosts*, not
targets, without stdint.h). On any target without stdint.h, GCC should
be providing its own; the only targets where GCC does not yet know about
target stdint.h types are SymbianOS, LynxOS, QNX, TPF (see GCC bug 448),
and I think it's pretty unlikely libgomp would do anything useful for
those (and if in fact they do provide stdint.h, there wouldn't be an
issue anyway)."

Hence, I think the change will not affect portability negatively.

Best regards,
Frederik



Add myself to MAINTAINERS files

2019-10-01 Thread Harwath, Frederik
2019-10-01  Frederik Harwath 

* MAINTAINERS: Add myself to Write After Approval

Index: ChangeLog
===
--- ChangeLog   (revision 276390)
+++ ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2019-10-01  Frederik Harwath 
+
+   * MAINTAINERS: Add myself to Write After Approval
+
 2019-09-26  Richard Sandiford  

* MAINTAINERS: Add myself as an aarch64 maintainer.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 276390)
+++ MAINTAINERS (working copy)
@@ -409,6 +409,7 @@
 Wei Guozhi 
 Mostafa Hagog  
 Andrew Haley   
+Frederik Harwath   
 Stuart Hastings
 Michael Haubenwallner  

 Pat Haugen 






[PATCH] Report errors on inconsistent OpenACC nested reduction clauses

2019-10-21 Thread Harwath, Frederik

Hi,
OpenACC requires that, if a variable is used in reduction clauses on two
nested loops, then there must be reduction clauses for that variable on all
loops that are nested in between the two loops, and all these reduction
clauses must use the same operator; this was first clarified by OpenACC 2.6.
This commit introduces a check for that property which reports errors if the
property is violated.
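
The property itself can be sketched with a small Python model. This is not
the implementation in omp-low.c: the function name
"check_nested_reductions", the representation of a loop nest as a list of
dictionaries and the message texts are assumptions made for illustration
only.

```python
def check_nested_reductions(loops):
    """Check a loop nest for inconsistent nested reductions.

    `loops` lists the loops from outermost to innermost; each entry maps
    a reduction variable to its operator, e.g. {"sum": "+"}.
    Returns a list of human-readable problem descriptions.
    """
    problems = []
    variables = {var for clauses in loops for var in clauses}
    for var in sorted(variables):
        levels = [i for i, clauses in enumerate(loops) if var in clauses]
        first, last = levels[0], levels[-1]
        # Every loop between the outermost and the innermost reduction on
        # `var` needs a reduction clause with the same operator.
        for i in range(first + 1, last + 1):
            if var not in loops[i]:
                problems.append(
                    f"loop {i}: missing reduction clause for '{var}'")
            elif loops[i][var] != loops[first][var]:
                problems.append(
                    f"loop {i}: operator '{loops[i][var]}' for '{var}' "
                    f"conflicts with outer operator '{loops[first][var]}'")
    return problems


missing = check_nested_reductions([{"sum": "+"}, {}, {"sum": "+"}])
conflict = check_nested_reductions([{"sum": "+"}, {"sum": "*"}])
valid = check_nested_reductions([{"sum": "+"}, {"sum": "+"}, {"sum": "+"}])
```

In this model, "missing" and "conflict" each contain one entry and "valid"
is empty. The model compares every inner operator with the outermost one,
which is a simplification of what the patch does.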

I have tested the patch by comparing "make check" results and I am not aware
of any regressions.

Gergö has implemented the check and it works, but I was wondering whether the
way in which the patch avoids issuing errors about operator switches more than
once, namely by modifying the clauses (cf. the corresponding comment in
omp-low.c), could lead to problems: the processing might still continue on the
modified tree after the error, right? I was also wondering about the best
place for such checks. Should this be a part of "pass_lower_omp" (as in the
patch) or should it run earlier, for instance in "pass_diagnose_omp_blocks"?

Can the patch be included in trunk?

Frederik



From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Mon, 21 Oct 2019 08:27:58 +0200
Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction
 clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
errors if it is violated.

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2019-10-21  Gergö Barany  
		Frederik Harwath  

	 gcc/
	 * omp-low.c (struct omp_context): New fields
	 local_reduction_clauses, outer_reduction_clauses.
	 (new_omp_context): Initialize these.
	 (scan_sharing_clauses): Record reduction clauses on OpenACC
	 constructs.
	 (scan_omp_for): Check reduction clauses for incorrect nesting.
	 gcc/testsuite/
	 * c-c++-common/goacc/nested-reductions-fail.c: New test.
	 * c-c++-common/goacc/nested-reductions.c: New test.
	 * c-c++-common/goacc/reduction-6.c: Adjust.
	 libgomp/
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
	 Add missing reduction clauses.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
	 Likewise.
	 * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
	 Likewise.
---
 gcc/omp-low.c | 107 +++-
 .../goacc/nested-reductions-fail.c| 492 ++
 .../c-c++-common/goacc/nested-reductions.c| 420 +++
 .../c-c++-common/goacc/reduction-6.c  |  11 -
 .../par-loop-comb-reduction-1.c   |   2 +-
 .../par-loop-comb-reduction-2.c   |   2 +-
 .../par-loop-comb-reduction-3.c   |   2 +-
 .../par-loop-comb-reduction-4.c   |   2 +-
 8 files changed, 1022 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 279b6ef893a..a2212274685 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -127,6 +127,12 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
+  /* A tree_list of the reduction clauses in this context.  */
+  tree local_reduction_clauses;
+
+  /* A tree_list of the reduction clauses in outer contexts.  */
+  tree outer_reduction_clauses;
+
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -902,6 +908,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb = outer_ctx->cb;
   ctx->cb.block = NULL;
   ctx->depth = outer_ctx->depth + 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = ctx->outer_reduction_clauses;
 }
   else
 {
@@ -917,6 +925,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
   ctx->cb.adjust_array_error_bounds = true;
   ctx->cb.dont_remap_vla_if_no_change = true;
   ctx->depth = 1;
+  ctx->local_reduction_clauses = NULL;
+  ctx->outer_reduction_clauses = NULL;
 

testsuite: clarify scan-dump file globbing behavior

2020-05-15 Thread Frederik Harwath
Hi,

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned. This behavior is currently
undocumented. Furthermore, the current implementation of
"scan-dump" and related procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file at all results in an unresolved test.

This patch documents the globbing behavior. The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.
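
The intended "exactly one file" rule can be modeled in a few lines of
Python. This is only an analogue of the behavior described above, not the
Tcl code from scandump.exp; the function name "glob_one_file" and the dump
file names in the usage note are made up for the example.

```python
import glob


def glob_one_file(pattern):
    """Return the single path matching `pattern`, or "" otherwise."""
    matches = glob.glob(pattern)
    if len(matches) != 1:
        # Zero matches: no dump file was produced.
        # Several matches: the pattern is ambiguous.
        # In both cases the caller should treat the test as unresolved.
        return ""
    return matches[0]
```

With hypothetical dump files "t.c.123t.mypass1" and "t.c.124t.mypass2" in
one directory, the pattern "t.c.*.mypass1" yields the first file, whereas
"t.c.*.mypass[1-3]" matches both files and therefore yields the empty
string.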

The procedures in scandump.exp all perform the file name expansion in
essentially the same way and I have extracted this into a new
procedure. But there is one very minor exception:

> @@ -67,10 +95,10 @@ proc scan-dump { args } {
>  set dumpbase [dump-base $src [lindex $args 3]]
> -set output_file "[glob -nocomplain $dumpbase.[lindex $args 2]]"
> +
> +set pattern "$dumpbase.[lindex $args 2]"
> +set output_file "[glob-dump-file $testcase $pattern]"
>  if { $output_file == "" } {
> - verbose -log "$testcase: dump file does not exist"
> - verbose -log "dump file: $dumpbase.$suf"

"scan-dump" is the only procedure that prints the "dump file: ..." line.
Should this be kept or is it ok to remove this as I have done in the
patch? $dumpbase.$suf does not emit the correct file name anyway
(a random example from my testing: "dump file: stdatomic-init.c.dce*")
and the name of the files can be inferred from the test name easily.

I have tested the changes by running "make check" (with a
--enable-languages=C only build, but this covers lots of uses
of the affected test procedures) and observed no regressions.

Ok to commit this to master?

Best regards,
Frederik

From 6912e03d51d360dbbcf7eb1dc8d77d08c2a6e54c Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 15 May 2020 10:35:48 +0200
Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned.  This behavior is currently
undocumented.  Furthermore, the current implementation of
"scan-dump" and similar procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file results in an unresolved test.

This commit documents the globbing behavior.  The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.

gcc/ChangeLog:

2020-05-15  Frederik Harwath  

	* doc/sourcebuild.texi: Describe globbing of the
	dump file scanning commands "suffix" argument.

gcc/testsuite/ChangeLog:

2020-05-15  Frederik Harwath  

	* lib/scandump.exp (glob-dump-file): New proc.
	(scan-dump): Use glob-dump-file for file name expansion.
	(scan-dump-times): Likewise.
	(scan-dump-dem): Likewise.
	(scan-dump-dem-not): Likewise.
---
 gcc/doc/sourcebuild.texi   |  4 ++-
 gcc/testsuite/lib/scandump.exp | 54 +++---
 2 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 240d6e4b08e..b6c5a21cb71 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2888,7 +2888,9 @@ stands for zero or more unmatched lines; the whitespace after
 
 These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree},
 @code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa}, and
-@code{wpa-ipa}.
+@code{wpa-ipa}.  The @var{suffix} argument which describes the dump file
+to be scanned may contain a glob pattern that must expand to exactly one
+file name.
 
 @table @code
 @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail @var{selector} @}]
diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index d6ba350acc8..f3a991b590a 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -39,6 +39,34 @@ proc dump-base { args } {
 return $dumpbase
 }
 
+# Expand dump file name pattern to exactly one file.
+# Return a single dump file name or an empty string
+# if the pattern matches no file or more than one file.
+#
+# Argument 0 is the testcase name
+# Argument 1 is the dump file glob pattern
+proc glob-dump-file { args } {
+
+set pattern [lindex $args 1]
+set dump_file "[glob -nocomplain $pattern]"
+set num_files [llength $dump_file]
+
+if { $num_files != 1 } {
+	set testcase

Re: testsuite: clarify scan-dump file globbing behavior

2020-05-19 Thread Frederik Harwath
Hi Thomas,

Thomas Schwinge  writes:

> I can't formally approve testsuite patches, but did a review anyway:

Thanks for the review!

> On 2020-05-15T12:31:54+0200, Frederik Harwath  
> wrote:

>> The dump
>> scanning procedures are changed to make the test unresolved
>> if globbing matches more than one file.
>
> (The code changes look good, but I have not tested that specific aspect.)

We do not have automated tests for the testsuite commands :-), but I
have of course tested this manually.

> As I said, not an approval, and minor comments (see below), but still:
>
> Reviewed-by: Thomas Schwinge 
>
> Do we have to similarly also audit/alter other testsuite infrastructure
> files, anything that uses '[glob [...]]'?  (..., and then generalize
> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
> incrementally, as far as I'm concerned.

I also think it would make sense to adapt similar test commands as well.

> May also make this more useful/explicit:
>
> This is useful if, for example, if a pass has several static
> instances [correct terminology?], and depending on torture testing
> command-line flags, a different instance executes and produces a dump
> file, and so in the test case you can use a generic [put example
> here] to scan the varying dump files names.
>
> (Or similar.)

I have moved the explanation below the description of the individual
commands and added an example. See the attached revised patch.

Best regards,
Frederik

From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Fri, 15 May 2020 10:35:48 +0200
Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior

The test commands for scanning optimization dump files
perform globbing on the argument that specifies the suffix
of the dump files to be scanned.  This behavior is currently
undocumented.  Furthermore, the current implementation of
"scan-dump" and similar procedures yields an error whenever
the file name globbing matches more than one file (due to an
attempt to call "open" on multiple files) while a failure to
match any file results in an unresolved test.

This commit documents the globbing behavior.  The dump
scanning procedures are changed to make the test unresolved
if globbing matches more than one file.

gcc/ChangeLog:

2020-05-19  Frederik Harwath  

	* doc/sourcebuild.texi: Describe globbing of the
	dump file scanning commands "suffix" argument.

gcc/testsuite/ChangeLog:

2020-05-19  Frederik Harwath  

	* lib/scandump.exp (glob-dump-file): New proc.
	(scan-dump): Use glob-dump-file for file name expansion.
	(scan-dump-times): Likewise.
	(scan-dump-dem): Likewise.
	(scan-dump-dem-not): Likewise.

Reviewed-by: Thomas Schwinge 
---
 gcc/doc/sourcebuild.texi   | 13 
 gcc/testsuite/lib/scandump.exp | 54 +++---
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 240d6e4b08e..9df4b06d460 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in the dump file with
 suffix @var{suffix}.
 @end table
 
+The @var{suffix} argument which describes the dump file to be scanned
+may contain a glob pattern that must expand to exactly one file
+name. This is useful if, e.g., different pass instances are executed
+depending on torture testing command-line flags, producing dump files
+whose names differ only in their pass instance number suffix.  For
+example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
+occurrences of the string ``code has been optimized'', use:
+@smallexample
+/* @{ dg-options "-fdump-tree-mypass" @} */
+/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" @} @} */
+@end smallexample
+
+
 @subsubsection Check for output files
 
 @table @code
diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index d6ba350acc8..f3a991b590a 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -39,6 +39,34 @@ proc dump-base { args } {
 return $dumpbase
 }
 
+# Expand dump file name pattern to exactly one file.
+# Return a single dump file name or an empty string
+# if the pattern matches no file or more than one file.
+#
+# Argument 0 is the testcase name
+# Argument 1 is the dump file glob pattern
+proc glob-dump-file { args } {
+
+set pattern [lindex $args 1]
+set dump_file "[glob -nocomplain $pattern]"
+set num_files [llength $dump_file]
+
+

[PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

2020-05-19 Thread Frederik Harwath
Hi,
the new contrib/gcc-changelog/git_check_commit.py script
(which, by the way, is very useful!) does not handle "Reviewed-by" and
"Reviewed-on" lines yet and hence it expects those lines to be indented
by a tab although those lines are usually not indented. The script
already knows about "Co-Authored-By" lines and I have extended it to
handle the "Reviewed-{by,on}" lines in a similar way. The information
from those lines is not processed further since the review information
apparently does not get included in the ChangeLogs.
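
The idea of the change is roughly the following. This is a stand-alone
sketch, not the code of git_commit.py: the tuple name "SKIPPED_PREFIXES"
and the function "strip_review_lines" are assumptions made for this
example.

```python
# Lower-case prefixes of lines that carry review information only.
SKIPPED_PREFIXES = ("reviewed-by: ", "reviewed-on: ")


def strip_review_lines(lines):
    """Drop review lines so that later tab checks never see them."""
    kept = []
    for line in lines:
        # str.startswith accepts a tuple of alternatives.
        if line.lower().startswith(SKIPPED_PREFIXES):
            continue
        kept.append(line)
    return kept


body = [
    "\t* lib/scandump.exp (glob-dump-file): New proc.",
    "Reviewed-by: A Reviewer",
    "Reviewed-on: https://example.org/review/1",
]
remaining = strip_review_lines(body)
```

Here "remaining" contains only the tab-indented ChangeLog line; the
reviewer name and the URL are placeholders.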

Ok to commit the patch?

Best regards,
Frederik

From 0dc9b201bc1607de36cb9b3604a87cc3646292e3 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 19 May 2020 11:15:28 +0200
Subject: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

git_check_commit.py does not know about "Reviewed-by" and
"Reviewed-on" lines and hence it expects those lines which
follow the ChangeLog entries to be indented by a tab.

This commit makes the script skip those lines.  No further
processing is attempted because the review information
is not part of the ChangeLogs.

contrib/

2020-05-19  Frederik Harwath  

	* gcc-changelog/git_commit.py: Skip over lines starting
	with "Reviewed-by: " or "Reviewed-on: ".
---
 contrib/gcc-changelog/git_commit.py | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index 5214cc36538..ebcf853f02f 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -150,6 +150,8 @@ star_prefix_regex = re.compile(r'\t\*(?P\ *)(?P.*)')
 LINE_LIMIT = 100
 TAB_WIDTH = 8
 CO_AUTHORED_BY_PREFIX = 'co-authored-by: '
+REVIEWED_BY_PREFIX = 'reviewed-by: '
+REVIEWED_ON_PREFIX = 'reviewed-on: '
 
 
 class Error:
@@ -344,12 +346,19 @@ class GitCommit:
 else:
 pr_line = line.lstrip()
 
-if line.lower().startswith(CO_AUTHORED_BY_PREFIX):
+lowered_line = line.lower()
+if lowered_line.startswith(CO_AUTHORED_BY_PREFIX):
 name = line[len(CO_AUTHORED_BY_PREFIX):]
 author = self.format_git_author(name)
 self.co_authors.append(author)
 continue
 
+# Skip over review information for now.
+# This avoids errors due to missing tabs on these lines below.
+if lowered_line.startswith((REVIEWED_BY_PREFIX,\
+REVIEWED_ON_PREFIX)):
+continue
+
 # ChangeLog name will be deduced later
 if not last_entry:
 if author_tuple:
-- 
2.17.1



Re: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}

2020-05-19 Thread Frederik Harwath
Martin Liška  writes:

Hi Martin,

> On 5/19/20 11:45 AM, Frederik Harwath wrote:
> Thank you Frederick for the patch.
>
> Looking at what I grepped:
> https://github.com/marxin/gcc-changelog/issues/1#issuecomment-621910248

I get a 404 error when I try to access this URL. The repository also
does not seem to be in your list of public repositories.


> Can you also add 'Signed-off-by'? And please create a list with these
> exceptions at the beginning of the script.

Yes, I will add it.

> Fine with that.

Best regards,
Frederik


Re: testsuite: clarify scan-dump file globbing behavior

2020-05-25 Thread Frederik Harwath
Frederik Harwath  writes:

Hi Rainer, hi Mike,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html

Best regards,
Frederik

> Hi Thomas,
>
> Thomas Schwinge  writes:
>
>> I can't formally approve testsuite patches, but did a review anyway:
>
> Thanks for the review!
>
>> On 2020-05-15T12:31:54+0200, Frederik Harwath  
>> wrote:
>
>>> The dump
>>> scanning procedures are changed to make the test unresolved
>>> if globbing matches more than one file.
>>
>> (The code changes look good, but I have not tested that specific aspect.)
>
> We do not have automated tests for the testsuite commands :-), but I
> have of course tested this manually.
>
>> As I said, not an approval, and minor comments (see below), but still:
>>
>> Reviewed-by: Thomas Schwinge 
>>
>> Do we have to similarly also audit/alter other testsuite infrastructure
>> files, anything that uses '[glob [...]]'?  (..., and then generalize
>> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
>> incrementally, as far as I'm concerned.
>
> I also think it would make sense to adapt similar test commands as well.
>
>> May also make this more useful/explicit:
>>
>> This is useful if, for example, if a pass has several static
>> instances [correct terminology?], and depending on torture testing
>> command-line flags, a different instance executes and produces a dump
>> file, and so in the test case you can use a generic [put example
>>     here] to scan the varying dump files names.
>>
>> (Or similar.)
>
> I have moved the explanation below the description of the individual
> commands and added an example. See the attached revised patch.
>
> Best regards,
> Frederik
>
> From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
> From: Frederik Harwath 
> Date: Fri, 15 May 2020 10:35:48 +0200
> Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior
>
> The test commands for scanning optimization dump files
> perform globbing on the argument that specifies the suffix
> of the dump files to be scanned.  This behavior is currently
> undocumented.  Furthermore, the current implementation of
> "scan-dump" and similar procedures yields an error whenever
> the file name globbing matches more than one file (due to an
> attempt to call "open" on multiple files) while a failure to
> match any file results in an unresolved test.
>
> This commit documents the globbing behavior.  The dump
> scanning procedures are changed to make the test unresolved
> if globbing matches more than one file.
>
> gcc/ChangeLog:
>
> 2020-05-19  Frederik Harwath  
>
>   * doc/sourcebuild.texi: Describe globbing of the
>   dump file scanning commands "suffix" argument.
>
> gcc/testsuite/ChangeLog:
>
> 2020-05-19  Frederik Harwath  
>
>   * lib/scandump.exp (glob-dump-file): New proc.
>   (scan-dump): Use glob-dump-file for file name expansion.
>   (scan-dump-times): Likewise.
>   (scan-dump-dem): Likewise.
>   (scan-dump-dem-not): Likewise.
>
> Reviewed-by: Thomas Schwinge 
> ---
>  gcc/doc/sourcebuild.texi   | 13 
>  gcc/testsuite/lib/scandump.exp | 54 +++---
>  2 files changed, 56 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 240d6e4b08e..9df4b06d460 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in the dump file with
>  suffix @var{suffix}.
>  @end table
>
> +The @var{suffix} argument which describes the dump file to be scanned
> +may contain a glob pattern that must expand to exactly one file
> +name. This is useful if, e.g., different pass instances are executed
> +depending on torture testing command-line flags, producing dump files
> +whose names differ only in their pass instance number suffix.  For
> +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
> +occurrences of the string ``code has been optimized'', use:
> +@smallexample
> +/* @{ dg-options "-fdump-tree-mypass" @} */
> +/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" @} @} */
> +@end smallexample
> +
> +
>  @subsubsection Check for output files
>
>  @table @code
> diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
> index d6ba350acc8..f3a991b590a 100644
> --- a/gcc/testsuite/lib/scandump.exp
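
For illustration only, the "exactly one match" rule from the commit message quoted above can be sketched in C with POSIX glob(3). This is not the Tcl code from scandump.exp; the function glob_one_file and its result codes are made-up names for this sketch, and mapping the "none" and "many" cases to an unresolved test is left to the caller.

```c
#include <assert.h>
#include <glob.h>
#include <string.h>

/* Hypothetical result codes for this sketch.  */
enum glob_one_result
{
  GLOB_ONE_OK,    /* exactly one match, copied to the buffer */
  GLOB_ONE_NONE,  /* the pattern matched no file */
  GLOB_ONE_MANY,  /* the pattern matched more than one file */
  GLOB_ONE_ERROR  /* glob failure or buffer too small */
};

/* Expand PATTERN and copy the file name to BUF only if the match is
   unique.  A caller would treat NONE and MANY as "unresolved".  */
enum glob_one_result
glob_one_file (const char *pattern, char *buf, size_t size)
{
  glob_t g;
  enum glob_one_result res;

  assert (pattern != NULL && buf != NULL && size > 0);
  memset (&g, 0, sizeof g);

  int rc = glob (pattern, 0, NULL, &g);
  if (rc == GLOB_NOMATCH)
    res = GLOB_ONE_NONE;
  else if (rc != 0)
    res = GLOB_ONE_ERROR;
  else if (g.gl_pathc > 1)
    res = GLOB_ONE_MANY;
  else if (strlen (g.gl_pathv[0]) >= size)
    res = GLOB_ONE_ERROR;
  else
    {
      strcpy (buf, g.gl_pathv[0]);
      res = GLOB_ONE_OK;
    }

  globfree (&g);
  return res;
}
```

Reporting the "many" case as its own result, rather than trying to open several files at once, is the point of the change described above.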

PING Re: testsuite: clarify scan-dump file globbing behavior

2020-06-02 Thread Frederik Harwath
Frederik Harwath  writes:

ping :-)

> Frederik Harwath  writes:
>
> Hi Rainer, hi Mike,
> ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html
>
> Best regards,
> Frederik
>
>> Hi Thomas,
>>
>> Thomas Schwinge  writes:
>>
>>> I can't formally approve testsuite patches, but did a review anyway:
>>
>> Thanks for the review!
>>
>>> On 2020-05-15T12:31:54+0200, Frederik Harwath wrote:
>>
>>>> The dump
>>>> scanning procedures are changed to make the test unresolved
>>>> if globbing matches more than one file.
>>>
>>> (The code changes look good, but I have not tested that specific aspect.)
>>
>> We do not have automated tests for the testsuite commands :-), but I
>> have of course tested this manually.
>>
>>> As I said, not an approval, and minor comments (see below), but still:
>>>
>>> Reviewed-by: Thomas Schwinge 
>>>
>>> Do we have to similarly also audit/alter other testsuite infrastructure
>>> files, anything that uses '[glob [...]]'?  (..., and then generalize
>>> 'glob-dump-file' into 'glob-one-file', or similar.)  That can be done
>>> incrementally, as far as I'm concerned.
>>
>> I also think it would make sense to adapt similar test commands as well.
>>
>>> May also make this more useful/explicit:
>>>
>>> This is useful if, for example, a pass has several static
>>> instances [correct terminology?], and depending on torture testing
>>> command-line flags, a different instance executes and produces a dump
>>> file, and so in the test case you can use a generic [put example
>>> here] to scan the varying dump file names.
>>>
>>> (Or similar.)
>>
>> I have moved the explanation below the description of the individual
>> commands and added an example. See the attached revised patch.
>>
>> Best regards,
>> Frederik
>>
>> From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001
>> From: Frederik Harwath 
>> Date: Fri, 15 May 2020 10:35:48 +0200
>> Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior
>>
>> The test commands for scanning optimization dump files
>> perform globbing on the argument that specifies the suffix
>> of the dump files to be scanned.  This behavior is currently
>> undocumented.  Furthermore, the current implementation of
>> "scan-dump" and similar procedures yields an error whenever
>> the file name globbing matches more than one file (due to an
>> attempt to call "open" on multiple files) while a failure to
>> match any file results in an unresolved test.
>>
>> This commit documents the globbing behavior.  The dump
>> scanning procedures are changed to make the test unresolved
>> if globbing matches more than one file.
>>
>> gcc/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * doc/sourcebuild.texi: Describe globbing of the
>>  dump file scanning commands "suffix" argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-05-19  Frederik Harwath  
>>
>>  * lib/scandump.exp (glob-dump-file): New proc.
>>  (scan-dump): Use glob-dump-file for file name expansion.
>>  (scan-dump-times): Likewise.
>>  (scan-dump-dem): Likewise.
>>  (scan-dump-dem-not): Likewise.
>>
>> Reviewed-by: Thomas Schwinge 
>> ---
>>  gcc/doc/sourcebuild.texi   | 13 
>>  gcc/testsuite/lib/scandump.exp | 54 +++---
>>  2 files changed, 56 insertions(+), 11 deletions(-)
>>
>> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
>> index 240d6e4b08e..9df4b06d460 100644
>> --- a/gcc/doc/sourcebuild.texi
>> +++ b/gcc/doc/sourcebuild.texi
>> @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in the dump file with
>>  suffix @var{suffix}.
>>  @end table
>>
>> +The @var{suffix} argument which describes the dump file to be scanned
>> +may contain a glob pattern that must expand to exactly one file
>> +name. This is useful if, e.g., different pass instances are executed
>> +depending on torture testing command-line flags, producing dump files
>> +whose names differ only in their pass instance number suffix.  For
>> +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for
>> +occurrences of the string ``code has been optimized''

[PATCH] Add OpenACC 2.6 `acc_get_property' support

2019-11-14 Thread Frederik Harwath
Hi,
this patch implements OpenACC 2.6 "acc_get_property" and related functions.
I have tested the patch on x86_64-linux-gnu with nvptx-none offloading.
There is no AMD GCN support yet. This will be added later on.

Can this be committed to trunk?

Best regards,
Frederik

--- 8< ---

Add generic support for the OpenACC 2.6 `acc_get_property' and
`acc_get_property_string' routines, as well as full handlers for the
host and the NVPTX offload targets and minimal handlers for the HSA
and Intel MIC offload targets.

Included are C/C++ and Fortran tests that, in particular, print
the property values for acc_property_vendor, acc_property_memory,
acc_property_free_memory, acc_property_name, and acc_property_driver.
The output looks as follows:

Vendor: GNU
Name: GOMP
Total memory: 0
Free memory: 0
Driver: 1.0

with the host driver (where the memory related properties are not
supported for the host device and yield 0, conforming to the standard)
and output like:

OpenACC vendor: Nvidia
OpenACC total memory: 12651462656
OpenACC free memory: 12202737664
OpenACC name: TITAN V
OpenACC driver: CUDA Driver 9.1

with the NVPTX driver.
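
As a rough illustration of the design (not the libgomp code), the sketch below models how one handler can serve both numeric and string properties through a union, with a mask bit telling the two kinds apart. All identifiers and numeric values are hypothetical; only the host values ("GNU", "GOMP", 0, 0, "1.0") are taken from the output shown above. Returning 0 or NULL when the wrong accessor is used is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mask bit that marks string-valued properties.  */
enum { PROP_STRING_MASK = 0x10000 };

/* Hypothetical property identifiers; the values are placeholders.  */
enum device_property
{
  PROP_MEMORY = 1,
  PROP_FREE_MEMORY = 2,
  PROP_NAME = PROP_STRING_MASK | 1,
  PROP_VENDOR = PROP_STRING_MASK | 2,
  PROP_DRIVER = PROP_STRING_MASK | 3
};

/* One return type for both kinds of properties.  */
union property_value
{
  size_t val;
  const char *ptr;
};

/* Host handler: memory properties are unsupported and yield 0.  */
union property_value
host_get_property (enum device_property prop)
{
  union property_value v = { .val = 0 };
  switch (prop)
    {
    case PROP_MEMORY:
    case PROP_FREE_MEMORY:
      v.val = 0;
      break;
    case PROP_NAME:
      v.ptr = "GOMP";
      break;
    case PROP_VENDOR:
      v.ptr = "GNU";
      break;
    case PROP_DRIVER:
      v.ptr = "1.0";
      break;
    }
  return v;
}

/* Numeric accessor: refuses string properties (assumed behavior).  */
size_t
get_property (enum device_property prop)
{
  if (prop & PROP_STRING_MASK)
    return 0;
  return host_get_property (prop).val;
}

/* String accessor: refuses numeric properties (assumed behavior).  */
const char *
get_property_string (enum device_property prop)
{
  if (!(prop & PROP_STRING_MASK))
    return NULL;
  return host_get_property (prop).ptr;
}
```

With these definitions, get_property_string (PROP_VENDOR) yields "GNU" and get_property (PROP_MEMORY) yields 0, matching the host output listed above.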

2019-11-14  Maciej W. Rozycki  
Frederik Harwath  
Thomas Schwinge  

include/
* gomp-constants.h (GOMP_DEVICE_CURRENT,
GOMP_DEVICE_PROPERTY_MEMORY, GOMP_DEVICE_PROPERTY_FREE_MEMORY,
GOMP_DEVICE_PROPERTY_NAME, GOMP_DEVICE_PROPERTY_VENDOR,
GOMP_DEVICE_PROPERTY_DRIVER, GOMP_DEVICE_PROPERTY_STRING_MASK):
New Macros.

libgomp/
* libgomp.h (gomp_device_descr): Add `get_property_func' member.
* libgomp-plugin.h (gomp_device_property_value): New union.
(gomp_device_property_value): New prototype.
* openacc.h (acc_device_t): Add `acc_device_current' enumeration
constant.
(acc_device_property_t): New enum.
(acc_get_property, acc_get_property_string): New prototypes.
* oacc-init.c (acc_get_device_type): Also assert on
`!acc_device_current' result.
(get_property_any, acc_get_property, acc_get_property_string):
New functions.
* openacc.f90 (openacc_kinds): From `iso_fortran_env' also
import `int64'.  Add `acc_device_current' and
`acc_property_memory', `acc_property_free_memory',
`acc_property_name', `acc_property_vendor' and
`acc_property_driver' constants.  Add `acc_device_property' data
type.
(openacc_internal): Add `acc_get_property' and
`acc_get_property_string' interfaces.  Add `acc_get_property_h',
`acc_get_property_string_h', `acc_get_property_l' and
`acc_get_property_string_l'.
(openacc_c_string): New module.
* oacc-host.c (host_get_property): New function.
(host_dispatch): Wire it.
* target.c (gomp_load_plugin_for_device): Handle `get_property'.
* libgomp.map (OACC_2.6): Add `acc_get_property',
`acc_get_property_h_', `acc_get_property_string' and
`acc_get_property_string_h_' symbols.
* oacc-init.c (acc_known_device_type): Add function.
(unknown_device_type_error): Add function.
(name_of_acc_device_t): Change to call unknown_device_type_error
on unknown type.
(resolve_device): Use acc_known_device_type.
(acc_init): Fail if acc_device_t argument is not valid.
(acc_shutdown): Likewise.
(acc_get_num_devices): Likewise.
(acc_set_device_type): Likewise.
(acc_get_device_num): Likewise.
(acc_set_device_num): Likewise.
(get_property_any): Likewise.
(acc_get_property): Likewise.
(acc_get_property_string): Likewise.
(acc_on_device): Likewise.
(goacc_save_and_set_bind): Likewise.

* libgomp.texi (OpenACC Runtime Library Routines): Add
`acc_get_property'.
(acc_get_property): New node.

* plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function.
* plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName',
`cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo'
calls.
(GOMP_OFFLOAD_get_property): New function.
(struct ptx_device): Add new field "name" ...
(nvptx_open_device): ... and alloc and init from here.
(nvptx_close_device): ... and free from here.
(cuda_driver_version): Add new static variable ...
(nvptx_init): ... and init from here.

* testsuite/libgomp.oacc-c-c++-common/acc-get-property.c: New
test.
* testsuite/libgomp.oacc-c-c++-common/acc-get-property-2.c: New
test.
* testsuite/libgomp.oacc-c-c++-common/acc-get-property-3.c: New
test.
* testsuite/libgomp.oacc-c-c++

[PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi,
currently, on trunk, the tests gcc.dg/vect/vect-cond-reduc-1.c and
gcc.dg/pr68286.c fail when compiling for amdgcn-unknown-amdhsa.
The reason seems to lie in the interaction of the changes that have been
introduced by revision r276659 ("Allow COND_EXPR and VEC_COND_EXPR condtions
to trap" by Ilya Leoshkevich) of trunk and the vectorized code that is
generated for amdgcn.

If the function maybe_resimplify_conditional_op from gimple-match-head.c gets
called on a conditional operation without an "else" part, it makes the
operation unconditional, but only if the operation cannot trap. To check
this, it uses operation_could_trap_p. This ends up in a violated assertion
in the latter function if maybe_resimplify_conditional_op is called on a
COND_EXPR or VEC_COND_EXPR:

 /* This function cannot tell whether or not COND_EXPR and VEC_COND_EXPR could
 trap, because that depends on the respective condition op.  */
  gcc_assert (op != COND_EXPR && op != VEC_COND_EXPR);

A related issue has been resolved by the patch that was committed as r276915 
("PR middle-end/92063" by Jakub Jelinek).

In our case, the error is triggered by the simplification rule at line 3450 of 
gcc/match.pd:

/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons
   return all -1 or all 0 results.  */
/* ??? We could instead convert all instances of the vec_cond to negate,
   but that isn't necessarily a win on its own.  */
(simplify
 (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
 (if (VECTOR_TYPE_P (type)
  && known_eq (TYPE_VECTOR_SUBPARTS (type),
   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
  && (TYPE_MODE (TREE_TYPE (type))
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
  (minus @3 (view_convert (vec_cond @0:0 (negate @1) @2)))))

It seems that this rule is not invoked when compiling for x86_64 where the
generated code for vect-cond-reduc-1.c does not contain anything that would
match this rule. Could it be that there is no test covering this rule for
commonly tested architectures?

I have changed maybe_resimplify_conditional_op to check if a COND_EXPR or
VEC_COND_EXPR could trap by checking whether the condition can trap using
generic_expr_could_trap_p. Judging from the comment above the assertion and
the code changes of r276659, it seems that this is both necessary and
sufficient to verify if those expressions can trap.
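
To make the dispatch concrete, here is a small self-contained model in C. It is only a toy: the expression type, the operation codes and the function names are invented for this sketch and merely mimic the roles of operation_could_trap_p and generic_expr_could_trap_p; in particular, "division may trap" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy operation codes (hypothetical).  */
enum op_code { OP_CONST, OP_PLUS, OP_DIV, OP_COND };

/* Toy expression node; unused operand slots are NULL.  For OP_COND,
   ops[0] is the condition.  */
struct expr
{
  enum op_code code;
  const struct expr *ops[3];
};

/* Stand-in for operation_could_trap_p: decides from the operation code
   alone and, like the GCC function, refuses conditional expressions.  */
bool
operation_could_trap (enum op_code code)
{
  assert (code != OP_COND);
  return code == OP_DIV;
}

/* Stand-in for generic_expr_could_trap_p: walks the whole expression.  */
bool
expr_could_trap (const struct expr *e)
{
  if (e == NULL)
    return false;
  if (e->code != OP_COND && operation_could_trap (e->code))
    return true;
  for (size_t i = 0; i < 3; i++)
    if (expr_could_trap (e->ops[i]))
      return true;
  return false;
}

/* The dispatch described above: for a conditional expression, inspect
   the condition operand; otherwise the operation code suffices.  */
bool
resimplified_op_could_trap (const struct expr *e)
{
  if (e->code == OP_COND)
    return expr_could_trap (e->ops[0]);
  return operation_could_trap (e->code);
}
```

Without the special case, a conditional expression would reach the assertion in operation_could_trap, which corresponds to the failure mode described above.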

Does that sound reasonable and can the patch be included in trunk?

The patch fixes the failing tests for me and does not cause any visible
regressions in the results of "make check" which I have executed for targets
amdgcn-unknown-amdhsa and x86_64-pc-linux-gnu.

Best regards,
Frederik



2019-11-28  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..4da6c4d7458 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,17 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+traps and hence we have to check this. For all other operations, we
+don't need to consider the operands. */
+  bool op_could_trap = op_code == COND_EXPR || op_code == VEC_COND_EXPR ?
+   generic_expr_could_trap_p (res_op->ops[0]) :
+   operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv, res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1



Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi Richard,

On 29.11.19 13:37, Richard Biener wrote:
> On Fri, Nov 29, 2019 at 1:24 PM Harwath, Frederik
>  wrote:
> [...]
>> It seems that this rule is not invoked when compiling for x86_64 where the 
>> generated code for vect-cond-reduc-1.c does not contain anything that would
>> match this rule. Could it be that there is no test covering this rule for 
>> commonly tested architectures?
> 
> This was all added for aarch64 SVE.  So it looks like the outer plus
> was conditional and we end up inheriting the

I should have mentioned this: it was indeed a COND_ADD.

> condition for the inner vec_cond.  Your fix looks reasonable but is
> very badly formatted.  Can you instead do
> 
>  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
>    op_could_trap = generic_expr_could_trap_p (...);
>  else
>    op_could_trap = operation_could_trap_p (...);
> 

Sorry, sure!

Thanks,
Frederik



[PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi,

On 29.11.19 13:51, Harwath, Frederik wrote:

>> condition for the inner vec_cond.  Your fix looks reasonable but is
>> very badly formatted.  Can you instead do

I hope the formatting looks better now. I have also removed the [amdgcn] from
the subject line since the fact that this has been discovered in the context
of amdgcn is not really essential.

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..c763a80a6d1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this. For all other operations, we
+ don't need to consider the operands. */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1




Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi Jakub,

On 29.11.19 14:41, Jakub Jelinek wrote:

> s/use/Use/
>
> [...]
>
> s/. /.  /

Right, thanks. Does that look ok for inclusion in trunk now?

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): Use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..9010f11621e 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this.  For all other operations, we
+ don't need to consider the operands.  */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1



Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
On 29.11.19 15:46, Richard Sandiford wrote:

> Thanks for doing this, looks good to me FWIW.  I was seeing the same
> failure for SVE but hadn't found time to look at it.

Thank you all for the review. Committed as r278853.

Frederik



[Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)

2019-12-03 Thread Harwath, Frederik
Hi Jakub,

On 08.11.19 07:41, Harwath, Frederik wrote:
> On 06.11.19 14:00, Jakub Jelinek wrote:
> [...]
>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>> more natural, wouldn't it.
> 
> Yes.
> 
> [...]
>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>> splay tree for the outer ones, vector for the local ones, and put into both
>> the clauses, so you can compare reduction code etc.
> 
> Sounds like a good idea. I am going to try that.

Below you can find a patch that reimplements the nested reductions check using
more appropriate data structures. As an additional benefit, the quality of the
warnings has also improved (see description below). I have checked the patch by
running the testsuite on x86_64-pc-linux-gnu.

Best regards,
Frederik

From 94ca786172afa7dab7630d75965bf6d6f0dd24e1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 3 Dec 2019 10:38:01 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track of
the information about the reduction clauses.
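
The difference between the two warning schemes in point 2) can be shown with a deliberately simplified C sketch. It looks at a single variable and represents the reduction operators on the nested loops, outermost first, as a string; the function names are invented and this is not the omp-low.c implementation, which has to track many variables and clause locations.

```c
#include <assert.h>
#include <stddef.h>

/* Old behavior (simplified): compare each loop's operator with the
   operator of the directly enclosing loop.  */
size_t
count_warnings_vs_parent (const char *ops, size_t n)
{
  size_t warnings = 0;
  for (size_t i = 1; i < n; i++)
    if (ops[i] != ops[i - 1])
      warnings++;
  return warnings;
}

/* Desired behavior (simplified): compare each loop's operator with the
   first operator used for the variable.  */
size_t
count_warnings_vs_first (const char *ops, size_t n)
{
  size_t warnings = 0;
  for (size_t i = 1; i < n; i++)
    if (ops[i] != ops[0])
      warnings++;
  return warnings;
}
```

For the operator sequence "+-+", the first function reports two mismatches and the second reports one, which is the situation described in the commit message.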

2019-12-03  Frederik Harwath  

	gcc/
	* omp-low.c (omp_context): Removed fields local_reduction_clauses,
	outer_reduction_clauses; added fields oacc_reduction_clauses,
	oacc_reductions_stack.
	(oacc_reduction_clause_location): New struct.
	(oacc_reduction_var_occ): New struct.
	(new_omp_context): Adjust omp_context initialization to new fields.
	(delete_omp_context): Adjust omp_context deletion to new fields.
	(rewind_oacc_reductions_stack): New function.
	(check_oacc_reduction_clause): New function.
	(check_oacc_reduction_clauses): New function.
	(scan_sharing_clauses): Call check_oacc_reduction_clause for
	reduction clauses (this handles clauses on compute regions)
	if a new optional flag is enabled.
	(scan_omp_for): Remove old nested reduction check, call
	 check_oacc_reduction_clauses instead.
	(scan_omp_target): Adapt call to scan_sharing_clauses to enable the new
	flag.

   	gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to
	 ignore warnings that are not relevant to the test.
	(acc_parallel): Stop expecting pruned warnings, adjust expected
	warnings to changes in omp-low.c, add checks for info messages about the
	location of clauses.
	(acc_parallel_loop): Likewise.
	(acc_parallel_reduction): Likewise.
	(acc_parallel_loop_reduction): Likewise.
	(acc_routine): Likewise.
	(acc_kernels): Likewise.

	* gfortran.dg/goacc/nested-reductions-warn.f90: Likewise.
---
 gcc/omp-low.c | 305 --
 .../goacc/nested-reductions-warn.c|  81 ++---
 .../goacc/nested-reductions-warn.f90  |  83 ++---
 3 files changed, 271 insertions(+), 198 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 19132f76da2..ba04e7477dc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -73,6 +73,9 @@ along with GCC; see the file COPYING3.  If not see
scanned for regions which are then moved to a new
function, to be invoked by the thread library, or offloaded.  */
 
+
+struct oacc_reduction_var_occ;
+
 /* Context structure.  Used to store information about each parallel
directive in the code.  */
 
@@ -128,12 +131,6 @@ struct omp_context
  corresponding tracking loop iteration variables.  */
   hash_map *lastprivate_conditional_map;
 
-  /* A tree_list of the reduction clauses in this context.  */
-  tree local_reduction_clauses;
-
-  /* A tree_list of the reduction clauses in outer contexts.  */
-  tree outer_reduction_clauses;
-
   /* Nesting depth of this context.  Used to beautify error messages re
  invalid gotos.  The outermost ctx is depth 1, with depth 0 being
  reserved for the main body of the function.  */
@@ -163,8 +160,52 @@ struct omp_context
 
   /* True if there is bind clause on the construct (i.e. a loop construct).  */
   bool loop_p;
+
+  /* A mapping that maps a variable to information about the last OpenACC
+ reduction clause that used the variable above the current context.
+ This information is used for checking the nesting restrictions for
+ reduction clauses by the func

Re: [PATCH 2/4] Validate acc_device_t uses

2019-12-03 Thread Harwath, Frederik
Hi Thomas,

On 03.12.19 13:14, Thomas Schwinge wrote:
> You once had this patch separate, but then merged into the upstream
> submission of 'acc_get_property'; let's please keep this separate.
> 
> With changes as indicated below, please commit this to trunk [...]

Ok, I have committed the patch as revision 278937. You can find the committed
version in the attachment. Thank you for the review!

> Generally, does usage of these functions obsolete some existing usage of
> 'acc_dev_num_out_of_range'?  (OK to address later.)

I think it does. I am going to verify this.

>> @@ -168,7 +184,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>>break;
>>  
>>  default:
>> -  if (d > _ACC_device_hwm)
>> +  if (!acc_known_device_type (d))
>>  {
>>if (fail_is_error)
>>  goto unsupported_device;
> 
> Note that this had 'd > _ACC_device_hwm', not '>=' as it now does, that
> is, previously didn't reject 'd == _ACC_device_hwm' but now does -- but I
> suppose this was a (minor) bug that existed before, so OK to change as
> you did?

Right, I do not see any reason why it should accept _ACC_device_hwm, and
the change did not cause any regressions.
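
The off-by-one under discussion can be reproduced with a stand-alone C sketch. The enumeration below is hypothetical (its members and values are placeholders, with DEVICE_HWM playing the role of the sentinel one past the last valid type); only the two comparisons mirror the old and new checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device-type enumeration; DEVICE_HWM is a sentinel,
   not a valid device type.  */
enum device_type
{
  DEVICE_NONE = 0,
  DEVICE_DEFAULT,
  DEVICE_HOST,
  DEVICE_NOT_HOST,
  DEVICE_HWM
};

/* Old check: rejects only values above the sentinel, so the sentinel
   itself slips through.  */
bool
rejected_by_old_check (int d)
{
  return d > DEVICE_HWM;
}

/* New check: accepts exactly the values in [0, DEVICE_HWM).  */
bool
known_device_type (int d)
{
  return d >= 0 && d < DEVICE_HWM;
}
```

Here rejected_by_old_check (DEVICE_HWM) is false, whereas known_device_type (DEVICE_HWM) is also false, i.e. only the new check treats the sentinel as invalid.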

Best regards,
Frederik



r278937 | frederik | 2019-12-03 15:38:54 +0100 (Di, 03 Dez 2019) | 25 lines

Validate acc_device_t uses

Check that function arguments of type acc_device_t
are valid enumeration values in all publicly visible
functions from oacc-init.c.

2019-12-03  Frederik Harwath  

libgomp/
* oacc-init.c (acc_known_device_type): Add function.
(unknown_device_type_error): Add function.
(name_of_acc_device_t): Change to call unknown_device_type_error
on unknown type.
(resolve_device): Use acc_known_device_type.
(acc_init): Fail if acc_device_t argument is not valid.
(acc_shutdown): Likewise.
(acc_get_num_devices): Likewise.
(acc_set_device_type): Likewise.
(acc_get_device_num): Likewise.
(acc_set_device_num): Likewise.
(acc_on_device): Add comment that argument validity is not checked.

Reviewed-by: Thomas Schwinge 



Index: libgomp/oacc-init.c
===
--- libgomp/oacc-init.c	(revision 278936)
+++ libgomp/oacc-init.c	(working copy)
@@ -82,6 +82,18 @@
   gomp_mutex_unlock (&acc_device_lock);
 }
 
+static bool
+known_device_type_p (acc_device_t d)
+{
+  return d >= 0 && d < _ACC_device_hwm;
+}
+
+static void
+unknown_device_type_error (acc_device_t invalid_type)
+{
+  gomp_fatal ("unknown device type %u", invalid_type);
+}
+
 /* OpenACC names some things a little differently.  */
 
 static const char *
@@ -103,8 +115,9 @@
 case acc_device_host: return "host";
 case acc_device_not_host: return "not_host";
 case acc_device_nvidia: return "nvidia";
-default: gomp_fatal ("unknown device type %u", (unsigned) type);
+default: unknown_device_type_error (type);
 }
+  __builtin_unreachable ();
 }
 
 /* ACC_DEVICE_LOCK must be held before calling this function.  If FAIL_IS_ERROR
@@ -123,7 +136,7 @@
 	if (goacc_device_type)
 	  {
 	/* Lookup the named device.  */
-	while (++d != _ACC_device_hwm)
+	while (known_device_type_p (++d))
 	  if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
   get_openacc_name (dispatchers[d]->name))
@@ -147,7 +160,7 @@
 
 case acc_device_not_host:
   /* Find the first available device after acc_device_not_host.  */
-  while (++d != _ACC_device_hwm)
+  while (known_device_type_p (++d))
 	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
 	  goto found;
   if (d_arg == acc_device_default)
@@ -168,7 +181,7 @@
   break;
 
 default:
-  if (d > _ACC_device_hwm)
+  if (!known_device_type_p (d))
 	{
 	  if (fail_is_error)
 	goto unsupported_device;
@@ -505,6 +518,9 @@
 void
 acc_init (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
@@ -519,6 +535,9 @@
 void
 acc_shutdown (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   gomp_init_targets_once ();
 
   gomp_mutex_lock (&acc_device_lock);
@@ -533,6 +552,9 @@
 int
 acc_get_num_devices (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_type_error (d);
+
   int n = 0;
   struct gomp_device_descr *acc_dev;
 
@@ -564,6 +586,9 @@
 void
 acc_set_device_type (acc_device_t d)
 {
+  if (!known_device_type_p (d))
+unknown_device_ty

[PATCH 00/40] OpenACC "kernels" Improvements

2021-12-15 Thread Frederik Harwath
Hi,
this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  Versions
of the patches have also been committed to the devel/omp/gcc-11 branch
recently.

The patch series contains middle-end changes that modify the "kernels"
loop handling to use Graphite for dependence analysis of loops in
"kernels" regions, as well as new optimizations and adjustments to
existing optimizations to support this analysis. A central step is
contained in the commit titled "openacc: Use Graphite for dependence
analysis in \"kernels\" regions" whose commit message also contains
further explanations. There are also front end changes (cf. the
patches by Sandra Loosemore) that prepare the loops in "kernels"
regions for the middle-end processing and which lift various
restrictions on "kernels" regions.  I have included some dependences
(the patches by Julian Brown) from the devel/omp/gcc-11 branch which
will be re-submitted independently for review.
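
To give an idea of the kind of loops such a dependence analysis has to tell apart, here is a minimal C sketch with two "kernels" regions. The functions are made up for illustration and are not taken from the patch series or its testsuite. Without -fopenacc the pragmas are ignored and the code runs sequentially; what a given compiler version decides for each loop is not claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* Iterations are independent: each out[i] depends only on in[i], so a
   dependence analysis could be expected to allow parallel execution.  */
void
scale (float *restrict out, const float *restrict in, size_t n, float f)
{
  assert (out != NULL && in != NULL);
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
    for (size_t i = 0; i < n; i++)
      out[i] = f * in[i];
  }
}

/* Loop-carried dependence: a[i] needs the updated a[i - 1], so the
   iterations cannot simply be run in parallel.  */
void
prefix_sum (float *a, size_t n)
{
  assert (a != NULL);
#pragma acc kernels copy(a[0:n])
  {
    for (size_t i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

The second function is the sort of loop for which an "independent" clause would be wrong, which is what the warning patch listed below in the series is about.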

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target.  The patches
should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the
current development stage. I hope that we can discuss some of the
changes before they can be considered for inclusion in GCC during the
next stage 1.

Best regards,
Frederik


Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in   |   2 +
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 915 +++--
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/c/c-parser.c  |   3 +
 gcc/cfgloop.c |   1 +
 gcc/cfgloop.h |   6 +
 gcc/cfgloopmanip.c|   1 +
 gcc/common.opt|   9 +
 gcc/config/nvptx/nvptx.c  |   7 +
 gcc/cp/decl.c |  44 +
 gcc/cp/parser.c   |   3 +
 gcc/cp/semantics.c|   9 +
 gcc/doc/gimple.texi   |   2 +
 gcc/doc/invoke.texi   |  52 +-
 gcc/doc/passes.texi   

[PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Add
-fno-openacc-kernels-annotate-loops.
---
 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafae..0f2d2f0a757b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,5 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 
München; limited liability company (GmbH); Managing Directors: Thomas 
Heurung, Frank Thürauf; Registered office: München; Commercial register: 
München, HRB 106955


[PATCH 01/40] Kernels loops annotation: C and C++.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
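
As an illustration of the annotation described above, here is a minimal
sketch (not taken from the patch or its testsuite). The function names
`scale` and `scale_annotated` are hypothetical, and the assumption is that a
simple counted loop like this one qualifies as a candidate; the exact
criteria are defined by the patch itself.

```c
/* Hypothetical example: the first function is what a user might write,
   the second shows the hand-written form that the automatic annotation
   is meant to correspond to.  */

/* Scale each element of IN by FACTOR into OUT.  The loop inside the
   kernels region carries no explicit "loop" directive.  */
void
scale (const float *in, float *out, int n, float factor)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
    /* Assumption: with -fopenacc-kernels-annotate-loops in effect, the
       front end treats this loop as if "#pragma acc loop auto" had been
       written in front of it.  */
    for (int i = 0; i < n; i++)
      out[i] = in[i] * factor;
  }
}

/* The same computation with the annotation spelled out by hand.  */
void
scale_annotated (const float *in, float *out, int n, float factor)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      out[i] = in[i] * factor;
  }
}
```

Both functions compute the same result; when compiled without -fopenacc
the pragmas are ignored and the loops simply run sequentially.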

gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.c: Include tree-iterator.h
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): Split from c_finish_omp_for.  Use
annotation_error function.  Code refactoring to avoid destructive
changes that cannot be undone in case of error.
(is_local_var): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(check_and_annotate_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.

gcc/c/
* c-decl.c (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/cp/
* decl.c (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document -fno-openacc-kernels-annotate-loops.

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
---
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 799 --
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/cp/decl.c |  44 +
 gcc/doc/invoke.texi   |  32 +-
 .../goacc/classify-kernels-unparallelized.c   |   1 +
 .../c-c++-common/goacc/classify-kernels.c |   3 +-
 .../kernels-counter-var-redundant-load.c  |   1 +
 .../kernels-counter-vars-function-scope.c |   1 +
 .../goacc/kernels-double-reduction-n.c|   1 +
 .../goacc/kernels-doub

[PATCH 03/40] Kernels loops annotation: Fortran.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.

Co-Authored-By: Gergö Barany 

gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.c: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
---
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/lang.opt  |   8 +
 gcc/fortran/openmp.c  | 364 ++
 gcc/fortran/parse.c   |   9 +
 .../goacc/classify-kernels-unparallelized.f95 |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95|   1 +
 .../gfortran.dg/goacc/common-block-3.f90  |   1 +
 .../gfortran.dg/goacc/kernels-loop-2.f95  |   1 +
 .../goacc/kernels-loop-annotation-1.f95   |  33 ++
 .../goacc/kernels-loop-annotation-10.f95  |  32 ++
 .../goacc/kernels-loop-annotation-11.f95  |  34 ++
 .../goacc/kernels-loop-annotation-12.f95  |  39 ++
 .../goacc/kernels-loop-annotation-13.f95  |  38 ++
 .../goacc/kernels-loop-annotation-14.f95  |  35 ++
 .../goacc/kernels-loop-annotation-15.f95  |  35 ++
 .../goacc/kernels-loop-annotation-16.f95  |  34 ++
 .../goacc/kernels-loop-annotation-2.f95   |  32 ++
 .../goacc/kernels-loop-annotation-3.f95   |  33 ++
 .../goacc/kernels-loop-annotation-4.f95   |  34 ++
 .../goacc/kernels-loop-annotation-5.f95   |  35 ++
 .../goacc/kernels-loop-annotation-6.f95   |  34 ++
 .../goacc/kernels-loop-annotation-7.f95   |  48 +++
 .../goacc/kernels-loop-annotation-8.f95   |  50 +++
 .../goacc/kernels-loop-annotation-9.f95   |  34 ++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |   1 +
 .../goacc/kernels-loop-data-enter-exit.f95|   1 +
 .../goacc/kernels-loop-data-update.f95|   1 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |   1 +
 .../gfortran.dg/goacc/kernels-loop-n.f95  |   1 +
 .../gfortran.dg/goacc/kernels-loop.f95|   1 +
 .../kernels-parallel-loop-data-enter-exit.f95 |   1 +
 32 files changed, 974 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1
