[PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling

2021-12-16 Thread Frederik Harwath

Adjust the testsuite to changed expectations with the new
Graphite-based "kernels" handling.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c++/privatized-ref-2.C: Adjust.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: Adjust.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c:
Adjust.
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
* testsuite/libgomp.oacc-fortran/if-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/kernels-private-

[PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-03-24 Thread Frederik Harwath
Hi,
this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C, C++, and Fortran front ends
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source-language-agnostic way.  The "unroll" and "tile" directives are
internally implemented as clauses.  This fits the representation of
collapsed loop nests by a single internal gomp_for construct.  Loop
transformations can be applied to loops at the different levels of
such a loop nest and this can be represented well with the clause
representation.  The transformations can also be applied to loops
which are not going to be associated with any OpenMP directive after
the transformation. This is represented by a new gomp_for kind.  Loops
of this kind are lowered in the transformation pass since they are not
subject to any further OpenMP-specific processing.
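
For readers unfamiliar with the two constructs, here is a minimal usage
sketch in C.  It is not taken from the patch series; the function names
are hypothetical and the unroll factor and tile sizes are arbitrary.  A
compiler without OpenMP 5.1 loop transformation support (or one invoked
without -fopenmp) should simply ignore the pragmas, and the functions
then compute the same results serially.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array; the loop is requested to be partially unrolled by a
   factor of 4.  The unroll factor is an arbitrary illustration.  */
long
sum_unrolled (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
#pragma omp unroll partial(4)
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Transpose an n x n matrix; both loops of the nest are requested to
   be tiled in 8 x 8 blocks.  The tile sizes are an arbitrary choice.  */
void
transpose_tiled (const double *in, double *out, int n)
{
  assert (n >= 0);
#pragma omp tile sizes(8, 8)
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      out[j * n + i] = in[i * n + j];
}
```

Neither transformation changes which iterations are executed, only their
grouping and order, so the results match the untransformed loops.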

The patches are roughly presented in the order of their development:
Each construct is implemented in the Fortran front end first including
the middle-end additions/changes, followed by a patch that adds the C
and C++ front end changes.  This initial implementation supports the
loop transformation constructs on the outermost loop of a loop nest
only.  The support for applying the transformations to inner loops is
then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with
both nvptx-none and amdgcn-amdhsa offloading.

Best regards,
Frederik

Frederik Harwath (7):
  openmp: Add Fortran support for "omp unroll" directive
  openmp: Add C/C++ support for "omp unroll" directive
  openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
  openmp: Add Fortran support for "omp tile"
  openmp: Add C/C++ support for "omp tile"
  openmp: Add Fortran support for loop transformations on inner loops
  openmp: Add C/C++ support for loop transformations on inner loops

 gcc/Makefile.in   |1 +
 gcc/c-family/c-gimplify.cc|1 +
 gcc/c-family/c-omp.cc |   12 +-
 gcc/c-family/c-pragma.cc  |2 +
 gcc/c-family/c-pragma.h   |7 +-
 gcc/c/c-parser.cc |  403 +++-
 gcc/c/c-typeck.cc |   10 +-
 gcc/cp/cp-gimplify.cc |3 +
 gcc/cp/parser.cc  |  453 -
 gcc/cp/pt.cc  |   15 +-
 gcc/cp/semantics.cc   |  104 +-
 gcc/fortran/dump-parse-tree.cc|   30 +
 gcc/fortran/gfortran.h|   12 +-
 gcc/fortran/match.h   |2 +
 gcc/fortran/openmp.cc |  460 -
 gcc/fortran/parse.cc  |   52 +-
 gcc/fortran/resolve.cc|6 +
 gcc/fortran/st.cc |2 +
 gcc/fortran/trans-openmp.cc   |  187 +-
 gcc/fortran/trans.cc  |2 +
 gcc/gimple-pretty-print.cc|6 +
 gcc/gimple.h  |1 +
 gcc/gimplify.cc   |   79 +-
 gcc/omp-general.cc|   22 +-
 gcc/omp-general.h |1 +
 gcc/omp-low.cc|6 +-
 gcc/omp-transform-loops.cc| 1773 +
 gcc/params.opt|9 +
 gcc/passes.def|1 +
 .../loop-transforms/imperfect-loop-nest.c |   12 +
 .../gomp/loop-transforms/tile-1.c |  164 ++
 .../gomp/loop-transforms/tile-2.c |  183 ++
 .../gomp/loop-transforms/tile-3.c |  117 ++
 .../gomp/loop-transforms/tile-4.c |  322 +++
 .../gomp/loop-transforms/tile-5.c |  150 ++
 .../gomp/loop-transforms/tile-6.c |   34 +
 .../gomp/loop-transforms/tile-7.c |   31 +
 .../gomp/loop-transforms/tile-8.c |   40 +
 .../gomp/loop-transforms/unroll-1.c   |  133 ++
 .../gomp/loop-transforms/unroll-2.c   |   95 +
 .../gomp/loop-transforms/unroll-3.c   |   18 +
 .../gomp/loop-transforms/unroll-4.c   |   19 +
 .../gomp/loop-transforms/unroll-5.c   |   19 +
 .../gomp/loop-transforms/unroll-6.c   |   20 +
 .../gomp/loop-transforms/unroll-7.c   |  144 ++
 .../gomp/loop-transforms/unroll-inner-1.c |   15 +
 .../gomp/loop-transforms/unroll-inner-2.c |   31 +
 .../gomp/loop-transforms/unroll-non-rect-1.c  |   37 +
 .../gomp/loop-transforms/unroll-non-rect-2.c  |   22 +
 .../gomp/loop-transforms/unroll-simd-1.c  |   84 +
 .../g++.dg/gomp/loop-transforms/tile-1.h  |   27 +
 .../g++.dg/gomp/loop-transforms/tile-1a.C |   27 +
 .../g++.

[PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE

2023-03-24 Thread Frederik Harwath
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation
construct "omp tile".
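
As a rough illustration of why the two need separate clause codes, the
following C sketch contrasts the OpenACC "tile" clause with the OpenMP
5.1 "tile" directive at the source level.  The function names and tile
sizes are hypothetical and not part of this patch; without -fopenacc or
-fopenmp the pragmas are ignored and both functions run serially.

```c
#include <assert.h>

/* OpenACC: "tile" is a clause on a loop directive.  */
void
add_one_acc (int *a, int n, int m)
{
  assert (n >= 0 && m >= 0);
#pragma acc parallel loop tile(4, 4) copy(a[0:n * m])
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      a[i * m + j] += 1;
}

/* OpenMP 5.1: "tile" is a directive of its own, with a "sizes" clause.  */
void
add_one_omp (int *a, int n, int m)
{
  assert (n >= 0 && m >= 0);
#pragma omp tile sizes(4, 4)
  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      a[i * m + j] += 1;
}
```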

gcc/ChangeLog:

* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
(OMP_CLAUSE_OACC_TILE_LIST): ... this.
(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
(OMP_CLAUSE_TILE_COUNT): Rename to ...
(OMP_CLAUSE_OACC_TILE_COUNT): ... this.
* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_for): Likewise.
* omp-general.cc (omp_extract_for_data): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_oacc_head_mark): Likewise.
* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.cc: Likewise.

gcc/c-family/ChangeLog:

* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
(c_parser_oacc_clause_tile): Likewise.
(c_parser_omp_for_loop): Likewise.
* c-typeck.cc (c_finish_omp_clauses): Likewise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
(cp_parser_omp_clause_collapse): Likewise.
(cp_parser_omp_for_loop): Likewise.
* pt.cc (tsubst_omp_clauses): Likewise.
* semantics.cc (finish_omp_clauses): Likewise.
(finish_omp_for): Likewise.

gcc/fortran/ChangeLog:

* openmp.cc (enum omp_mask2): Adjust to renamings.
(gfc_match_omp_clauses): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
---
 gcc/c-family/c-omp.cc   |  2 +-
 gcc/c/c-parser.cc   | 12 ++--
 gcc/c/c-typeck.cc   |  2 +-
 gcc/cp/parser.cc| 12 ++--
 gcc/cp/pt.cc|  2 +-
 gcc/cp/semantics.cc |  8 
 gcc/fortran/openmp.cc   |  6 +++---
 gcc/fortran/trans-openmp.cc |  4 ++--
 gcc/gimplify.cc |  8 
 gcc/omp-general.cc  |  8 
 gcc/omp-low.cc  |  6 +++---
 gcc/tree-core.h |  2 +-
 gcc/tree-nested.cc  |  4 ++--
 gcc/tree-pretty-print.cc|  4 ++--
 gcc/tree.cc |  2 +-
 gcc/tree.h  | 12 ++--
 16 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index 85ba9c528c8..fec7f337772 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -1749,7 +1749,7 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses,
 {
  /* Loop clauses.  */
case OMP_CLAUSE_COLLAPSE:
-   case OMP_CLAUSE_TILE:
+   case OMP_CLAUSE_OACC_TILE:
case OMP_CLAUSE_GANG:
case OMP_CLAUSE_WORKER:
case OMP_CLAUSE_VECTOR:
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 9d875befccc..e7c9da99552 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -14183,7 +14183,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tree list)
   location_t loc;

   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");
-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");

   loc = c_parser_peek_token (parser)->location;
   matching_parens parens;
@@ -15349,7 +15349,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   location_t loc;
   tree tile = NULL_TREE;

-  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+  check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile");
   check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse");

   loc = c_parser_peek_token (parser)->location;
@@ -15401,9 +15401,9 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list)
   /* Consume the trailing ')'.  */
   c_parser_consume_token (parser);

-  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  c = build_omp_clause (loc, OMP_CLAUSE_OACC_TILE);
   tile = nreverse (tile);
-  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_OACC_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
   return c;
 }
@@ -20270,10 +20270,10 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
   for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl))
 if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE)
   collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl));
-else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE)
+else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE)
   {
tiling = true;
-   collapse = list_length (OMP_CLAUSE_TILE_LIST (cl));
+   collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl));
   }
 else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED
 && OMP_CLAUSE_ORDERED_EXPR (cl))
diff --git a/gcc/

[PATCH 4/7] openmp: Add Fortran support for "omp tile"

2023-03-24 Thread Frederik Harwath
This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.
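
To make the effect of the transformation concrete, the following C
sketch writes out by hand the loop structure that tiling a
two-dimensional nest is commonly understood to produce: two outer
"floor" loops that step over tiles and two inner "tile" loops that
iterate within one tile.  This is an illustration only, not the code
generated by the middle-end pass; the function names are hypothetical.

```c
#include <assert.h>

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Visit all (i, j) with 0 <= i < n and 0 <= j < m in tiles of size
   ti x tj.  The linear index i * m + j of each visited element is
   stored in ORDER in visiting order; the number of visited elements
   is returned.  Tiles at the upper edges may be partial.  */
int
visit_tiled (int n, int m, int ti, int tj, int *order)
{
  assert (ti > 0 && tj > 0);
  int count = 0;
  for (int ii = 0; ii < n; ii += ti)          /* floor loop over i */
    for (int jj = 0; jj < m; jj += tj)        /* floor loop over j */
      for (int i = ii; i < min_int (ii + ti, n); i++)     /* tile loop */
        for (int j = jj; j < min_int (jj + tj, m); j++)   /* tile loop */
          order[count++] = i * m + j;
  return count;
}
```

For a 3 x 3 iteration space with 2 x 2 tiles, every element is still
visited exactly once, but the elements of the first tile (indices 0, 1,
3, 4) come before the remaining ones.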

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
(enum gfc_exec_op): Add EXEC_OMP_TILE.
(loop_transform_p): New declaration.
(struct gfc_omp_clauses): Add "tile_sizes" field.
* dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
(show_omp_node): Handle EXEC_OMP_TILE.
(show_code_node): Likewise.
* match.h (gfc_match_omp_tile): New declaration.
* openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
(match_tile_sizes): New function.
(OMP_TILE_CLAUSES): New macro.
(gfc_match_omp_tile): New function.
(resolve_omp_do): Handle EXEC_OMP_TILE.
(resolve_omp_tile): New function.
(omp_code_to_statement): Handle EXEC_OMP_TILE.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
and ST_OMP_TILE.
(next_statement): Handle ST_OMP_TILE.
(gfc_ascii_statement): Likewise.
(parse_omp_do): Likewise.
(parse_executable): Likewise.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
(loop_transform_p): New function.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Handle EXEC_OMP_TILE.
(gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.

gcc/ChangeLog:

* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 +-
 gcc/fortran/gfortran.h|   7 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 373 +-
 gcc/fortran/parse.cc  |  15 +
 gcc/fortran/resolve.cc|   3 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-openmp.cc   |  86 ++--
 gcc/fortran/trans.cc  |   1 +
 gcc/gimplify.cc   |   3 +
 gcc/omp-general.cc|   2 +-
 gcc/omp-transform-loops.cc| 340 +++-
 .../gomp/loop-transforms/tile-1.f90   | 163 
 .../gomp/loop-transforms/tile-1a.f90  |  10 +
 .../gomp/loop-transforms/tile-2.f90   |  80 
 .../gomp/loop-transforms/tile-3.f90   |  18 +
 .../gomp/loop-transforms/tile-4.f90   |  95 +
 .../gomp/loop-transfor

[PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops

2023-03-24 Thread Frederik Harwath
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops.  The transformation clauses are extended to
carry the information about the level of the loop nest at which a
transformation should be applied.  The middle end transformation pass
is adjusted to apply the transformations at the correct level of a
loop nest and to take their effect on the loop nest depth into
account.
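
The following C sketch shows the kind of source this enables (the same
idea applies to Fortran): a "tile" directive on the inner loop of a nest
whose outer directive uses "collapse".  Since tiling the inner loop
replaces it by two loops, the depth of the nest available to "collapse"
changes, which is the effect described above.  The function name and the
tile size are hypothetical, and whether a given compiler accepts this
exact combination is an assumption; without -fopenmp the pragmas are
ignored and the loop runs serially.

```c
#include <assert.h>

/* Scale every element of an n x m matrix by FACTOR.  The inner loop
   is tiled; the outer directive collapses the outer loop with the
   first loop generated from the tiled inner loop.  */
void
scale_matrix (double *a, int n, int m, double factor)
{
  assert (n >= 0 && m >= 0);
#pragma omp parallel for collapse(2)
  for (int i = 0; i < n; i++)
#pragma omp tile sizes(4)
    for (int j = 0; j < m; j++)
      a[i * m + j] *= factor;
}
```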

gcc/fortran/ChangeLog:

* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_TRANSFORM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used ...
(gfc_trans_omp_do): ... here.

gcc/ChangeLog:

* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix type
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be 
transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformation functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.

gcc/cp/ChangeLog:

* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
---
 gcc/c/c-parser.cc |  10 +-
 gcc/cp/parser.cc  |  12 +-
 gcc/cp/pt.cc  |  12 +-
 gcc/fortran/openmp.cc | 173 --
 gcc/fortran/trans-openmp.cc   |  74 ++--
 gcc/omp-transform-loops.cc| 1

[PATCH 00/40] OpenACC "kernels" Improvements

2021-12-15 Thread Frederik Harwath
Hi,
this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this
year's Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.  Versions
of the patches have also been committed to the devel/omp/gcc-11 branch
recently.

The patch series contains middle-end changes that modify the "kernels"
loop handling to use Graphite for dependence analysis of loops in
"kernels" regions, as well as new optimizations and adjustments to
existing optimizations to support this analysis. A central step is
contained in the commit titled "openacc: Use Graphite for dependence
analysis in \"kernels\" regions" whose commit message also contains
further explanations. There are also front end changes (cf. the
patches by Sandra Loosemore) that prepare the loops in "kernels"
regions for the middle-end processing and which lift various
restrictions on "kernels" regions.  I have included some dependences
(the patches by Julian Brown) from the devel/omp/gcc-11 branch which
will be re-submitted independently for review.
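
As an illustration of what dependence analysis has to decide, the
following C sketch places two loops in one "kernels" region.  The
iterations of the first loop are independent of each other, while the
second loop carries a dependence from each iteration to the next.  The
function name and data clauses are hypothetical and not taken from the
patches; without -fopenacc the pragma is ignored and the code runs
serially with the same result.

```c
#include <assert.h>

/* Compute c[i] = a[i] + b[i], then turn C into its running sum.  */
void
add_then_prefix_sum (float *restrict c, const float *restrict a,
                     const float *restrict b, int n)
{
  assert (n >= 0);
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
    /* No iteration reads a value written by another iteration.  */
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];

    /* Iteration i reads c[i - 1], which iteration i - 1 wrote: a
       loop-carried dependence that rules out naive parallelization.  */
    for (int i = 1; i < n; i++)
      c[i] += c[i - 1];
  }
}
```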

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target.  The patches
should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the
current development stage. I hope that we can discuss some of the
changes before they can be considered for inclusion in GCC during the
next stage 1.

Best regards,
Frederik


Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with
data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in   |   2 +
 gcc/c-family/c-common.h   |   1 +
 gcc/c-family/c-omp.c  | 915 +++--
 gcc/c-family/c.opt|   8 +
 gcc/c/c-decl.c|  28 +
 gcc/c/c-parser.c  |   3 +
 gcc/cfgloop.c |   1 +
 gcc/cfgloop.h |   6 +
 gcc/cfgloopmanip.c|   1 +
 gcc/common.opt|   9 +
 gcc/config/nvptx/nvptx.c  |   7 +
 gcc/cp/decl.c |  44 +
 gcc/cp/parser.c   |   3 +
 gcc/cp/semantics.c|   9 +
 gcc/doc/gimple.texi   |   2 +
 gcc/doc/invoke.texi   |  52 +-
 gcc/doc/passes.texi   

[PATCH 03/40] Kernels loops annotation: Fortran.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.
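
The effect of the annotation can be pictured at the source level.  The
sketch below uses C rather than Fortran and hypothetical function names;
it shows a plain loop in a kernels region next to the same loop with the
"acc loop auto" directive written out explicitly, which is roughly what
the annotation adds on the user's behalf.  It does not reproduce the
front end's internal representation.  Without -fopenacc the pragmas are
ignored and both functions run serially.

```c
#include <assert.h>

/* A loop in a kernels region as the user might write it.  */
void
vec_add_plain (int *restrict c, const int *restrict a,
               const int *restrict b, int n)
{
  assert (n >= 0);
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}

/* The same loop with the annotation spelled out explicitly.  */
void
vec_add_annotated (int *restrict c, const int *restrict a,
                   const int *restrict b, int n)
{
  assert (n >= 0);
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];
  }
}
```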

Co-Authored-By: Gergö Barany 

gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.c: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
---
 gcc/fortran/gfortran.h|   1 +
 gcc/fortran/lang.opt  |   8 +
 gcc/fortran/openmp.c  | 364 ++
 gcc/fortran/parse.c   |   9 +
 .../goacc/classify-kernels-unparallelized.f95 |   1 +
 .../gfortran.dg/goacc/classify-kernels.f95|   1 +
 .../gfortran.dg/goacc/common-block-3.f90  |   1 +
 .../gfortran.dg/goacc/kernels-loop-2.f95  |   1 +
 .../goacc/kernels-loop-annotation-1.f95   |  33 ++
 .../goacc/kernels-loop-annotation-10.f95  |  32 ++
 .../goacc/kernels-loop-annotation-11.f95  |  34 ++
 .../goacc/kernels-loop-annotation-12.f95  |  39 ++
 .../goacc/kernels-loop-annotation-13.f95  |  38 ++
 .../goacc/kernels-loop-annotation-14.f95  |  35 ++
 .../goacc/kernels-loop-annotation-15.f95  |  35 ++
 .../goacc/kernels-loop-annotation-16.f95  |  34 ++
 .../goacc/kernels-loop-annotation-2.f95   |  32 ++
 .../goacc/kernels-loop-annotation-3.f95   |  33 ++
 .../goacc/kernels-loop-annotation-4.f95   |  34 ++
 .../goacc/kernels-loop-annotation-5.f95   |  35 ++
 .../goacc/kernels-loop-annotation-6.f95   |  34 ++
 .../goacc/kernels-loop-annotation-7.f95   |  48 +++
 .../goacc/kernels-loop-annotation-8.f95   |  50 +++
 .../goacc/kernels-loop-annotation-9.f95   |  34 ++
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |   1 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |   1 +
 .../goacc/kernels-loop-data-enter-exit.f95|   1 +
 .../goacc/kernels-loop-data-update.f95|   1 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |   1 +
 .../gfortran.dg/goacc/kernels-loop-n.f95  |   1 +
 .../gfortran.dg/goacc/kernels-loop.f95|   1 +
 .../kernels-parallel-loop-data-enter-exit.f95 |   1 +
 32 files changed, 974 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1

[PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-03-27  Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
line numbering.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Add
-fno-openacc-kernels-annotate-loops.
---
 .../gfortran.dg/goacc/classify-kernels-unparallelized.f95| 5 +++--
 gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 5 +++--
 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95  | 1 +
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 2ceae2088070..00aac9aa94ea 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -23,8 +23,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 24 }
  c(i) = a(f (i)) + b(f (i))
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index d061a241074b..ba815319abf2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -19,8 +19,9 @@ program main

   call setup(a, b)

-  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
-  do i = 0, n - 1
+  !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
+  do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 20 }
  c(i) = a(i) + b(i)
   end do
   !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 238482b91a49..04c998d11dad 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,5 +1,6 @@
 ! Test OpenACC 'kernels' construct decomposition.

+! { dg-additional-options "-fno-openacc-kernels-annotate-loops" }
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
--
2.33.0



[PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

2020-08-19  Sandra Loosemore  

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED.  Update all call sites.
---
 gcc/c/c-parser.c   |  3 +++
 gcc/cp/parser.c|  3 +++
 gcc/fortran/trans-openmp.c | 34 +-
 gcc/tree.h |  5 +
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599ef..1258b48693de 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, 
char *p_name,
omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, 
char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
 if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6a..c834d25b028f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token 
*pragma_tok, char *p_name,
 omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);

   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token 
*pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c5588c53c..618e106791e5 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4855,7 +4855,8 @@ typedef struct dovar_init_d {

 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
- gfc_omp_clauses *do_clauses, tree par_clauses)
+ gfc_omp_clauses *do_clauses, tree par_clauses,
+ bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, 
stmtblock_t *pblock,
 case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break;
 case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
 case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
-case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+case EXEC_OACC_LOOP:
+  stmt = make_node (OACC_LOOP);
+  OACC_LOOP_COMBINED (stmt) = combined;
+  break;
 default: gcc_unreachable ();
 }

@@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pblock = &block;
   else
 pushlevel ();
-  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL,
+  true);
   protected_set_expr_location (stmt, loc);
   if (TREE_CODE (stmt) != BIND_EXPR)
 stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t 
*pblock,
 omp_do_clauses
   = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO], code->loc);
   body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ? pblock : &block,
-  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses);
+  &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (body) != BIND_EXPR)
@@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, 
stmtblock_t *pblock,
 }
   stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO,
   new_pblock, &clausesa[GFC_OMP_SPLIT_DO],
-  omp_clauses);
+  omp_clauses, false);
   if (pblock == NULL)
 {
   if (TREE_CODE (stmt) != BIND_EXPR)
@@ -6496,7 +6501,8 @@ 

[PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran).

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.
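
For illustration only, the shape of user code in question looks roughly
like the sketch below.  It is written in C rather than Fortran (the patch
itself concerns the Fortran front end), and the function name, array
sizes, and "copy" clause are made up for the example.  The outer loop
carries the combined directive; the inner loop has no directive of its
own and is the kind of loop that could still be annotated automatically.

```c
#include <assert.h>

enum { ROWS = 4, COLS = 3 };

/* Hypothetical example: multiply every element of GRID by FACTOR.
   When compiled without -fopenacc the pragma is ignored and the loop
   nest simply runs sequentially.  */
void
scale_grid (int grid[ROWS][COLS], int factor)
{
  assert (grid != 0);

  /* Combined directive: applies to the outer "i" loop only.  */
#pragma acc kernels loop copy (grid[0:ROWS])
  for (int i = 0; i < ROWS; i++)
    /* No explicit directive here; a candidate for automatic annotation.  */
    for (int j = 0; j < COLS; j++)
      grid[i][j] *= factor;
}
```
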

2020-08-19  Sandra Loosemore  

gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
---
 gcc/fortran/openmp.c  | 50 ++-
 .../gfortran.dg/goacc/combined-directives.f90 | 19 +--
 .../goacc/kernels-loop-annotation-18.f95  | 28 +++
 .../goacc/kernels-loop-annotation-19.f95  | 29 +++
 .../goacc/private-explicit-kernels-1.f95  |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95 |  7 ++-
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 243b5e0a9ac6..b0b68b494778 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code 
*parent,

case EXEC_OACC_PARALLEL_LOOP:
case EXEC_OACC_PARALLEL:
-   case EXEC_OACC_KERNELS_LOOP:
case EXEC_OACC_LOOP:
  /* Do not try to add automatic OpenACC annotations inside manually
 annotated loops.  Presumably, the user avoided doing it on
@@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code 
*parent,
}
  break;

+   case EXEC_OACC_KERNELS_LOOP:
+ /* This is a combined "acc kernels loop" directive.  We want to
+leave the outer loop alone but try to annotate any nested
+loops in the body.  The expected structure nesting here is
+  EXEC_OACC_KERNELS_LOOP
+EXEC_OACC_KERNELS_LOOP
+  EXEC_DO
+EXEC_DO
+  ...body...  */
+ if (code->block)
+   /* Might be empty?  */
+   {
+ gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+ gfc_omp_clauses *clauses = code->ext.omp_clauses;
+ int collapse = clauses->collapse;
+ gfc_expr_list *tile = clauses->tile_list;
+ gfc_code *inner = code->block->next;
+
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+
+ /* We need to skip over nested loops covered by "collapse" or
+"tile" clauses.  "Tile" takes precedence
+(see gfc_trans_omp_do).  */
+ if (tile)
+   {
+ collapse = 0;
+ for (gfc_expr_list *el = tile; el; el = el->next)
+   collapse++;
+   }
+ if (clauses->orderedc)
+   collapse = clauses->orderedc;
+ if (collapse <= 0)
+   collapse = 1;
+ for (int i = 1; i < collapse; i++)
+   {
+ gcc_assert (inner->op == EXEC_DO);
+ gcc_assert (inner->block->op == EXEC_DO);
+ inner = inner->block->next;
+   }
+ if (inner)
+   /* Loop might have empty body?  */
+   annotate_do_loops_in_kernels (inner->block->next,
+ inner, goto_targets,
+ as_in_kernels_region);
+   }
+ walk_block = false;
+ break;
+
case EXEC_DO_WHILE:
case EXEC_DO_CONCURRENT:
  /* Traverse the body in a special state to allow EXIT statements
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 
b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 956349204f4d..562a4e40cd7d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -139,10 +139,21 @@ end subroutine test

 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loo

[PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematical, we can add on here to recognize
them, or in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore  

gcc/c-family/
* c-omp.c (annotate_loops_in_kernels_regions): Test for
calls to builtins.

gcc/fortran/
* openmp.c (check_expr_for_invalid_calls): Check for intrinsic
functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c  | 10 ---
 gcc/fortran/openmp.c  |  9 ---
 .../goacc/kernels-loop-annotation-20.c| 23 
 .../goacc/kernels-loop-annotation-20.f95  | 26 +++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 
gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 30757877eafe..e7c27f45e888 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int 
*walk_subtrees,
   break;

 case CALL_EXPR:
-  /* Direct function calls to functions marked as OpenACC routines are
-allowed.  Reject indirect calls or calls to non-routines.  */
+  /* Direct function calls to builtins and functions marked as
+OpenACC routines are allowed.  Reject indirect calls or calls
+to non-routines.  */
   if (info->state >= as_in_kernels_loop)
{
  tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
@@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int 
*walk_subtrees,
}
  if (fn_decl == NULL_TREE)
do_not_annotate_loop_nest (info, as_invalid_call, node);
- else if (!lookup_attribute ("oacc function",
- DECL_ATTRIBUTES (fn_decl)))
+ else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL)
+  && !lookup_attribute ("oacc function",
+DECL_ATTRIBUTES (fn_decl)))
do_not_annotate_loop_nest (info, as_invalid_call, node);
}
   break;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b0b68b494778..d5d996e378d7 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int 
*walk_subtrees,
   switch (expr->expr_type)
 {
 case EXPR_FUNCTION:
-  if (expr->value.function.esym
- && (expr->value.function.esym->attr.oacc_routine_lop
- != OACC_ROUTINE_LOP_NONE))
+  /* Permit calls to Fortran intrinsic functions and to routines
+with an explicitly declared parallelism level.  */
+  if (expr->value.function.isym
+ || (expr->value.function.esym
+ && (expr->value.function.esym->attr.oacc_routine_lop
+ != OACC_ROUTINE_LOP_NONE)))
return 0;
   /* Else fall through.  */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c 
b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
new file mode 100644
index ..5e3f02845713
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that calls to built-in functions don't inhibit kernels loop
+   annotation.  */
+
+void foo (int n, int *input, int *out1, int *out2)
+{
+#pragma acc kernels
+  {
+int i;
+
+for (i = 0; i < n; i++)
+  {
+   out1[i] = __builtin_clz (input[i]);
+   out2[i] = __builtin_popcount (input[i]);
+  }
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
new file mode 100644
index ..5169a0a1676d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with calls to intri

[PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.

2021-12-15 Thread Frederik Harwath
From: Sandra Loosemore 

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".

2020-08-23   Sandra Loosemore  

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
index 41f6307dbb17..42e751dbfb83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -30,4 +30,4 @@ subroutine f (a, b, c)
 !$acc end kernels
 end subroutine f

-! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
index d51482e4685d..6e2e2c41172b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -31,4 +31,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
index 3c4956d70775..03c4234ce7cd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -36,4 +36,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
index 3ec459f0a8df..6aeb3f2fe4d0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -35,4 +35,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 
b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
index 91f431cca432..7d1cff64a3d9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -32,4 +32,4 @@ function f (a, b)

 end function f

-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a

[PATCH 13/40] Fortran: Delinearize array accesses

2021-12-15 Thread Frederik Harwath
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR 61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
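
As a rough illustration of the difference between the two forms, the
following C sketch contrasts a linearized access (one flat offset built
from multiplies and adds) with a delinearized one (nested array
references).  It only mimics the index arithmetic for a column-major
array with zero-based subscripts; the names and extents are invented,
and this is not what the Fortran front end actually emits.

```c
#include <assert.h>
#include <stddef.h>

enum { N1 = 3, N2 = 4 };   /* Extents of the first and second dimension.  */

/* Linearized form: both subscripts are folded into a single offset,
   so the dimension structure is no longer visible in the access.  */
double
get_linearized (double *a, size_t i, size_t j)
{
  assert (i < N1 && j < N2);
  return a[i + j * N1];
}

/* Delinearized form: one array reference per dimension, comparable to
   nested ARRAY_REFs.  The first subscript varies fastest, as in Fortran.  */
double
get_delinearized (double a[N2][N1], size_t i, size_t j)
{
  assert (i < N1 && j < N2);
  return a[j][i];
}
```

Both functions read the same element; only the second keeps each
subscript separate, which is the property loop optimizers rely on.
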

Co-Authored-By: Tobias Burnus 

gcc/ChangeLog:

* expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

* lang.opt: Document -param=delinearize.
* trans-array.c (get_class_array_vptr): New function.
(get_array_lbound): New function.
(get_array_ubound): New function.
(gfc_conv_array_ref): Implement main delinearization logic.
(build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

* gfortran.dg/assumed_type_2.f90: Adjust test expectations.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
* gfortran.dg/graphite/block-2.f: Likewise.
* gfortran.dg/graphite/block-3.f90: Likewise.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/graphite/id-9.f: Likewise.
* gfortran.dg/inline_matmul_16.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Likewise.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Likewise.
* gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
---
 gcc/expr.c|   1 +
 gcc/fortran/lang.opt  |   4 +
 gcc/fortran/trans-array.c | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90|   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90  |   2 +-
 .../gfortran.dg/graphite/block-4.f90  |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f |   2 +-
 .../gfortran.dg/inline_matmul_16.f90  |   2 +
 .../gfortran.dg/inline_matmul_24.f90  |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f   |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 16 files changed, 270 insertions(+), 96 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index eb33643bd770..188905b4fe4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
  break;

case VIEW_CONVERT_EXPR:
+   case NOP_EXPR:
  break;

case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index a202c04c4a25..25c5a5a32c41 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.

+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5ceb261b6989..e84b4cb55f05 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
 }
 }

-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;

@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, 
tree vptr)
  && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
 }
+  return vptr;
+}

+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }

+/* Get the declared lower bound for rank 

Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive

2023-04-06 Thread Frederik Harwath via Fortran

Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):
   

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution test



Thank you for reporting the failures! They are caused by mistakes in the 
test code, not the implementation. I have attached a patch which fixes 
the failures.


I have been able to reproduce the failures with -m32. With the patch
they went away, even with 100 repeated test executions ;-).
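
To make the kind of mistake concrete, here is a small C analogue of the
corrected pattern (the real tests are Fortran).  The loop body and the
function name are hypothetical; only the loop bounds mirror the
"do i = 1,10,3" loops in the tests.  The accumulator is initialized
before the loop and named in a reduction clause, which are the two
things the fix adds.

```c
#include <assert.h>

/* Hypothetical analogue of the fixed tests: sum i*j over a strided
   two-level loop nest.  Without -fopenmp the pragma is ignored.  */
int
strided_sum (int step)
{
  assert (step > 0);

  int sum = 0;   /* Initialize: an uninitialized accumulator gives
                    an indeterminate result.  */

  /* The reduction clause gives each thread a private copy of "sum"
     and combines the copies at the end, avoiding a data race.  */
#pragma omp parallel for reduction (+ : sum)
  for (int i = 1; i <= 10; i += step)
    for (int j = 1; j <= 10; j += step)
      sum += i * j;

  return sum;
}
```
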



Best regards,

Frederik
From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
	* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
	* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
	and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j
 
 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
 integer :: i,j
 
-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
-- 
2.36.1



Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-16 Thread Frederik Harwath via Fortran

Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:

On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:

this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C, C++, and Fortran front ends
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source language agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements, and what is being discussed for further loop
transformations, pretty much require parts of it to be done as soon as possible.
My understanding is that this is where other implementations do it too,
and I would also prefer GCC not to be the only implementation that takes
a significantly different decision here from other implementations


The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two 
compilers (another one and GCC with this patch series) transformed the 
loops in the middle end after the handling of data sharing, one planned 
to do so. Yet another vendor had not yet decided where it will be 
implemented. Clang currently does everything in the front end, but it 
was mentioned that this might change in the future e.g. for code sharing 
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require
adjustments of the data sharing clauses, but this is known and,
consequently, no such transformations are planned for OpenMP 6.0. In
particular, the "apply" clause therefore only permits loop-transforming
constructs to be applied to the loops generated from other loop
transformations in TR11.


The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
already need to know given their collapse/ordered how many loops they are
actually associated with and the loop transformation constructs can change
that.
So, I think we need to do the loop transformations in the FEs, that doesn't
mean we need to write everything 3 times, once for each frontend.
Already now, e.g. various stuff is shared between C and C++ FEs in c-family,
though how much can be shared between c-family and Fortran is to be
discovered.
Or at least partially, to the extent that we compute how many canonical
loops the loop transformations result in, what artificial iterators they
will use etc., so that during gimplification we can take all that into
account and then can do the actual transformations later.


The patches in this patch series already do compute how many canonical
loop nests result from the loop transformations in the front end.
This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the
loop transformations may also appear on inner loops of a collapsed
loop nest (i.e. within the collapsed depth), representing the
transformation by OMP_FOR-like constructs would imply that a collapsed
loop nest would have to be broken apart into single loops. Perhaps this
could be handled somehow, but the collapsed loop nest would have to be
re-assembled to meet the expectations of e.g. gimplification.
The clause representation is also much better suited for the upcoming
OpenMP "apply" clause where the transformations will not appear
as directives in front of actual loops but inside of other clauses.
In fact, the loop transformation clauses in the implementation already
specify the level of a loop nest to which they apply and it could
be possible to re-use this handling for "apply".

My initial reaction also was to implement the loop transformations
as OMP_FOR-like constructs and the patch actually introduces an
OMP_LOOP_TRANS construct which is used to represent loops that
are not going to be associated with another OpenMP directive after
the transformation, e.g.

void foo () {
  #pragma omp tile sizes (4, 8, 16)
  for (int i = 0; i < 64; ++i)
  {
    ...
  }
}
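
For readers unfamiliar with the directive, the effect of tiling can be sketched by hand. The following is an illustration only, for the one-dimensional case with a single tile size of 4; the function names are hypothetical and this is not the code the patch generates. One source loop becomes two generated loops, a "floor" loop over tiles and a "tile" loop within each tile:

```c
enum { N = 64, TILE = 4 };

/* Original loop: records the indices 0..N-1 in visiting order.  */
void
visit_plain (int order[N])
{
  int pos = 0;
  for (int i = 0; i < N; ++i)
    order[pos++] = i;
}

/* Hand-written equivalent of tiling the loop above with size 4:
   the outer loop steps from tile to tile, the inner loop walks the
   iterations of one tile.  The extra "i < N" test covers the case
   where N is not a multiple of TILE.  */
void
visit_tiled (int order[N])
{
  int pos = 0;
  for (int i_floor = 0; i_floor < N; i_floor += TILE)
    for (int i = i_floor; i < i_floor + TILE && i < N; ++i)
      order[pos++] = i;
}
```

For a single loop the visiting order is unchanged; only the loop structure differs, which is why the number of generated loops has to be known early.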

You suggest implementing the loop transformations during gimplification.
I am not sure if gimplification is actually well-suited to implement the
depth-first evaluation of the loop transformations. I also believe that
gimplification already handles too many things which conceptually are
not related to the translation to GIMPLE. Having a separate pass seems
to be the right move to achieve a better separation of concerns. I think
this will be even more important in the future as the size of the loop
transformation implementation keeps growing. As you mention below,
several new constructs are already planned.

Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-17 Thread Frederik Harwath via Fortran

Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:

On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two
compilers (another one and GCC with this patch series) transformed the
loops in the middle end after the handling of data sharing, one planned
to do so. Yet another vendor had not yet decided where it will be
implemented. Clang currently does everything in the front end, but it
was mentioned that this might change in the future e.g. for code sharing
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require
adjustments of the data sharing clauses, but this is known and
consequently, no such

When already in the FE we determine how many canonical loops a particular
loop transformation creates, I think the primary changes I'd like to see is
really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and consider
where is the best spot to lower it. I believe for data sharing it is best
done during gimplification before the containing loops are handled, it is
already shared code among all the FEs, I think will make it easier to handle
data sharing right and gimplification is also where doacross processing is
done. While there is restriction that ordered clause is incompatible with
generated loops from tile construct, there isn't one for unroll (unless
"The ordered clause must not appear on a worksharing-loop directive if the
associated loops include the generated loops of a tile directive."
means unroll partial implicitly because partial unroll tiles the loop, but
it doesn't say it acts as if it was a tile construct), so we'd have to handle

#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
  #pragma omp unroll partial(4)
  for (int j = 0; j < 64; j++)
    {
      #pragma omp ordered depend (sink: i - 1, j - 2)
      #pragma omp ordered depend (source)
    }
and I think handling it after gimplification is going to be increasingly
harder. Of course another possibility is ask lang committee to clarify
unless it has been clarified already in 6.0 (but in TR11 it is not).
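
As an illustration of why partial unrolling resembles tiling here, the following sketch unrolls a 64-iteration loop by a factor of 4 by hand. This is an assumption-laden illustration with made-up function names, not what the compiler emits; it relies on the trip count being a multiple of the unroll factor. The generated loop has only 16 iterations, each covering four original values of j, so a sink offset such as "j - 2" in the original iteration space no longer corresponds to a whole iteration of the generated loop:

```c
enum { N = 64, UNROLL = 4 };

/* Original loop over j.  */
int
sum_plain (const int a[N])
{
  int sum = 0;
  for (int j = 0; j < N; ++j)
    sum += a[j];
  return sum;
}

/* Hand-unrolled by 4: the generated loop runs N / UNROLL = 16 times
   and each of its iterations handles four original iterations.
   No remainder loop is needed because N is a multiple of UNROLL.  */
int
sum_unrolled (const int a[N])
{
  int sum = 0;
  for (int j = 0; j < N; j += UNROLL)
    {
      sum += a[j];
      sum += a[j + 1];
      sum += a[j + 2];
      sum += a[j + 3];
    }
  return sum;
}
```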


I do not really expect that we will have to handle this. Questions
concerning the correctness of code after applying loop transformations
came up several times since I have been following the design meetings and
the result was always either that nothing will be changed, because the
loop transformations are not expected to ensure the correctness of
enclosing directives, or that the use of the problematic construct in
conjunction with loop transformations will be forbidden. Concerning the
use of "ordered" on transformed loops, the latter approach was suggested
for all transformations, cf. issue #3494 in the private OpenMP spec
repository. I see that you have already asked for clarification on unroll.
I suppose this could also be fixed after gimplification with reasonable
effort. But let's just wait for the result of that discussion before we
continue worrying about this.


Also, I think creating temporaries is easier to be done during
gimplification than later.


This has not caused problems with the current approach.


Another option is as you implemented a separate pre-omp-lowering pass,
and another one would be do it in the omplower pass, which has actually
several subpasses internally, do it in the scan phase. Disadvantage of
a completely separate pass is that we have to walk the whole IL again,
while doing it in the scan phase means we avoid that cost. We already
do there similar transformations, scan_omp_simd transforms simd constructs
into if (...) simd else simt and then we process it with normal scan_omp_for
on what we've created. So, if you insist doing it after gimplification
perhaps for compatibility with other non-LLVM compilers, I'd prefer to
do it there rather than in a completely separate pass.


I see. This would be possible. My current approach is indeed rather
wasteful because the pass is not restricted to functions that actually
use loop transformations. I could add an attribute to such functions
that could be used to avoid the execution of the pass and hence
the gimple walk on functions that do not use transformations.
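
The gating idea can be sketched as follows. This is a toy model only: the struct, its fields, and the function names are hypothetical and do not correspond to GCC's pass manager or attribute machinery. It merely shows that a per-function flag, set when the front end sees a loop transformation, lets the pass skip the statement walk for all other functions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function record; not a GCC data structure.  */
struct fn_info
{
  const char *name;
  bool uses_loop_transform;   /* Would be set when a transformation is parsed.  */
  size_t n_stmts;             /* Number of statements a full walk would visit.  */
};

/* Gate: run the pass only on functions that use loop transformations.  */
bool
gate_loop_transform (const struct fn_info *fn)
{
  return fn->uses_loop_transform;
}

/* Returns how many statements get walked over all functions, with or
   without the gate.  */
size_t
stmts_walked (const struct fn_info *fns, size_t n, bool use_gate)
{
  size_t walked = 0;
  for (size_t i = 0; i < n; ++i)
    if (!use_gate || gate_loop_transform (&fns[i]))
      walked += fns[i].n_stmts;
  return walked;
}
```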


This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why. We try to represent what we see in the source
as OpenMP constructs as those constructs. We already have a precedent
with composite loop constructs, where for the combined constr