[PATCH 40/40] openacc: Adjust testsuite to new "kernels" handling
Adjust the testsuite to changed expectations with the new Graphite-based "kernels" handling. libgomp/ChangeLog: * testsuite/libgomp.oacc-c++/privatized-ref-2.C: Adjust. * testsuite/libgomp.oacc-c++/privatized-ref-3.C: Adjust. * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-4.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-local-worker-5.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-4.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-5.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-gang-6.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-vector-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-4.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-5.c: Adjust. 
* testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-6.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/kernels-private-vars-loop-worker-7.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust. * testsuite/libgomp.oacc-fortran/if-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-2.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-3.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-gang-6.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-vector-2.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-1.f90: Adjust. 
* testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-2.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-3.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-4.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-vars-loop-worker-5.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-private-
[PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
Hi,

this patch series implements the OpenMP 5.1 "unroll" and "tile" constructs. It includes changes to the C, C++, and Fortran front ends for parsing the new constructs and a new middle-end "omp_transform_loops" pass which implements the transformations in a source-language-agnostic way.

The "unroll" and "tile" directives are internally implemented as clauses. This fits the representation of collapsed loop nests by a single internal gomp_for construct. Loop transformations can be applied to loops at the different levels of such a loop nest and this can be represented well with the clause representation. The transformations can also be applied to loops which are not going to be associated with any OpenMP directive after the transformation. This is represented by a new gomp_for kind. Loops of this kind are lowered in the transformation pass since they are not subject to any further OpenMP-specific processing.

The patches are roughly presented in the order of their development: Each construct is implemented in the Fortran front end first, including the middle-end additions/changes, followed by a patch that adds the C and C++ front end changes. This initial implementation supports the loop transformation constructs on the outermost loop of a loop nest only. The support for applying the transformations to inner loops is then added in two further patches.

The patches have been bootstrapped and tested on x86_64-linux-gnu with both nvptx-none and amdgcn-amdhsa offloading.
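To make the effect of the "unroll" construct concrete, here is a minimal sketch in C. The function names and the HAVE_OMP_LOOP_TRANSFORMS guard macro are assumptions made for this illustration and do not appear in the patch series. The hand-written variant only shows the intended semantics of a partial unroll by a factor of 4; it is not the code that the "omp_transform_loops" pass generates.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a[0..n-1].  The directive asks the compiler to unroll the loop by a
   factor of 4.  HAVE_OMP_LOOP_TRANSFORMS is a hypothetical guard macro:
   leave it undefined when the compiler lacks OpenMP 5.1 loop
   transformations, and the loop is compiled unchanged.  */
long
sum_unrolled_by_directive (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
#ifdef HAVE_OMP_LOOP_TRANSFORMS
#pragma omp unroll partial(4)
#endif
  for (size_t i = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Hand-written equivalent of a partial unroll by 4: one loop that handles
   four iterations per trip, plus a remainder loop for the leftovers.  */
long
sum_unrolled_by_hand (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long sum = 0;
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    {
      sum += a[i];
      sum += a[i + 1];
      sum += a[i + 2];
      sum += a[i + 3];
    }
  for (; i < n; i++)   /* remainder iterations */
    sum += a[i];
  return sum;
}
```

Both functions return the same value for any input; only the shape of the loop differs.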
Best regards, Frederik Frederik Harwath (7): openmp: Add Fortran support for "omp unroll" directive openmp: Add C/C++ support for "omp unroll" directive openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE openmp: Add Fortran support for "omp tile" openmp: Add C/C++ support for "omp tile" openmp: Add Fortran support for loop transformations on inner loops openmp: Add C/C++ support for loop transformations on inner loops gcc/Makefile.in |1 + gcc/c-family/c-gimplify.cc|1 + gcc/c-family/c-omp.cc | 12 +- gcc/c-family/c-pragma.cc |2 + gcc/c-family/c-pragma.h |7 +- gcc/c/c-parser.cc | 403 +++- gcc/c/c-typeck.cc | 10 +- gcc/cp/cp-gimplify.cc |3 + gcc/cp/parser.cc | 453 - gcc/cp/pt.cc | 15 +- gcc/cp/semantics.cc | 104 +- gcc/fortran/dump-parse-tree.cc| 30 + gcc/fortran/gfortran.h| 12 +- gcc/fortran/match.h |2 + gcc/fortran/openmp.cc | 460 - gcc/fortran/parse.cc | 52 +- gcc/fortran/resolve.cc|6 + gcc/fortran/st.cc |2 + gcc/fortran/trans-openmp.cc | 187 +- gcc/fortran/trans.cc |2 + gcc/gimple-pretty-print.cc|6 + gcc/gimple.h |1 + gcc/gimplify.cc | 79 +- gcc/omp-general.cc| 22 +- gcc/omp-general.h |1 + gcc/omp-low.cc|6 +- gcc/omp-transform-loops.cc| 1773 + gcc/params.opt|9 + gcc/passes.def|1 + .../loop-transforms/imperfect-loop-nest.c | 12 + .../gomp/loop-transforms/tile-1.c | 164 ++ .../gomp/loop-transforms/tile-2.c | 183 ++ .../gomp/loop-transforms/tile-3.c | 117 ++ .../gomp/loop-transforms/tile-4.c | 322 +++ .../gomp/loop-transforms/tile-5.c | 150 ++ .../gomp/loop-transforms/tile-6.c | 34 + .../gomp/loop-transforms/tile-7.c | 31 + .../gomp/loop-transforms/tile-8.c | 40 + .../gomp/loop-transforms/unroll-1.c | 133 ++ .../gomp/loop-transforms/unroll-2.c | 95 + .../gomp/loop-transforms/unroll-3.c | 18 + .../gomp/loop-transforms/unroll-4.c | 19 + .../gomp/loop-transforms/unroll-5.c | 19 + .../gomp/loop-transforms/unroll-6.c | 20 + .../gomp/loop-transforms/unroll-7.c | 144 ++ .../gomp/loop-transforms/unroll-inner-1.c | 15 + .../gomp/loop-transforms/unroll-inner-2.c | 31 + 
.../gomp/loop-transforms/unroll-non-rect-1.c | 37 + .../gomp/loop-transforms/unroll-non-rect-2.c | 22 + .../gomp/loop-transforms/unroll-simd-1.c | 84 + .../g++.dg/gomp/loop-transforms/tile-1.h | 27 + .../g++.dg/gomp/loop-transforms/tile-1a.C | 27 + .../g++.
[PATCH 3/7] openacc: Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation construct "omp tile".

gcc/ChangeLog:

	* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE to OMP_CLAUSE_OACC_TILE.
	* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
	(OMP_CLAUSE_OACC_TILE_LIST): ... this.
	(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
	(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
	(OMP_CLAUSE_TILE_COUNT): Rename to ...
	(OMP_CLAUSE_OACC_TILE_COUNT): ... this.
	* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
	(gimplify_adjust_omp_clauses): Likewise.
	(gimplify_omp_for): Likewise.
	* omp-general.cc (omp_extract_for_data): Likewise.
	* omp-low.cc (scan_sharing_clauses): Likewise.
	(lower_oacc_head_mark): Likewise.
	* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
	(convert_local_omp_clauses): Likewise.
	* tree-pretty-print.cc (dump_omp_clause): Likewise.
	* tree.cc: Likewise.

gcc/c-family/ChangeLog:

	* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
	(c_parser_oacc_clause_tile): Likewise.
	(c_parser_omp_for_loop): Likewise.
	* c-typeck.cc (c_finish_omp_clauses): Likewise.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
	(cp_parser_omp_clause_collapse): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	* pt.cc (tsubst_omp_clauses): Likewise.
	* semantics.cc (finish_omp_clauses): Likewise.
	(finish_omp_for): Likewise.

gcc/fortran/ChangeLog:

	* openmp.cc (enum omp_mask2): Adjust to renamings.
	(gfc_match_omp_clauses): Likewise.
	* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
--- gcc/c-family/c-omp.cc | 2 +- gcc/c/c-parser.cc | 12 ++-- gcc/c/c-typeck.cc | 2 +- gcc/cp/parser.cc| 12 ++-- gcc/cp/pt.cc| 2 +- gcc/cp/semantics.cc | 8 gcc/fortran/openmp.cc | 6 +++--- gcc/fortran/trans-openmp.cc | 4 ++-- gcc/gimplify.cc | 8 gcc/omp-general.cc | 8 gcc/omp-low.cc | 6 +++--- gcc/tree-core.h | 2 +- gcc/tree-nested.cc | 4 ++-- gcc/tree-pretty-print.cc| 4 ++-- gcc/tree.cc | 2 +- gcc/tree.h | 12 ++-- 16 files changed, 47 insertions(+), 47 deletions(-) diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc index 85ba9c528c8..fec7f337772 100644 --- a/gcc/c-family/c-omp.cc +++ b/gcc/c-family/c-omp.cc @@ -1749,7 +1749,7 @@ c_oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses, { /* Loop clauses. */ case OMP_CLAUSE_COLLAPSE: - case OMP_CLAUSE_TILE: + case OMP_CLAUSE_OACC_TILE: case OMP_CLAUSE_GANG: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 9d875befccc..e7c9da99552 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -14183,7 +14183,7 @@ c_parser_omp_clause_collapse (c_parser *parser, tree list) location_t loc; check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse"); - check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile"); + check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile"); loc = c_parser_peek_token (parser)->location; matching_parens parens; @@ -15349,7 +15349,7 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list) location_t loc; tree tile = NULL_TREE; - check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile"); + check_no_duplicate_clause (list, OMP_CLAUSE_OACC_TILE, "tile"); check_no_duplicate_clause (list, OMP_CLAUSE_COLLAPSE, "collapse"); loc = c_parser_peek_token (parser)->location; @@ -15401,9 +15401,9 @@ c_parser_oacc_clause_tile (c_parser *parser, tree list) /* Consume the trailing ')'. 
*/ c_parser_consume_token (parser); - c = build_omp_clause (loc, OMP_CLAUSE_TILE); + c = build_omp_clause (loc, OMP_CLAUSE_OACC_TILE); tile = nreverse (tile); - OMP_CLAUSE_TILE_LIST (c) = tile; + OMP_CLAUSE_OACC_TILE_LIST (c) = tile; OMP_CLAUSE_CHAIN (c) = list; return c; } @@ -20270,10 +20270,10 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code, for (cl = clauses; cl; cl = OMP_CLAUSE_CHAIN (cl)) if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_COLLAPSE) collapse = tree_to_shwi (OMP_CLAUSE_COLLAPSE_EXPR (cl)); -else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_TILE) +else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_OACC_TILE) { tiling = true; - collapse = list_length (OMP_CLAUSE_TILE_LIST (cl)); + collapse = list_length (OMP_CLAUSE_OACC_TILE_LIST (cl)); } else if (OMP_CLAUSE_CODE (cl) == OMP_CLAUSE_ORDERED && OMP_CLAUSE_ORDERED_EXPR (cl)) diff --git a/gcc/
[PATCH 4/7] openmp: Add Fortran support for "omp tile"
This commit implements the Fortran front end support for the "omp tile" directive and the corresponding middle end transformation. gcc/fortran/ChangeLog: * gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE. (enum gfc_exec_op): Add EXEC_OMP_TILE. (loop_transform_p): New declaration. (struct gfc_omp_clauses): Add "tile_sizes" field. * dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping. (show_omp_node): Handle EXEC_OMP_TILE. (show_code_node): Likewise. * match.h (gfc_match_omp_tile): New declaration. * openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field. (match_tile_sizes): New function. (OMP_TILE_CLAUSES): New macro. (gfc_match_omp_tile): New function. (resolve_omp_do): Handle EXEC_OMP_TILE. (resolve_omp_tile): New function. (omp_code_to_statement): Handle EXEC_OMP_TILE. (gfc_resolve_omp_directive): Likewise. * parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE and ST_OMP_TILE. (next_statement): Handle ST_OMP_TILE. (gfc_ascii_statement): Likewise. (parse_omp_do): Likewise. (parse_executable): Likewise. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE. (gfc_resolve_code): Likewise. * st.cc (gfc_free_statement): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field. (loop_transform_p): New function. (gfc_expr_list_len): New function. (gfc_trans_omp_do): Handle EXEC_OMP_TILE. (gfc_trans_omp_directive): Likewise. * trans.cc (trans_code): Likewise. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE. (gimplify_adjust_omp_clauses): Likewise. (gimplify_omp_loop): Likewise. * omp-transform-loops.cc (walk_omp_for_loops): New declaration. (subst_var_in_op): New function. (subst_var): New function. (gomp_for_number_of_iterations): Adjust. (gomp_for_iter_count_type): New function. (gimple_assign_rhs_to_tree): New function. (subst_defs): New function. (gomp_for_uncollapse): Adjust. (transformation_clause_p): Add OMP_CLAUSE_TILE. (tile): New function. 
(transform_gomp_for): Handle OMP_CLAUSE_TILE. (optimize_transformation_clauses): Handle OMP_CLAUSE_TILE. * omp-general.cc (omp_loop_transform_clauses_p): Add OMP_CLAUSE_TILE. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE. * tree.cc: Add OMP_CLAUSE_TILE. * tree.h (OMP_CLAUSE_TILE_SIZES): New macro. libgomp/ChangeLog: * testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test. * testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test. * testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/loop-transforms/tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-2.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-3.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-4.f90: New test. * gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test. * gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test. 
--- gcc/fortran/dump-parse-tree.cc| 17 +- gcc/fortran/gfortran.h| 7 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 373 +- gcc/fortran/parse.cc | 15 + gcc/fortran/resolve.cc| 3 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-openmp.cc | 86 ++-- gcc/fortran/trans.cc | 1 + gcc/gimplify.cc | 3 + gcc/omp-general.cc| 2 +- gcc/omp-transform-loops.cc| 340 +++- .../gomp/loop-transforms/tile-1.f90 | 163 .../gomp/loop-transforms/tile-1a.f90 | 10 + .../gomp/loop-transforms/tile-2.f90 | 80 .../gomp/loop-transforms/tile-3.f90 | 18 + .../gomp/loop-transforms/tile-4.f90 | 95 + .../gomp/loop-transfor
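As an illustration of what the "omp tile" transformation described in this patch means for a two-level loop nest, the following C sketch writes the tiled nest out by hand. The function name and the "order" output array are assumptions made for this example. It models the semantics of tiling an i/j nest with sizes (ti, tj) and is not the code produced by the new tile function in omp-transform-loops.cc.

```c
#include <assert.h>

/* Visit every element of an n x m iteration space in the order that a
   tile with sizes (ti, tj) would visit it, and record the linear index
   i * m + j of each visited iteration in ORDER.  Returns the number of
   iterations visited.  */
int
tiled_visit (int n, int m, int ti, int tj, int *order)
{
  assert (n >= 0 && m >= 0 && ti > 0 && tj > 0);
  int count = 0;
  for (int i0 = 0; i0 < n; i0 += ti)          /* floor loop over rows */
    for (int j0 = 0; j0 < m; j0 += tj)        /* floor loop over columns */
      for (int i = i0; i < i0 + ti && i < n; i++)      /* tile loop, rows */
        for (int j = j0; j < j0 + tj && j < m; j++)    /* tile loop, cols */
          order[count++] = i * m + j;
  return count;
}
```

For a 3 x 3 space with 2 x 2 tiles, the iterations of the first tile (linear indices 0, 1, 3, 4) are visited before any iteration of the next tile. The two original loops have become four, and partial tiles at the borders are handled by the extra bound checks in the tile loops.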
[PATCH 6/7] openmp: Add Fortran support for loop transformations on inner loops
So far the implementation of the "omp tile" and "omp unroll" directives restricted their use to the outermost loop of a loop nest. This commit changes the Fortran front end to parse and verify the directives on inner loops. The transformation clauses are extended to carry the information about the level of the loop nest at which a transformation should be applied. The middle end transformation pass is adjusted to apply the transformations at the correct level of a loop nest and to take their effect on the loop nest depth into account.

gcc/fortran/ChangeLog:

	* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
	(resolve_loop_transform_generic): Remove, and ...
	(resolve_omp_unroll): ... inline and adapt here. Move function.
	(find_nested_loop_in_block): New function.
	(find_nested_loop_in_chain): New function, used ...
	(is_outer_iteration_variable): ... here, and ...
	(expr_is_invariant): ... here.
	(resolve_omp_do): Adjust code for resolving loop transformations.
	(resolve_omp_tile): Likewise.
	* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_TRANSFORM_LEVEL on new clause.
	(compute_transformed_depth): New function to compute the depth ("collapse") of a transformed loop nest, used ...
	(gfc_trans_omp_do): ... here.

gcc/ChangeLog:

	* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix type in comment.
	(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
	(partial_unroll): Add argument for the loop nest level to be transformed.
	(tile): Likewise.
	(transform_gomp_for): Pass level to transformation functions.
	(optimize_transformation_clauses): Handle transformation clauses for all levels recursively.
	* tree-pretty-print.cc (dump_omp_clause): Print OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
	* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
	* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access clause operand 0.
	(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
	(OMP_CLAUSE_TILE_SIZES): Likewise.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_clause_unroll_full): Set new OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
	(cp_parser_omp_clause_unroll_partial): Likewise.
	(cp_parser_omp_tile_sizes): Likewise.
	(cp_parser_omp_loop_transform_clause): Likewise.
	(cp_parser_omp_nested_loop_transform_clauses): Likewise.
	(cp_parser_omp_unroll): Likewise.
	* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL and OMP_CLAUSE_TILE handling to changed number of operands.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_clause_unroll_full): Set new OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
	(c_parser_omp_clause_unroll_partial): Likewise.
	(c_parser_omp_tile_sizes): Likewise.
	(c_parser_omp_loop_transform_clause): Likewise.
	(c_parser_omp_nested_loop_transform_clauses): Likewise.
	(c_parser_omp_unroll): Likewise.

gcc/testsuite/ChangeLog:

	* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
	* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
	* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
	* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
	* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
	* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
	* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
	* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to changed diagnostic messages.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.

---
 gcc/c/c-parser.cc | 10 +-
 gcc/cp/parser.cc | 12 +-
 gcc/cp/pt.cc | 12 +-
 gcc/fortran/openmp.cc | 173 --
 gcc/fortran/trans-openmp.cc | 74 ++--
 gcc/omp-transform-loops.cc| 1
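The remark about the effect on the loop nest depth can be illustrated with a small C sketch in which only the inner loop of a two-level nest is tiled. The function name is an assumption made for this example, and the code is a hand-written model of the semantics, not what the transformation pass emits.

```c
#include <assert.h>

/* Sum all elements of an n x m row-major matrix.  Conceptually this is

     for (i ...)
       #pragma omp tile sizes(tj)
       for (j ...)

   written out by hand: the outer loop (level 0) is untouched, while the
   inner loop (level 1) is replaced by a floor loop and a tile loop.  */
long
matrix_sum_inner_tiled (int n, int m, int tj, const int *a)
{
  assert (n >= 0 && m >= 0 && tj > 0);
  long sum = 0;
  for (int i = 0; i < n; i++)                          /* level 0 */
    for (int j0 = 0; j0 < m; j0 += tj)                 /* floor loop */
      for (int j = j0; j < j0 + tj && j < m; j++)      /* tile loop */
        sum += a[i * m + j];
  return sum;
}
```

The nest is two loops deep before the transformation and three loops deep afterwards. A directive associated with the whole nest therefore sees a different depth once the inner transformation has been applied, which is presumably the kind of adjustment that the level operand and the depth computation in this patch account for.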
[PATCH 00/40] OpenACC "kernels" Improvements
Hi,

this patch series implements the re-work of the OpenACC "kernels" implementation that has been announced at the GNU Tools Track of this year's Linux Plumbers Conference; see https://linuxplumbersconf.org/event/11/contributions/998/. Versions of the patches have also been committed to the devel/omp/gcc-11 branch recently.

The patch series contains middle-end changes that modify the "kernels" loop handling to use Graphite for dependence analysis of loops in "kernels" regions, as well as new optimizations and adjustments to existing optimizations to support this analysis. A central step is contained in the commit titled "openacc: Use Graphite for dependence analysis in \"kernels\" regions" whose commit message also contains further explanations.

There are also front end changes (cf. the patches by Sandra Loosemore) that prepare the loops in "kernels" regions for the middle-end processing and which lift various restrictions on "kernels" regions. I have included some dependences (the patches by Julian Brown) from the devel/omp/gcc-11 branch which will be re-submitted independently for review.

I have bootstrapped the compiler on x86_64-linux-gnu and performed comprehensive testing on a powerpc64le-linux-gnu target. The patches should apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the current development stage. I hope that we can discuss some of the changes before they can be considered for inclusion in GCC during the next stage 1.
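For readers less familiar with "kernels" regions, the following C sketch shows the kind of distinction a dependence analysis has to make. The function name and the choice of data clauses are assumptions made for this illustration; the sketch says nothing about how Graphite represents or decides these cases. Without OpenACC support enabled, the pragma is ignored and the code runs sequentially on the host.

```c
#include <assert.h>

/* Two loops inside one "kernels" region.  The compiler, not the
   programmer, decides how each loop is executed.  */
void
add_then_prefix_sum (int n, const int *a, const int *b, int *c, int *prefix)
{
  assert (n > 0);
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n], prefix[0:n])
  {
    /* No iteration reads a value written by another iteration (assuming
       the arrays do not overlap), so this loop is a candidate for
       parallel execution.  */
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i];

    /* Each iteration reads prefix[i - 1], written by the previous
       iteration.  This loop-carried dependence means the loop cannot be
       parallelized as written.  */
    prefix[0] = c[0];
    for (int i = 1; i < n; i++)
      prefix[i] = prefix[i - 1] + c[i];
  }
}
```

Whether the arrays may overlap cannot always be decided at compile time, which is where the runtime alias checking patches listed in this series come in.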
Best regards,
Frederik

Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in | 2 +
 gcc/c-family/c-common.h | 1 +
 gcc/c-family/c-omp.c | 915 +++--
 gcc/c-family/c.opt | 8 +
 gcc/c/c-decl.c | 28 +
 gcc/c/c-parser.c | 3 +
 gcc/cfgloop.c | 1 +
 gcc/cfgloop.h | 6 +
 gcc/cfgloopmanip.c | 1 +
 gcc/common.opt | 9 +
 gcc/config/nvptx/nvptx.c | 7 +
 gcc/cp/decl.c | 44 +
 gcc/cp/parser.c | 3 +
 gcc/cp/semantics.c | 9 +
 gcc/doc/gimple.texi | 2 +
 gcc/doc/invoke.texi | 52 +-
 gcc/doc/passes.texi
[PATCH 03/40] Kernels loops annotation: Fortran.
From: Sandra Loosemore This patch implements the Fortran support for adding "#pragma acc loop auto" annotations to loops in OpenACC kernels regions. It implements the same -fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options that were previously added (and documented) for the C/C++ front ends. Co-Authored-By: Gergö Barany gcc/fortran/ * gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare. * lang.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. * openmp.c: Include options.h. (enum annotation_state, enum annotation_result): New. (check_code_for_invalid_calls): New. (check_expr_for_invalid_calls): New. (check_for_invalid_calls): New. (annotate_do_loop): New. (annotate_do_loops_in_kernels): New. (compute_goto_targets): New. (gfc_oacc_annotate_loops_in_kernels_regions): New. * parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops. gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add -fno-openacc-kernels-annotate-loops option. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data.f95: Likewise. * gfortran.dg/goacc/kernels-loop-n.f95: Likewise. * gfortran.dg/goacc/kernels-loop.f95: Likewise. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-1.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-2.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-3.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-4.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-5.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-6.f95: New. 
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-8.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-9.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-10.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-11.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-12.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-13.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-14.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-15.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-16.f95: New. --- gcc/fortran/gfortran.h| 1 + gcc/fortran/lang.opt | 8 + gcc/fortran/openmp.c | 364 ++ gcc/fortran/parse.c | 9 + .../goacc/classify-kernels-unparallelized.f95 | 1 + .../gfortran.dg/goacc/classify-kernels.f95| 1 + .../gfortran.dg/goacc/common-block-3.f90 | 1 + .../gfortran.dg/goacc/kernels-loop-2.f95 | 1 + .../goacc/kernels-loop-annotation-1.f95 | 33 ++ .../goacc/kernels-loop-annotation-10.f95 | 32 ++ .../goacc/kernels-loop-annotation-11.f95 | 34 ++ .../goacc/kernels-loop-annotation-12.f95 | 39 ++ .../goacc/kernels-loop-annotation-13.f95 | 38 ++ .../goacc/kernels-loop-annotation-14.f95 | 35 ++ .../goacc/kernels-loop-annotation-15.f95 | 35 ++ .../goacc/kernels-loop-annotation-16.f95 | 34 ++ .../goacc/kernels-loop-annotation-2.f95 | 32 ++ .../goacc/kernels-loop-annotation-3.f95 | 33 ++ .../goacc/kernels-loop-annotation-4.f95 | 34 ++ .../goacc/kernels-loop-annotation-5.f95 | 35 ++ .../goacc/kernels-loop-annotation-6.f95 | 34 ++ .../goacc/kernels-loop-annotation-7.f95 | 48 +++ .../goacc/kernels-loop-annotation-8.f95 | 50 +++ .../goacc/kernels-loop-annotation-9.f95 | 34 ++ .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit.f95| 1 + .../goacc/kernels-loop-data-update.f95| 1 + .../gfortran.dg/goacc/kernels-loop-data.f95 | 1 + .../gfortran.dg/goacc/kernels-loop-n.f95 | 1 + .../gfortran.dg/goacc/kernels-loop.f95| 1 + 
.../kernels-parallel-loop-data-enter-exit.f95 | 1 + 32 files changed, 974 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1
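The effect of the annotation described in this patch can be pictured with a short sketch. It is written in C rather than Fortran so that it can be compiled standalone; the Fortran spelling of the added directive would be "!$acc loop auto". The function names are assumptions made for this illustration, and the second function only shows what the annotated source would look like if written by hand, not the internal representation the front end builds.

```c
#include <assert.h>

/* A loop in a "kernels" region as the user wrote it.  */
void
scale_as_written (int n, int *x, int factor)
{
  assert (n > 0);
#pragma acc kernels copy(x[0:n])
  {
    for (int i = 0; i < n; i++)
      x[i] *= factor;
  }
}

/* The same loop with the "loop auto" annotation spelled out, which is
   roughly what -fopenacc-kernels-annotate-loops arranges automatically.  */
void
scale_as_annotated (int n, int *x, int factor)
{
  assert (n > 0);
#pragma acc kernels copy(x[0:n])
  {
#pragma acc loop auto
    for (int i = 0; i < n; i++)
      x[i] *= factor;
  }
}
```

Both functions compute the same result. Existing tests that expect the unannotated form keep their expectations by adding -fno-openacc-kernels-annotate-loops, as the testsuite entries in this patch do.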
[PATCH 04/40] Additional Fortran testsuite fixes for kernels loops annotation pass.
From: Sandra Loosemore 2020-03-27 Sandra Loosemore gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust line numbering. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Add -fno-openacc-kernels-annotate-loops. --- .../gfortran.dg/goacc/classify-kernels-unparallelized.f95| 5 +++-- gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 | 5 +++-- gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 | 1 + 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 index 2ceae2088070..00aac9aa94ea 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 @@ -23,8 +23,9 @@ program main call setup(a, b) - !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC seq loop parallelism" } - do i = 0, n - 1 + !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) + do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" } + ! { dg-message "note: beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 24 } c(i) = a(f (i)) + b(f (i)) end do !$acc end kernels diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 index d061a241074b..ba815319abf2 100644 --- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 @@ -19,8 +19,9 @@ program main call setup(a, b) - !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) ! { dg-message "optimized: assigned OpenACC gang loop parallelism" } - do i = 0, n - 1 + !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1)) + do i = 0, n - 1 ! 
{ dg-message "optimized: assigned OpenACC gang loop parallelism" } + ! { dg-message "beginning .parloops. part in OpenACC .kernels. region" "" { target *-*-* } 20 } c(i) = a(i) + b(i) end do !$acc end kernels diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 index 238482b91a49..04c998d11dad 100644 --- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 +++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 @@ -1,5 +1,6 @@ ! Test OpenACC 'kernels' construct decomposition. +! { dg-additional-options "-fno-openacc-kernels-annotate-loops" } ! { dg-additional-options "-fopt-info-omp-all" } ! { dg-additional-options "--param=openacc-kernels=decompose" } ! { dg-additional-options "-O2" } for 'parloops'. -- 2.33.0
[PATCH 06/40] Add a "combined" flag for "acc kernels loop" etc directives.
From: Sandra Loosemore

2020-08-19  Sandra Loosemore

        gcc/
        * tree.h (OACC_LOOP_COMBINED): New.

        gcc/c/
        * c-parser.c (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

        gcc/cp/
        * parser.c (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

        gcc/fortran/
        * trans-openmp.c (gfc_trans_omp_do): Add combined parameter,
        use it to set OACC_LOOP_COMBINED.  Update all call sites.
---
 gcc/c/c-parser.c           |  3 +++
 gcc/cp/parser.c            |  3 +++
 gcc/fortran/trans-openmp.c | 34 +-
 gcc/tree.h                 |  5 +
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 80dd61d599ef..1258b48693de 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -17371,6 +17371,7 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
                     omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);
 
   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -17389,6 +17390,8 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
   tree block = c_begin_compound_stmt (true);
   tree stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL,
                                      if_p);
+  if (stmt && stmt != error_mark_node)
+    OACC_LOOP_COMBINED (stmt) = is_combined;
   block = c_end_compound_stmt (loc, block, true);
   add_stmt (block);
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 4c2075742d6a..c834d25b028f 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -44580,6 +44580,7 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
                      omp_clause_mask mask, tree *cclauses, bool *if_p)
 {
   bool is_parallel = ((mask >> PRAGMA_OACC_CLAUSE_REDUCTION) & 1) == 1;
+  bool is_combined = (cclauses != NULL);
 
   strcat (p_name, " loop");
   mask |= OACC_LOOP_CLAUSE_MASK;
@@ -44598,6 +44599,8 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
   tree block = begin_omp_structured_block ();
   int save = cp_parser_begin_omp_structured_block (parser);
   tree stmt = cp_parser_omp_for_loop (parser, OACC_LOOP, clauses, NULL, if_p);
+  if (stmt && stmt != error_mark_node)
+    OACC_LOOP_COMBINED (stmt) = is_combined;
   cp_parser_end_omp_structured_block (parser, save);
   add_stmt (finish_omp_structured_block (block));
 
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index e81c5588c53c..618e106791e5 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -4855,7 +4855,8 @@ typedef struct dovar_init_d {
 
 static tree
 gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
-                  gfc_omp_clauses *do_clauses, tree par_clauses)
+                  gfc_omp_clauses *do_clauses, tree par_clauses,
+                  bool combined)
 {
   gfc_se se;
   tree dovar, stmt, from, to, step, type, init, cond, incr, orig_decls;
@@ -5219,7 +5220,10 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
     case EXEC_OMP_DISTRIBUTE: stmt = make_node (OMP_DISTRIBUTE); break;
     case EXEC_OMP_LOOP: stmt = make_node (OMP_LOOP); break;
     case EXEC_OMP_TASKLOOP: stmt = make_node (OMP_TASKLOOP); break;
-    case EXEC_OACC_LOOP: stmt = make_node (OACC_LOOP); break;
+    case EXEC_OACC_LOOP:
+      stmt = make_node (OACC_LOOP);
+      OACC_LOOP_COMBINED (stmt) = combined;
+      break;
     default: gcc_unreachable ();
     }
 
@@ -5313,7 +5317,8 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
     pblock = &block;
   else
     pushlevel ();
-  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL);
+  stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, &loop_clauses, NULL,
+                           true);
   protected_set_expr_location (stmt, loc);
   if (TREE_CODE (stmt) != BIND_EXPR)
     stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
@@ -6151,7 +6156,7 @@ gfc_trans_omp_do_simd (gfc_code *code, stmtblock_t *pblock,
     omp_do_clauses = gfc_trans_omp_clauses (&block, &clausesa[GFC_OMP_SPLIT_DO],
                                             code->loc);
   body = gfc_trans_omp_do (code, EXEC_OMP_SIMD, pblock ? pblock : &block,
-                           &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses);
+                           &clausesa[GFC_OMP_SPLIT_SIMD], omp_clauses, false);
   if (pblock == NULL)
     {
       if (TREE_CODE (body) != BIND_EXPR)
@@ -6209,7 +6214,7 @@ gfc_trans_omp_parallel_do (gfc_code *code, bool is_loop, stmtblock_t *pblock,
     }
   stmt = gfc_trans_omp_do (code, is_loop ? EXEC_OMP_LOOP : EXEC_OMP_DO,
                            new_pblock, &clausesa[GFC_OMP_SPLIT_DO],
-                           omp_clauses);
+                           omp_clauses, false);
   if (pblock == NULL)
     {
       if (TREE_CODE (stmt) != BIND_EXPR)
@@ -6496,7 +6501,8 @@
[PATCH 08/40] Annotate inner loops in "acc kernels loop" directives (Fortran).
From: Sandra Loosemore

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.

2020-08-19  Sandra Loosemore

        gcc/fortran/
        * openmp.c (annotate_do_loops_in_kernels): Handle
        EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
        loops in a combined "acc kernels loop" directive.

        gcc/testsuite/
        * gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
        * gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
        * gfortran.dg/goacc/combined-directives.f90: Adjust expected
        patterns.
        * gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
        * gfortran.dg/goacc/private-predetermined-kernels-1.f95: Likewise.
---
 gcc/fortran/openmp.c                          | 50 ++-
 .../gfortran.dg/goacc/combined-directives.f90 | 19 +--
 .../goacc/kernels-loop-annotation-18.f95      | 28 +++
 .../goacc/kernels-loop-annotation-19.f95      | 29 +++
 .../goacc/private-explicit-kernels-1.f95      |  7 ++-
 .../goacc/private-predetermined-kernels-1.f95 |  7 ++-
 6 files changed, 131 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-18.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-19.f95

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 243b5e0a9ac6..b0b68b494778 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9272,7 +9272,6 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
 
         case EXEC_OACC_PARALLEL_LOOP:
         case EXEC_OACC_PARALLEL:
-        case EXEC_OACC_KERNELS_LOOP:
         case EXEC_OACC_LOOP:
           /* Do not try to add automatic OpenACC annotations inside manually
              annotated loops.  Presumably, the user avoided doing it on
@@ -9317,6 +9316,55 @@ annotate_do_loops_in_kernels (gfc_code *code, gfc_code *parent,
             }
           break;
 
+        case EXEC_OACC_KERNELS_LOOP:
+          /* This is a combined "acc kernels loop" directive.  We want to
+             leave the outer loop alone but try to annotate any nested
+             loops in the body.  The expected structure nesting here is
+               EXEC_OACC_KERNELS_LOOP
+                 EXEC_OACC_KERNELS_LOOP
+                   EXEC_DO
+                     EXEC_DO
+                       ...body...  */
+          if (code->block)
+            /* Might be empty?  */
+            {
+              gcc_assert (code->block->op == EXEC_OACC_KERNELS_LOOP);
+              gfc_omp_clauses *clauses = code->ext.omp_clauses;
+              int collapse = clauses->collapse;
+              gfc_expr_list *tile = clauses->tile_list;
+              gfc_code *inner = code->block->next;
+
+              gcc_assert (inner->op == EXEC_DO);
+              gcc_assert (inner->block->op == EXEC_DO);
+
+              /* We need to skip over nested loops covered by "collapse" or
+                 "tile" clauses.  "Tile" takes precedence
+                 (see gfc_trans_omp_do).  */
+              if (tile)
+                {
+                  collapse = 0;
+                  for (gfc_expr_list *el = tile; el; el = el->next)
+                    collapse++;
+                }
+              if (clauses->orderedc)
+                collapse = clauses->orderedc;
+              if (collapse <= 0)
+                collapse = 1;
+              for (int i = 1; i < collapse; i++)
+                {
+                  gcc_assert (inner->op == EXEC_DO);
+                  gcc_assert (inner->block->op == EXEC_DO);
+                  inner = inner->block->next;
+                }
+              if (inner)
+                /* Loop might have empty body?  */
+                annotate_do_loops_in_kernels (inner->block->next,
+                                              inner, goto_targets,
+                                              as_in_kernels_region);
+            }
+          walk_block = false;
+          break;
+
         case EXEC_DO_WHILE:
         case EXEC_DO_CONCURRENT:
           /* Traverse the body in a special state to allow EXIT statements
diff --git a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 956349204f4d..562a4e40cd7d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -139,10 +139,21 @@ end subroutine test
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. gang" 2 "gimple" } }
-! { dg-final { scan-tree-dump-times "acc loo
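The clause handling in the new EXEC_OACC_KERNELS_LOOP case above can be
summarised by a small standalone sketch.  It is not the GCC code: the
struct and function names below are hypothetical stand-ins for the
gfc_omp_clauses fields that the patch reads, and only the precedence
order (tile over collapse, ordered over both, minimum of one loop) is
taken from the hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the clause data the patch consults; the
   fields only loosely mirror gfc_omp_clauses.  */
struct loop_clauses
{
  int collapse;       /* value of collapse(N), or 0 if absent */
  size_t tile_count;  /* number of expressions in tile(...), 0 if absent */
  int orderedc;       /* value of ordered(N), or 0 if absent */
};

/* Return how many nested loops the directive itself covers, i.e. how
   many DO loops must be skipped before automatic annotation resumes.  */
static int
associated_loop_depth (const struct loop_clauses *c)
{
  assert (c != NULL);

  int depth = c->collapse;
  if (c->tile_count > 0)        /* "tile" takes precedence over "collapse" */
    depth = (int) c->tile_count;
  if (c->orderedc)              /* an ordered(N) count overrides both */
    depth = c->orderedc;
  if (depth <= 0)               /* no clause: just the outermost loop */
    depth = 1;
  return depth;
}
```

With a depth of d, the patch steps d - 1 levels down the loop nest and
only then recurses into the body to look for loops to annotate.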
[PATCH 09/40] Permit calls to builtins and intrinsics in kernels loops.
From: Sandra Loosemore

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematical, we can add on here to recognize them,
or in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore

        gcc/c-family/
        * c-omp.c (annotate_loops_in_kernels_regions): Test for
        calls to builtins.

        gcc/fortran/
        * openmp.c (check_expr_for_invalid_calls): Check for intrinsic
        functions.

        gcc/testsuite/
        * c-c++-common/goacc/kernels-loop-annotation-20.c: New.
        * gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
---
 gcc/c-family/c-omp.c                          | 10 ---
 gcc/fortran/openmp.c                          |  9 ---
 .../goacc/kernels-loop-annotation-20.c        | 23 
 .../goacc/kernels-loop-annotation-20.f95      | 26 +++
 4 files changed, 61 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95

diff --git a/gcc/c-family/c-omp.c b/gcc/c-family/c-omp.c
index 30757877eafe..e7c27f45e888 100644
--- a/gcc/c-family/c-omp.c
+++ b/gcc/c-family/c-omp.c
@@ -3545,8 +3545,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
       break;
 
     case CALL_EXPR:
-      /* Direct function calls to functions marked as OpenACC routines are
-         allowed.  Reject indirect calls or calls to non-routines.  */
+      /* Direct function calls to builtins and functions marked as
+         OpenACC routines are allowed.  Reject indirect calls or calls
+         to non-routines.  */
       if (info->state >= as_in_kernels_loop)
         {
           tree fn = CALL_EXPR_FN (node), fn_decl = NULL_TREE;
@@ -3560,8 +3561,9 @@ annotate_loops_in_kernels_regions (tree *nodeptr, int *walk_subtrees,
             }
           if (fn_decl == NULL_TREE)
             do_not_annotate_loop_nest (info, as_invalid_call, node);
-          else if (!lookup_attribute ("oacc function",
-                                      DECL_ATTRIBUTES (fn_decl)))
+          else if (!fndecl_built_in_p (fn_decl, BUILT_IN_NORMAL)
+                   && !lookup_attribute ("oacc function",
+                                         DECL_ATTRIBUTES (fn_decl)))
             do_not_annotate_loop_nest (info, as_invalid_call, node);
         }
       break;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index b0b68b494778..d5d996e378d7 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -9156,9 +9156,12 @@ check_expr_for_invalid_calls (gfc_expr **exprp, int *walk_subtrees,
   switch (expr->expr_type)
     {
     case EXPR_FUNCTION:
-      if (expr->value.function.esym
-          && (expr->value.function.esym->attr.oacc_routine_lop
-              != OACC_ROUTINE_LOP_NONE))
+      /* Permit calls to Fortran intrinsic functions and to routines
+         with an explicitly declared parallelism level.  */
+      if (expr->value.function.isym
+          || (expr->value.function.esym
+              && (expr->value.function.esym->attr.oacc_routine_lop
+                  != OACC_ROUTINE_LOP_NONE)))
         return 0;
       /* Else fall through.  */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
new file mode 100644
index ..5e3f02845713
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-annotation-20.c
@@ -0,0 +1,23 @@
+/* { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-Wopenacc-kernels-annotate-loops" } */
+/* { dg-additional-options "-fdump-tree-original" } */
+/* { dg-do compile } */
+
+/* Test that calls to built-in functions don't inhibit kernels loop
+   annotation.  */
+
+void foo (int n, int *input, int *out1, int *out2)
+{
+#pragma acc kernels
+  {
+    int i;
+
+    for (i = 0; i < n; i++)
+      {
+        out1[i] = __builtin_clz (input[i]);
+        out2[i] = __builtin_popcount (input[i]);
+      }
+  }
+}
+
+/* { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
new file mode 100644
index ..5169a0a1676d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-20.f95
@@ -0,0 +1,26 @@
+! { dg-additional-options "-fopenacc -fopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-Wopenacc-kernels-annotate-loops" }
+! { dg-additional-options "-fdump-tree-original" }
+! { dg-do compile }
+
+! Test that a loop with calls to intri
[PATCH 10/40] Fix patterns in Fortran tests for kernels loop annotation.
From: Sandra Loosemore

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file.  Now the "auto" clause appears first, instead of after "private".

2020-08-23  Sandra Loosemore

        gcc/testsuite/
        * gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
        expected output.
        * gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
        * gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
---
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-15.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-16.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-2.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-3.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-4.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-5.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-6.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-7.f95  | 2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-8.f95  | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
index 41f6307dbb17..42e751dbfb83 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95
@@ -30,4 +30,4 @@ subroutine f (a, b, c)
 !$acc end kernels
 end subroutine f
 
-! { dg-final { scan-tree-dump-times "acc loop private\\(.\\) auto" 3 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 3 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
index d51482e4685d..6e2e2c41172b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-11.f95
@@ -31,4 +31,4 @@ function f (a, b)
 
 end function f
 
-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
index 3c4956d70775..03c4234ce7cd 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-12.f95
@@ -36,4 +36,4 @@ function f (a, b)
 
 end function f
 
-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
index 3ec459f0a8df..6aeb3f2fe4d0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-13.f95
@@ -35,4 +35,4 @@ function f (a, b)
 
 end function f
 
-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 1 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 1 "original" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
index 91f431cca432..7d1cff64a3d9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-14.f95
@@ -32,4 +32,4 @@ function f (a, b)
 
 end function f
 
-! { dg-final { scan-tree-dump-times "acc loop private.* auto" 0 "original" } }
+! { dg-final { scan-tree-dump-times "acc loop auto" 0 "original" } }
diff --git a
[PATCH 13/40] Fortran: Delinearize array accesses
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.

Co-Authored-By: Tobias Burnus

gcc/ChangeLog:

        * expr.c (get_inner_reference): Handle NOP_EXPR.

gcc/fortran/ChangeLog:

        * lang.opt: Document -param=delinearize.
        * trans-array.c: (get_class_array_vptr): New function.
        (get_array_lbound): New function.
        (get_array_ubound): New function.
        (gfc_conv_array_ref): Implement main delinearization logic.
        (build_array_ref): Adjust.

gcc/testsuite/ChangeLog:

        * gfortran.dg/assumed_type_2.f90: Adjust test expectations.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
        * gfortran.dg/gomp/affinity-clause-1.f90: Likewise.
        * gfortran.dg/graphite/block-2.f: Likewise.
        * gfortran.dg/graphite/block-3.f90: Likewise.
        * gfortran.dg/graphite/block-4.f90: Likewise.
        * gfortran.dg/graphite/id-9.f: Likewise.
        * gfortran.dg/inline_matmul_16.f90: Likewise.
        * gfortran.dg/inline_matmul_24.f90: Likewise.
        * gfortran.dg/no_arg_check_2.f90: Likewise.
        * gfortran.dg/pr32921.f: Likewise.
        * gfortran.dg/reassoc_4.f: Likewise.
        * gfortran.dg/vect/fast-math-mgrid-resid.f: Likewise.
---
 gcc/expr.c                                    |   1 +
 gcc/fortran/lang.opt                          |   4 +
 gcc/fortran/trans-array.c                     | 321 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90  |   6 +-
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |   2 +-
 .../gfortran.dg/gomp/affinity-clause-1.f90    |   2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f  |   9 +-
 .../gfortran.dg/graphite/block-3.f90          |   2 +-
 .../gfortran.dg/graphite/block-4.f90          |   2 +-
 gcc/testsuite/gfortran.dg/graphite/id-9.f     |   2 +-
 .../gfortran.dg/inline_matmul_16.f90          |   2 +
 .../gfortran.dg/inline_matmul_24.f90          |   2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90  |   6 +-
 gcc/testsuite/gfortran.dg/pr32921.f           |   2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f         |   2 +-
 .../gfortran.dg/vect/fast-math-mgrid-resid.f  |   1 +
 16 files changed, 270 insertions(+), 96 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index eb33643bd770..188905b4fe4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7759,6 +7759,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize,
           break;
 
         case VIEW_CONVERT_EXPR:
+        case NOP_EXPR:
           break;
 
         case MEM_REF:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index a202c04c4a25..25c5a5a32c41 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -521,6 +521,10 @@ fdefault-real-16
 Fortran Var(flag_default_real_16)
 Set the default real kind to an 16 byte wide type.
 
+-param=delinearize=
+Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization
+Delinearize array references.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 5ceb261b6989..e84b4cb55f05 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t)
     }
 }
 
-
 static tree
-build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+get_class_array_vptr (tree desc, tree vptr)
 {
-  tree tmp;
   tree type;
   tree cdesc;
 
@@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
           && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type)))
         vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0));
     }
+  return vptr;
+}
 
+static tree
+build_array_ref (tree desc, tree offset, tree decl, tree vptr)
+{
+  tree tmp;
+  vptr = get_class_array_vptr (desc, vptr);
   tmp = gfc_conv_array_data (desc);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   tmp = gfc_build_array_ref (tmp, offset, decl, vptr);
   return tmp;
 }
 
+/* Get the declared lower bound for rank
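To make the distinction drawn in the patch description concrete, here
is a minimal C sketch of the two access forms.  It is only an analogy
written for this note, not what the Fortran front end emits: the array
extents, the lower bound of 1 and both function names are assumptions
chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Extents of a hypothetical Fortran array a(N1, N2), lower bounds 1.  */
enum { N1 = 4, N2 = 3 };

/* Linearized form: the subscripts are folded into one flat offset with
   explicit multiplies and adds (column-major storage), so the
   per-dimension structure is no longer visible in the access.  */
static double
load_linearized (const double *base, int i, int j)
{
  assert (i >= 1 && i <= N1 && j >= 1 && j <= N2);
  size_t offset = (size_t) (i - 1) + (size_t) (j - 1) * N1;
  return base[offset];
}

/* Delinearized form: one subscript per dimension is kept, which is the
   shape that nested array references have.  The rightmost C index is
   the fastest-varying Fortran one.  */
static double
load_delinearized (double (*a)[N1], int i, int j)
{
  assert (i >= 1 && i <= N1 && j >= 1 && j <= N2);
  return a[j - 1][i - 1];
}
```

Both functions read the same element; the second keeps the bounds of
each dimension attached to its own subscript, which is the information
a polyhedral optimizer would otherwise have to reconstruct.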
Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive
Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:
> ... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
> offloading), '-O0' (only):
> [...]
> FAIL: libgomp.fortran/loop-transforms/unroll-1.f90 -O0 execution test
> [...]
> FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90 -O0 execution test

Thank you for reporting the failures! They are caused by mistakes in
the test code, not the implementation. I have attached a patch which
fixes the failures.

I have been able to reproduce the failures with -m32. With the patch
they went away, even across 100 repeated test executions ;-).

Best regards,
Frederik

From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

        * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
        * testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
        * testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
        and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90          | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
     integer :: i,j
 
     sum = 0
-    !$omp parallel do collapse(2)
+    !$omp parallel do collapse(2) reduction(+:sum)
     !$omp tile sizes(6,10)
     do i = 1,10,3
        do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
     integer :: i,j
 
+    sum = 0
     !$omp do
     do i = 1,10,3
        !$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
     integer :: i,j
 
+    sum = 0
     !$omp parallel do reduction(+:sum)
     !$omp unroll partial(2)
     do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
     integer :: i,j
 
-    !$omp simd
+    sum = 0
+    !$omp simd reduction(+:sum)
     do i = 1,10,3
        !$omp unroll full
        do j = 1,10,3
-- 
2.36.1
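For readers less familiar with the two mistakes fixed above, the
following C sketch shows the corrected pattern in isolation.  It is not
taken from the testsuite: the function name and the loop body (summing
i + j) are assumptions, and only the iteration space (1, 4, 7, 10 in
both dimensions) follows the Fortran loops in the diff.

```c
/* Sum i + j over a 4 x 4 iteration space (i, j = 1, 4, 7, 10).  */
static int
sum_pairs (void)
{
  /* Explicit initialization: without it the accumulator starts from an
     indeterminate value, which is the kind of mistake that may only
     show up at some optimization levels or on some targets.  */
  int sum = 0;

  /* The reduction clause gives every thread a private partial sum that
     is combined at the end; without it, concurrent updates of `sum'
     would race.  The pragma is simply ignored when OpenMP is not
     enabled at compile time.  */
#pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)
    for (int j = 1; j <= 10; j += 3)
      sum += i + j;

  return sum;
}
```

The result does not depend on the number of threads, which is what
makes repeated executions of such a test agree.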
Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:
> On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:
>> this patch series implements the OpenMP 5.1 "unroll" and "tile"
>> constructs. It includes changes to the C,C++, and Fortran front end
>> for parsing the new constructs and a new middle-end
>> "omp_transform_loops" pass which implements the transformations in a
>> source language agnostic way.
> I'm afraid we can't do it this way, at least not completely.
>
> The OpenMP requirements and what is being discussed for further loop
> transformations pretty much requires parts of it to be done as soon as
> possible. My understanding is that that is where other implementations
> implement that too and would also prefer GCC not to be the only
> implementation that takes significantly different decision in that
> case from other implementations

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two
compilers (another one and GCC with this patch series) transformed the
loops in the middle end after the handling of data sharing, one planned
to do so. Yet another vendor had not yet decided where it will be
implemented. Clang currently does everything in the front end, but it
was mentioned that this might change in the future e.g. for code
sharing with Flang. Implementing the loop transformations late could
potentially complicate the implementation of transformations which
require adjustments of the data sharing clauses, but this is known and
consequentially, no such transformations are planned for OpenMP 6.0. In
particular, the "apply" clause therefore only permits loop-transforming
constructs to be applied to the loops generated from other loop
transformations in TR11.

> The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE,
> OMP_LOOP) already need to know given their collapse/ordered how many
> loops they are actually associated with and the loop transformation
> constructs can change that. So, I think we need to do the loop
> transformations in the FEs, that doesn't mean we need to write
> everything 3 times, once for each frontend. Already now, e.g. various
> stuff is shared between C and C++ FEs in c-family, though how much can
> be shared between c-family and Fortran is to be discovered. Or at
> least partially, to the extent that we compute how many canonical
> loops the loop transformations result in, what artificial iterators
> they will use etc., so that during gimplification we can take all that
> into account and then can do the actual transformations later.

The patches in this patch series already do compute how many canonical
loop nests result from the loop transformations in the front end. This
is necessary to represent the loop nest that is affected by the loop
transformations by a single OMP_FOR to meet the expectations of all
later OpenMP code transformations. This is also the major reason why
the loop transformations are represented by clauses instead of
representing them as "OMP_UNROLL/OMP_TILE as GENERIC constructs like
OMP_FOR" as you suggest below. Since the loop transformations may also
appear on inner loops of a collapsed loop nest (i.e. within the
collapsed depth), representing the transformation by OMP_FOR-like
constructs would imply that a collapsed loop nest would have to be
broken apart into single loops. Perhaps this could be handled somehow,
but the collapsed loop nest would have to be re-assembled to meet the
expectations of e.g. gimplification. The clause representation is also
much better suited for the upcoming OpenMP "apply" clause where the
transformations will not appear as directives in front of actual loops
but inside of other clauses. In fact, the loop transformation clauses
in the implementation already specify the level of a loop nest to which
they apply and it could be possible to re-use this handling for
"apply".

My initial reaction also was to implement the loop transformations as
OMP_FOR-like constructs and the patch actually introduces an
OMP_LOOP_TRANS construct which is used to represent loops that are not
going to be associated with another OpenMP directive after the
transformation, e.g.

    void foo ()
    {
      #pragma omp tile sizes (4, 8, 16)
      for (int i = 0; i < 64; ++i)
        {
          ...
        }
    }

You suggest to implement the loop transformations during
gimplification. I am not sure if gimplification is actually well-suited
to implement the depth-first evaluation of the loop transformations. I
also believe that gimplification already handles too many things which
conceptually are not related to the translation to GIMPLE. Having a
separate pass seems to be the right move to achieve a better separation
of concerns. I think this will be even more important in the future as
the size of the loop transformation implementation keeps growing. As
you mention below, several new constructs are already planned.
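As background for the point about computing how many canonical loops a
transformation produces, here is a hand-written C sketch of what tiling
a single loop amounts to.  It is not what the omp_transform_loops pass
generates; the names, the tile size of 4 and the loop body are
assumptions made for illustration.  The only fact relied on is that
tiling one loop yields two nested loops, an outer loop over tiles and
an inner loop within each tile.

```c
enum { N = 64, TILE = 4 };

/* Original loop: visit indices 0 .. N-1 in order.  The running value is
   order-sensitive, so it changes if elements are visited differently.  */
static unsigned long
visit_plain (const int *data)
{
  unsigned long acc = 0;
  for (int i = 0; i < N; ++i)
    acc = acc * 31u + (unsigned long) data[i];
  return acc;
}

/* Hand-tiled equivalent: one outer loop stepping from tile to tile and
   one inner loop running inside the current tile.  The inner bound also
   checks against N so that a partial last tile would be handled.  */
static unsigned long
visit_tiled (const int *data)
{
  unsigned long acc = 0;
  for (int i0 = 0; i0 < N; i0 += TILE)
    for (int i = i0; i < i0 + TILE && i < N; ++i)
      acc = acc * 31u + (unsigned long) data[i];
  return acc;
}
```

An enclosing directive with collapse(2) could then associate with both
generated loops, which is why the loop count has to be known before the
enclosing construct is processed.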
Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives
Hi Jakub, On 16.05.23 13:00, Jakub Jelinek wrote: On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote: The place where different compilers implement the loop transformations was discussed in an OpenMP loop transformation meeting last year. Two compilers (another one and GCC with this patch series) transformed the loops in the middle end after the handling of data sharing, one planned to do so. Yet another vendor had not yet decided where it will be implemented. Clang currently does everything in the front end, but it was mentioned that this might change in the future e.g. for code sharing with Flang. Implementing the loop transformations late could potentially complicate the implementation of transformations which require adjustments of the data sharing clauses, but this is known and consequentially, no such When already in the FE we determine how many canonical loops a particular loop transformation creates, I think the primary changes I'd like to see is really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and consider where is the best spot to lower it. I believe for data sharing it is best done during gimplification before the containing loops are handled, it is already shared code among all the FEs, I think will make it easier to handle data sharing right and gimplification is also where doacross processing is done. While there is restriction that ordered clause is incompatible with generated loops from tile construct, there isn't one for unroll (unless "The ordered clause must not appear on a worksharing-loop directive if the associated loops include the generated loops of a tile directive." 
> means unroll partial implicitly because partial unroll tiles the loop, but it doesn't say it acts as if it was a tile construct), so we'd have to handle
>
>   #pragma omp for ordered(2)
>   for (int i = 0; i < 64; i++)
>     #pragma omp unroll partial(4)
>     for (int j = 0; j < 64; j++)
>       {
>         #pragma omp ordered depend (sink: i - 1, j - 2)
>         #pragma omp ordered depend (source)
>       }
>
> and I think handling it after gimplification is going to be increasingly harder. Of course another possibility is ask lang committee to clarify unless it has been clarified already in 6.0 (but in TR11 it is not).

I do not really expect that we will have to handle this. Questions concerning the correctness of code after applying loop transformations came up several times since I have been following the design meetings and the result was always either that nothing will be changed, because the loop transformations are not expected to ensure the correctness of enclosing directives, or that the use of the problematic construct in conjunction with loop transformations will be forbidden. Concerning the use of "ordered" on transformed loops, the latter approach was suggested for all transformations, cf. issue #3494 in the private OpenMP spec repository. I see that you have already asked for clarification on unroll. I suppose this could also be fixed after gimplification with reasonable effort. But let's just wait for the result of that discussion before we continue worrying about this.

> Also, I think creating temporaries is easier to be done during gimplification than later.

This has not caused problems with the current approach.

> Another option is as you implemented a separate pre-omp-lowering pass, and another one would be do it in the omplower pass, which has actually several subpasses internally, do it in the scan phase. Disadvantage of a completely separate pass is that we have to walk the whole IL again, while doing it in the scan phase means we avoid that cost.
> We already do there similar transformations, scan_omp_simd transforms simd constructs into if (...) simd else simt and then we process it with normal scan_omp_for on what we've created. So, if you insist doing it after gimplification perhaps for compatibility with other non-LLVM compilers, I'd prefer to do it there rather than in a completely separate pass.

I see. This would be possible. My current approach is indeed rather wasteful because the pass is not restricted to functions that actually use loop transformations. I could add an attribute to such functions that could be used to avoid the execution of the pass and hence the gimple walk on functions that do not use transformations.

>> This is necessary to represent the loop nest that is affected by the loop transformations by a single OMP_FOR to meet the expectations of all later OpenMP code transformations. This is also the major reason why the loop transformations are represented by clauses instead of representing them as "OMP_UNROLL/OMP_TILE as GENERIC constructs like OMP_FOR" as you suggest below. Since the
>
> I really don't see why. We try to represent what we see in the source as OpenMP constructs as those constructs. We already have a precedent with composite loop constructs, where for the combined constr