[PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses
From: frederik

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.

2019-11-06  Gergö Barany
            Frederik Harwath
            Thomas Schwinge

gcc/
        * omp-low.c (struct omp_context): New fields
        local_reduction_clauses, outer_reduction_clauses.
        (new_omp_context): Initialize these.
        (scan_sharing_clauses): Record reduction clauses on OpenACC constructs.
        (scan_omp_for): Check reduction clauses for incorrect nesting.

gcc/testsuite/
        * c-c++-common/goacc/nested-reductions-warn.c: New test.
        * c-c++-common/goacc/nested-reductions.c: New test.
        * gfortran.dg/goacc/nested-reductions-warn.f90: New test.
        * gfortran.dg/goacc/nested-reductions.f90: New test.

libgomp/
        * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c:
        Add expected warnings about missing reduction clauses.
        * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c:
        Likewise.
        * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c:
        Likewise.

Reviewed-by: Thomas Schwinge git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277875 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 10 + gcc/omp-low.c | 97 +++ gcc/testsuite/ChangeLog | 9 + .../goacc/nested-reductions-warn.c| 525 ++ .../c-c++-common/goacc/nested-reductions.c| 420 +++ .../goacc/nested-reductions-warn.f90 | 674 ++ .../gfortran.dg/goacc/nested-reductions.f90 | 540 ++ libgomp/ChangeLog | 11 + .../par-loop-comb-reduction-1.c | 2 +- .../par-loop-comb-reduction-2.c | 2 +- .../par-loop-comb-reduction-3.c | 2 +- .../par-loop-comb-reduction-4.c | 2 +- 12 files changed, 2290 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-warn.c create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90 create mode 100644 gcc/testsuite/gfortran.dg/goacc/nested-reductions.f90 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 7fee0f37e9bf..38160dd631e9 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,13 @@ +2019-11-06 Gergö Barany + Frederik Harwath + Thomas Schwinge + + * omp-low.c (struct omp_context): New fields + local_reduction_clauses, outer_reduction_clauses. + (new_omp_context): Initialize these. + (scan_sharing_clauses): Record reduction clauses on OpenACC constructs. + (scan_omp_for): Check reduction clauses for incorrect nesting. + 2019-11-06 Jakub Jelinek PR inline-asm/92352 diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 122f42788813..fa76ceba33c6 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -128,6 +128,12 @@ struct omp_context corresponding tracking loop iteration variables. */ hash_map *lastprivate_conditional_map; + /* A tree_list of the reduction clauses in this context. */ + tree local_reduction_clauses; + + /* A tree_list of the reduction clauses in outer contexts. */ + tree outer_reduction_clauses; + /* Nesting depth of this context. Used to beautify error messages re invalid gotos. 
The outermost ctx is depth 1, with depth 0 being reserved for the main body of the function. */ @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->outer = outer_ctx; ctx->cb = outer_ctx->cb; ctx->cb.block = NULL; + ctx->local_reduction_clauses = NULL; + ctx->outer_reduction_clauses = ctx->outer_reduction_clauses; ctx->depth = outer_ctx->depth + 1; } else @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb.transform_call_graph_edges = CB_CGE_MOVE; ctx->cb.adjust_array_error_bounds = true; ctx->cb.dont_remap_vla_if_no_change = true; + ctx->local_reduction_clauses = NULL; + ctx->outer_reduction_clauses = NULL; ctx->depth = 1; } @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) goto do_pri
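
To illustrate the property that the new check enforces, here is a minimal sketch in C. It is not taken from the patch or its testsuite, and the function names are made up for illustration. In the first function every loop of the nest carries the same reduction clause. The second omits the clause on the middle loop, which is the kind of nesting that the commit message says is now reported with a warning. When built without -fopenacc the pragmas are ignored and both functions simply run sequentially.

```c
/* Consistent nesting: all three loops reduce `sum' with `+'.  */
int
nested_sum_consistent (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop reduction(+:sum)
      for (int j = 0; j < n; j++)
        {
#pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }
  return sum;
}

/* Inconsistent nesting: the middle loop has no reduction clause for
   `sum' although the loops around and inside it do.  This is the
   pattern the check is meant to warn about.  */
int
nested_sum_inconsistent (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop
      for (int j = 0; j < n; j++)
        {
#pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }
  return sum;
}
```

Using a different operator on one of the loops, e.g. reduction(*:sum) on the innermost loop only, would violate the second half of the rule in the same way.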
[PATCH][COMMITTED] Add OpenACC 2.6 `serial' construct support
Hi,
the following patch has been reviewed and committed.

Frederik

--- 8< --

The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard)
is equivalent to a `parallel' construct with clauses `num_gangs(1)
num_workers(1) vector_length(1)' implied.  These clauses are therefore
not supported with the `serial' construct.  All the remaining clauses
accepted with `parallel' are also accepted with `serial'.

The `serial' construct is implemented like `parallel', except for
hardcoding dimensions rather than taking them from the relevant
clauses, in `expand_omp_target'.

Separate codes are used to denote the `serial' construct throughout the
middle end, even though the mapping of `serial' to an equivalent
`parallel' construct could have been done in the individual language
frontends.  In particular, this makes it possible to distinguish between
compute constructs in warnings, error messages, dumps etc.

2019-11-12  Maciej W. Rozycki
            Tobias Burnus
            Frederik Harwath
            Thomas Schwinge

gcc/
        * gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL
        enumeration constant.
        (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
        (is_gimple_omp_offloaded): Likewise.
        * gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration
        constant.  Adjust the value of ORT_NONE accordingly.
        (is_gimple_stmt): Handle OACC_SERIAL.
        (oacc_default_clause): Handle ORT_ACC_SERIAL.
        (gomp_needs_data_present): Likewise.
        (gimplify_adjust_omp_clauses): Likewise.
        (gimplify_omp_workshare): Handle OACC_SERIAL.
        (gimplify_expr): Likewise.
        * omp-expand.c (expand_omp_target):
        Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
        (build_omp_regions_1, omp_make_gimple_edges): Likewise.
        * omp-low.c (is_oacc_parallel): Rename function to...
        (is_oacc_parallel_or_serial): ... this.
        Handle GF_OMP_TARGET_KIND_OACC_SERIAL.
        (scan_sharing_clauses): Adjust accordingly.
        (scan_omp_for): Likewise.
        (lower_oacc_head_mark): Likewise.
        (convert_from_firstprivate_int): Likewise.
        (lower_omp_target): Likewise.
        (check_omp_nesting_restrictions): Handle
        GF_OMP_TARGET_KIND_OACC_SERIAL.
        (lower_oacc_reductions): Likewise.
        (lower_omp_target): Likewise.
        * tree.def (OACC_SERIAL): New tree code.
        * tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL.
        * doc/generic.texi (OpenACC): Document OACC_SERIAL.

gcc/c-family/
        * c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration
        constant.
        * c-pragma.c (oacc_pragmas): Add "serial" entry.

gcc/c/
        * c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
        (c_parser_oacc_kernels_parallel): Rename function to...
        (c_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
        (c_parser_omp_construct): Update accordingly.

gcc/cp/
        * constexpr.c (potential_constant_expression_1): Handle
        OACC_SERIAL.
        * parser.c (OACC_SERIAL_CLAUSE_MASK): New macro.
        (cp_parser_oacc_kernels_parallel): Rename function to...
        (cp_parser_oacc_compute): ... this.  Handle PRAGMA_OACC_SERIAL.
        (cp_parser_omp_construct): Update accordingly.
        (cp_parser_pragma): Handle PRAGMA_OACC_SERIAL.  Fix alphabetic
        order.
        * pt.c (tsubst_expr): Handle OACC_SERIAL.

gcc/fortran/
        * gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP,
        ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL
        enumeration constants.
        (gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL
        enumeration constants.
        * match.h (gfc_match_oacc_serial): New prototype.
        (gfc_match_oacc_serial_loop): Likewise.
        * dump-parse-tree.c (show_omp_node, show_code_node): Handle
        EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
        * match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP.
        * openmp.c (OACC_SERIAL_CLAUSES): New macro.
        (gfc_match_oacc_serial_loop): New function.
        (gfc_match_oacc_serial): Likewise.
        (oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP.
        (resolve_omp_clauses): Handle EXEC_OACC_SERIAL.
        (oacc_code_to_statement): Handle EXEC_OACC_SERIAL and
        EXEC_OACC_SERIAL_LOOP.
        (gfc_resolve_oacc_directive): Likewise.
        * parse.c (decode_oacc_directive) <'s'>: Add case for "serial"
        and "serial loop".
        (next_statement): Handle ST_OACC_SERIAL_LOOP and ST_OACC_SERIAL.
        (gfc_ascii_statement): Likewise.  Handle ST_OACC_END_SERIAL_LOOP
        and ST_OACC_END_SERIAL.
        (parse_oacc_structured_block): Handle ST_OACC_SERIAL.
        (parse_oacc_loop): Handle ST_OACC_SERIAL_LOOP and
        ST_OACC_END_SERIAL_LOOP.
        (parse_executable): Handle ST_OACC_SERIAL_LOOP and
        ST_OACC_SERIAL.
        (is_oacc): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL.
        * resolve.c (gfc
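
As a small illustration of the equivalence stated in the commit message, the following C sketch writes the same loop once with `serial' and once as a `parallel' construct with all three dimensions set to 1. The function names are invented for this example and the code is not part of the patch. Without -fopenacc the pragmas are ignored and both functions run on the host.

```c
/* Sum the integers 1..N inside an OpenACC `serial' region.  */
long
sum_serial (int n)
{
  long sum = 0;
#pragma acc serial loop reduction(+:sum)
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}

/* The same computation spelled as a `parallel' construct with the
   dimensions that `serial' implies written out explicitly.  */
long
sum_parallel_1x1x1 (int n)
{
  long sum = 0;
#pragma acc parallel loop num_gangs(1) num_workers(1) vector_length(1) reduction(+:sum)
  for (int i = 1; i <= n; i++)
    sum += i;
  return sum;
}
```

Per the description above, putting num_gangs, num_workers or vector_length on the `serial' construct itself is not supported.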
[PATCH 1/4] openmp: Fix loop transformation tests
libgomp/ChangeLog: * testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause. * testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var. * testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction and initialization. --- libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 | 2 +- libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++ .../libgomp.fortran/loop-transforms/unroll-simd-1.f90 | 3 ++- 3 files changed, 5 insertions(+), 2 deletions(-) diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 index 6aedbf4724f..a7cb5e7635d 100644 --- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 @@ -69,7 +69,7 @@ module test_functions integer :: i,j sum = 0 -!$omp parallel do collapse(2) +!$omp parallel do collapse(2) reduction(+:sum) !$omp tile sizes(6,10) do i = 1,10,3 do j = 1,10,3 diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 index f07aab898fa..b91ea275577 100644 --- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 @@ -8,6 +8,7 @@ module test_functions integer :: i,j +sum = 0 !$omp do do i = 1,10,3 !$omp unroll full @@ -22,6 +23,7 @@ module test_functions integer :: i,j +sum = 0 !$omp parallel do reduction(+:sum) !$omp unroll partial(2) do i = 1,10,3 diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 index 5fb64ddd6fd..7a43458f0dd 100644 --- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 @@ -9,7 +9,8 @@ module test_functions integer :: i,j -!$omp simd +sum = 0 +!$omp 
simd reduction(+:sum) do i = 1,10,3 !$omp unroll full do j = 1,10,3 -- 2.36.1 - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
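
For readers who do not have the Fortran tests at hand, here is a rough C analogue of the tile-2.f90 fix. It is a sketch only: the loop body (incrementing `sum' once per iteration) is an assumption since the hunk does not show it, the function name is made up, and the `omp tile' directive is left out to keep the example independent of compiler version. Without the reduction clause the updates of `sum' from different threads would race.

```c
/* Count the iterations of a collapsed 2-deep loop nest.  The
   reduction clause gives each thread a private copy of `sum' and
   combines the copies at the end of the region.  */
int
count_iterations (void)
{
  int sum = 0;		/* Explicit initialization, as added to the tests.  */
#pragma omp parallel for collapse(2) reduction(+:sum)
  for (int i = 1; i <= 10; i += 3)
    for (int j = 1; j <= 10; j += 3)
      sum = sum + 1;
  return sum;
}
```

Both loops visit 1, 4, 7 and 10, so the nest runs 4 * 4 = 16 iterations regardless of the number of threads.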
[PATCH 0/4] openmp: loop transformation fixes
Hi,
the following patches contain some fixes from the devel/omp/gcc-13 branch
for the patches implementing the OpenMP 5.1 loop transformation
directives, which I posted in March 2023.

Frederik

Frederik Harwath (4):
  openmp: Fix loop transformation tests
  openmp: Fix initialization for 'unroll full'
  openmp: Fix diagnostic message for "omp unroll"
  openmp: Fix number of iterations computation for "omp unroll full"

 gcc/omp-transform-loops.cc                    | 99 ++-
 .../gomp/loop-transforms/unroll-8.c           | 76 ++
 .../gomp/loop-transforms/unroll-8.f90         | 2 +-
 .../gomp/loop-transforms/unroll-9.f90         | 2 +-
 .../matrix-no-directive-unroll-full-1.C       | 13 +++
 .../loop-transforms/matrix-no-directive-1.c   | 2 +-
 .../matrix-no-directive-unroll-full-1.c       | 2 +-
 .../matrix-omp-distribute-parallel-for-1.c    | 2 +
 .../loop-transforms/matrix-omp-for-1.c        | 2 +-
 .../matrix-omp-parallel-for-1.c               | 2 +-
 .../matrix-omp-parallel-masked-taskloop-1.c   | 2 +
 ...trix-omp-parallel-masked-taskloop-simd-1.c | 2 +
 .../matrix-omp-target-parallel-for-1.c        | 2 +-
 ...p-target-teams-distribute-parallel-for-1.c | 2 +
 .../loop-transforms/matrix-omp-taskloop-1.c   | 2 +
 ...trix-omp-teams-distribute-parallel-for-1.c | 2 +
 .../loop-transforms/matrix-simd-1.c           | 2 +
 .../loop-transforms/unroll-1.c                | 8 +-
 .../loop-transforms/unroll-non-rect-1.c       | 2 +
 .../loop-transforms/tile-2.f90                | 2 +-
 .../loop-transforms/unroll-1.f90              | 2 +
 .../loop-transforms/unroll-6.f90              | 4 +-
 .../loop-transforms/unroll-simd-1.f90         | 3 +-
 23 files changed, 197 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c
 create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C

--
2.36.1
[PATCH 4/4] openmp: Fix number of iterations computation for "omp unroll full"
gcc/ChangeLog: * omp-transform-loops.cc (gomp_for_number_of_iterations): Always compute "final - init" and do not take absolute value. Identify non-iterating and infinite loops for constant init, final, step values for better diagnostic messages, consistent behaviour in those corner cases, and better testability. (gomp_for_constant_iterations_p): Add new argument to pass on information about infinite loops, and ... (full_unroll): ... use from here to emit a warning and remove unrolled, known infinite loops consistently. (process_omp_for): Only print dump message if loop has not been removed by transformation. gcc/testsuite/ChangeLog: * c-c++-common/gomp/loop-transforms/unroll-8.c: New test. --- gcc/omp-transform-loops.cc| 94 ++- .../gomp/loop-transforms/unroll-8.c | 76 +++ 2 files changed, 146 insertions(+), 24 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/loop-transforms/unroll-8.c diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc index c8853bcee89..b0645397641 100644 --- a/gcc/omp-transform-loops.cc +++ b/gcc/omp-transform-loops.cc @@ -153,20 +153,27 @@ subst_defs (tree expr, gimple_seq seq) return expr; } -/* Return an expression for the number of iterations of the outermost loop of - OMP_FOR. */ +/* Return an expression for the number of iterations of the loop at + the given LEVEL of OMP_FOR. + + If the expression is a negative constant, this means that the loop + is infinite. This can only be recognized for loops with constant + initial, final, and step values. In general, according to the + OpenMP specification, the behaviour is unspecified if the number of + iterations does not fit the types used for their computation, and + hence in particular if the loop is infinite. 
*/ tree gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level) { gcc_assert (!non_rectangular_p (omp_for)); - tree init = gimple_omp_for_initial (omp_for, level); tree final = gimple_omp_for_final (omp_for, level); tree_code cond = gimple_omp_for_cond (omp_for, level); tree index = gimple_omp_for_index (omp_for, level); tree type = gomp_for_iter_count_type (index, final); - tree step = TREE_OPERAND (gimple_omp_for_incr (omp_for, level), 1); + tree incr = gimple_omp_for_incr (omp_for, level); + tree step = omp_get_for_step_from_incr (gimple_location (omp_for), incr); init = subst_defs (init, gimple_omp_for_pre_body (omp_for)); init = fold (init); @@ -181,34 +188,64 @@ gomp_for_number_of_iterations (const gomp_for *omp_for, size_t level) diff_type = ptrdiff_type_node; } - tree diff; - if (cond == GT_EXPR) -diff = fold_build2 (minus_code, diff_type, init, final); - else if (cond == LT_EXPR) -diff = fold_build2 (minus_code, diff_type, final, init); - else -gcc_unreachable (); - diff = fold_build2 (CEIL_DIV_EXPR, type, diff, step); - diff = fold_build1 (ABS_EXPR, type, diff); + /* Identify a simple case in which the loop does not iterate. The + computation below could not tell this apart from an infinite + loop, hence we handle this separately for better diagnostic + messages. */ + gcc_assert (cond == GT_EXPR || cond == LT_EXPR); + if (TREE_CONSTANT (init) && TREE_CONSTANT (final) + && ((cond == GT_EXPR && tree_int_cst_le (init, final)) + || (cond == LT_EXPR && tree_int_cst_le (final, init)))) +return build_int_cst (diff_type, 0); + + tree diff = fold_build2 (minus_code, diff_type, final, init); + + /* Divide diff by the step. + + We could always use CEIL_DIV_EXPR since only non-negative results + correspond to valid number of iterations and the behaviour is + unspecified by the spec otherwise. But we try to get the rounding + right for constant negative values to identify infinite loops + more precisely for better warnings. 
*/ + tree_code div_expr = CEIL_DIV_EXPR; + if (TREE_CONSTANT (diff) && TREE_CONSTANT (step)) +{ + bool diff_is_neg = tree_int_cst_lt (diff, size_zero_node); + bool step_is_neg = tree_int_cst_lt (step, size_zero_node); + if ((diff_is_neg && !step_is_neg) + || (!diff_is_neg && step_is_neg)) + div_expr = FLOOR_DIV_EXPR; +} + diff = fold_build2 (div_expr, type, diff, step); return diff; } -/* Return true if the expression representing the number of iterations for - OMP_FOR is a constant expression, false otherwise. */ +/* Return true if the expression representing the number of iterations + for OMP_FOR is a non-negative constant and set ITERATIONS to the + value of that expression. Otherwise, return false. Set INFINITE to + true if the number of iterations was recognized to be infinite. */ bool gomp_for_constant_iterations_p (gomp_for *omp_for, - unsigned HOST_WIDE_INT *iterations) +
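
The arithmetic of the new gomp_for_number_of_iterations can be followed more easily on plain integers. The sketch below mirrors the steps visible in the hunk above (early exit for non-iterating loops, "final - init", then ceiling or floor division depending on the signs) but works on `long' values instead of trees. The names `number_of_iterations', `floor_div', `ceil_div' and the `loop_cond' enum are made up for this illustration, and `step' is assumed to be non-zero.

```c
#include <stdbool.h>

enum loop_cond { COND_LT, COND_GT };

/* Division rounding towards negative infinity.  */
static long
floor_div (long a, long b)
{
  long q = a / b;
  if (a % b != 0 && (a < 0) != (b < 0))
    q--;
  return q;
}

/* Division rounding towards positive infinity.  */
static long
ceil_div (long a, long b)
{
  long q = a / b;
  if (a % b != 0 && (a < 0) == (b < 0))
    q++;
  return q;
}

/* Number of iterations of "for (i = INIT; i COND FINAL; i += STEP)".
   A negative result means that the loop was recognized as infinite.  */
long
number_of_iterations (long init, long final, long step, enum loop_cond cond)
{
  /* The loop does not iterate at all.  Handled separately because the
     division below could not tell this apart from an infinite loop.  */
  if ((cond == COND_GT && init <= final)
      || (cond == COND_LT && final <= init))
    return 0;

  /* Always "final - init", without taking the absolute value.  */
  long diff = final - init;

  /* Opposite signs: round down so that the result is strictly
     negative; otherwise round up to count a partial last step.  */
  bool diff_is_neg = diff < 0;
  bool step_is_neg = step < 0;
  if (diff_is_neg != step_is_neg)
    return floor_div (diff, step);
  return ceil_div (diff, step);
}
```

The floor division matters for a case like "for (i = 0; i < 1; i -= 3)": ceiling division of 1 by -3 would give 0, which looks like a loop that does not iterate, whereas floor division gives -1 and marks the loop as infinite.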
[PATCH 3/4] openmp: Fix diagnostic message for "omp unroll"
gcc/ChangeLog: * omp-transform-loops.cc (print_optimized_unroll_partial_msg): Output "omp unroll partial" instead of "omp unroll auto". (optimize_transformation_clauses): Likewise. libgomp/ChangeLog: * testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: Adjust. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust. * gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust. --- gcc/omp-transform-loops.cc| 4 ++-- gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 | 2 +- gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 | 2 +- .../testsuite/libgomp.fortran/loop-transforms/unroll-6.f90| 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc index 275a5260dae..c8853bcee89 100644 --- a/gcc/omp-transform-loops.cc +++ b/gcc/omp-transform-loops.cc @@ -1423,7 +1423,7 @@ print_optimized_unroll_partial_msg (tree c) tree unroll_factor = OMP_CLAUSE_UNROLL_PARTIAL_EXPR (c); dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, dump_loc, "replaced consecutive % directives by " - "%\n", tree_to_uhwi (unroll_factor)); } @@ -1483,7 +1483,7 @@ optimize_transformation_clauses (tree clauses) dump_printf_loc ( MSG_OPTIMIZED_LOCATIONS, dump_loc, - "removed useless % directives " + "removed useless % directives " "preceding 'omp unroll full'\n"); } } diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 index fd687890ee6..dab3f0fb5cf 100644 --- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-8.f90 @@ -5,7 +5,7 @@ subroutine test1 implicit none integer :: i !$omp parallel do collapse(1) - !$omp unroll partial(4) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(24\)'} } + !$omp unroll partial(4) ! 
{ dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll partial\(24\)'} } !$omp unroll partial(3) !$omp unroll partial(2) !$omp unroll partial(1) diff --git a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 index 928ca44e811..91e13ff1b37 100644 --- a/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/loop-transforms/unroll-9.f90 @@ -4,7 +4,7 @@ subroutine test1 implicit none integer :: i - !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} } + !$omp unroll full ! { dg-optimized {removed useless 'omp unroll partial' directives preceding 'omp unroll full'} } !$omp unroll partial(3) !$omp unroll partial(2) !$omp unroll partial(1) diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 index 1df8ce8d5bb..b953ce31b5b 100644 --- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 +++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-6.f90 @@ -22,7 +22,7 @@ contains sum = 0 !$omp parallel do reduction(+:sum) lastprivate(i) -!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll auto\(50\)'} } +!$omp unroll partial(5) ! { dg-optimized {replaced consecutive 'omp unroll' directives by 'omp unroll partial\(50\)'} } !$omp unroll partial(10) do i = 1,n,step sum = sum + 1 @@ -36,7 +36,7 @@ contains sum = 0 !$omp parallel do reduction(+:sum) lastprivate(i) do i = 1,n,step - !$omp unroll full ! { dg-optimized {removed useless 'omp unroll auto' directives preceding 'omp unroll full'} } + !$omp unroll full ! 
{ dg-optimized {removed useless 'omp unroll partial' directives preceding 'omp unroll full'} } !$omp unroll partial(10) do j = 1, 1000 sum = sum + 1 -- 2.36.1
[PATCH 2/4] openmp: Fix initialization for 'unroll full'
The index variable initialization for the 'omp unroll' directive with 'full' clause got lost and the testsuite did not catch it. Add the initialization and add -Wall to some tests to detect uninitialized variable uses and other potential problems in the code generation. gcc/ChangeLog: * omp-transform-loops.cc (full_unroll): Add initialization of index variable. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c: Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty pragmas. Use -O2. * testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C: Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c, but using -O0 which works only for C++. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use -Wall and use -Wno-unknown-pragmas to disable warnings about empty pragmas. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c: Likewise. * testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c: Likewise. 
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: Likewise and fix broken function calls found by -Wall. --- gcc/omp-transform-loops.cc | 1 + .../matrix-no-directive-unroll-full-1.C | 13 + .../loop-transforms/matrix-no-directive-1.c | 2 +- .../matrix-no-directive-unroll-full-1.c | 2 +- .../matrix-omp-distribute-parallel-for-1.c | 2 ++ .../loop-transforms/matrix-omp-for-1.c | 2 +- .../loop-transforms/matrix-omp-parallel-for-1.c | 2 +- .../matrix-omp-parallel-masked-taskloop-1.c | 2 ++ .../matrix-omp-parallel-masked-taskloop-simd-1.c| 2 ++ .../matrix-omp-target-parallel-for-1.c | 2 +- ...rix-omp-target-teams-distribute-parallel-for-1.c | 2 ++ .../loop-transforms/matrix-omp-taskloop-1.c | 2 ++ .../matrix-omp-teams-distribute-parallel-for-1.c| 2 ++ .../loop-transforms/matrix-simd-1.c | 2 ++ .../libgomp.c-c++-common/loop-transforms/unroll-1.c | 8 +--- .../loop-transforms/unroll-non-rect-1.c | 2 ++ 16 files changed, 40 insertions(+), 8 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C diff --git a/gcc/omp-transform-loops.cc b/gcc/omp-transform-loops.cc index 517faea537c..275a5260dae 100644 --- a/gcc/omp-transform-loops.cc +++ b/gcc/omp-transform-loops.cc @@ -548,6 +548,7 @@ full_unroll (gomp_for *omp_for, location_t loc, walk_ctx *ctx ATTRIBUTE_UNUSED) gimple_seq unrolled = NULL; gimple_seq_add_seq (&unrolled, gimple_omp_for_pre_body (omp_for)); + gimplify_assign (index, init, &unrolled); push_gimplify_context (); gimple_seq_add_seq (&unrolled, build_unroll_body (body, unroll_factor, index, incr)); diff --git a/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C new file mode 100644 index 000..3a684219627 --- /dev/null +++ b/libgomp/testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C @@ -0,0 +1,13 @@ +/* { dg-additional-options { -O0 -fdump-tree-original -Wall 
-Wno-unknown-pragmas } } */ + +#define COMMON_DIRECTIVE +#define COMMON_TOP_TRANSFORM omp unroll full +#define COLLAPSE_1 +#define COLLAPSE_2 +#define COLLAPSE_3 +#define IMPLEMENTATION_FILE "../../libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h" + +#include "../../libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h" + +/* A consistency check to prevent broken macro usage. */ +/* { dg-final { scan-tree-dump-times "unroll_full" 13 "original" } } */ diff --git a/libgomp/testsuite/libgomp.c-c++-common/loop-tra
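
To make the effect of the missing initialization concrete, here is a hand-written analogy in C. It is not the GIMPLE that full_unroll emits, and the function names are invented. The second function is a manual full unroll of the loop in the first. The line marked below corresponds to the assignment that the patch adds with gimplify_assign (index, init, &unrolled). Without it, `i' would be read uninitialized, which is what -Wall is expected to flag in the adjusted tests.

```c
/* The loop as written by the user: i takes the values 1, 4 and 7.  */
int
sum_loop (void)
{
  int sum = 0;
  int i;
  for (i = 1; i < 10; i += 3)
    sum += i;
  return sum;
}

/* A manual full unroll of the same loop.  */
int
sum_unrolled (void)
{
  int sum = 0;
  int i;
  i = 1;		/* Index initialization; this is what got lost.  */
  sum += i; i += 3;	/* Iteration 1.  */
  sum += i; i += 3;	/* Iteration 2.  */
  sum += i; i += 3;	/* Iteration 3.  */
  return sum;
}
```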
[OG11][committed][PATCH 00/22] OpenACC "kernels" Improvements
Hi, this patch series implements the re-work of the OpenACC "kernels" implementation that has been announced at the GNU Tools Track of this year's Linux Plumbers Conference; see https://linuxplumbersconf.org/event/11/contributions/998/. The central step is contained in the commit titled "openacc: Use Graphite for dependence analysis in \"kernels\" regions" whose commit message also contains further explanations. Best regards, Frederik PS: The commit series also includes a backport from master "00b98b6cac25 Add dg-final option-based target selectors" and two trivial unrelated commits "fa558c2a6664 Fix gimple_debug_cfg declaration" and "35cdc94463fe Fix branch prediction dump message" Andrew Stubbs (2): openacc: Add data optimization pass openacc: Add runtime alias checking for OpenACC kernels Frederik Harwath (19): openacc: Move pass_oacc_device_lower after pass_graphite graphite: Extend SCoP detection dump output graphite: Rename isl_id_for_ssa_name graphite: Fix minor mistakes in comments Fix branch prediction dump message Move compute_alias_check_pairs to tree-data-ref.c graphite: Add runtime alias checking openacc: Use Graphite for dependence analysis in "kernels" regions openacc: Add "can_be_parallel" flag info to "graph" dumps openacc: Add further kernels tests openacc: Remove unused partitioning in "kernels" regions Add function for printing a single OMP_CLAUSE openacc: Warn about "independent" "kernels" loops with data-dependences openacc: Handle internal function calls in pass_lim openacc: Disable pass_pre on outlined functions analyzed by Graphite graphite: Tune parameters for OpenACC use graphite: Adjust scop loop-nest choice graphite: Accept loops without data references openacc: Adjust test expectations to new "kernels" handling Sandra Loosemore (1): Fortran: delinearize multi-dimensional array accesses gcc/Makefile.in |2 + gcc/cfgloop.c |1 + gcc/cfgloop.h |6 + gcc/cfgloopmanip.c|1 + gcc/common.opt|9 + gcc/config/nvptx/nvptx.c |7 + gcc/doc/gimple.texi 
|2 + gcc/doc/invoke.texi | 20 +- gcc/doc/passes.texi |6 +- gcc/expr.c|1 + gcc/flag-types.h |1 + gcc/fortran/lang.opt |4 + gcc/fortran/trans-array.c | 321 -- gcc/gimple-loop-interchange.cc|2 +- gcc/gimple-pretty-print.c |3 + gcc/gimple-walk.c | 15 +- gcc/gimple-walk.h |6 + gcc/gimple.h |7 +- gcc/gimplify.c| 13 +- gcc/graph.c | 35 +- gcc/graphite-dependences.c| 220 +++- gcc/graphite-isl-ast-to-gimple.c | 271 - gcc/graphite-oacc.c | 689 gcc/graphite-oacc.h | 55 + gcc/graphite-optimize-isl.c | 42 +- gcc/graphite-poly.c | 41 +- gcc/graphite-scop-detection.c | 654 +-- gcc/graphite-sese-to-poly.c | 90 +- gcc/graphite.c| 120 +- gcc/graphite.h| 40 +- gcc/internal-fn.c |2 + gcc/internal-fn.h |4 +- gcc/omp-data-optimize.cc | 951 gcc/omp-expand.c | 110 +- gcc/omp-general.c | 23 +- gcc/omp-general.h |1 + gcc/omp-low.c | 321 +- gcc/omp-oacc-kernels-decompose.cc | 145 ++- gcc/omp-offload.c | 1001 + gcc/omp-offload.h |2 + gcc/params.opt|5 +- gcc/passes.c | 42 + gcc/passes.def| 47 +- gcc/predict.c |2 +- gcc/sese.c| 25 +- gcc/sese.h| 19 + gcc/testsuite/c-c++-common/goacc/acc-icf.c|4 +- gcc/testsuite/c-c++-common/goacc/cache-3-1.c |2 +- ...classify-kernels-unparallelized-graphite.c | 41 + ...lassify-kernels-unparallelized-parloops.c} | 12 +- .../c-c++-common/goacc/classify-kernels.c | 27 +- .
[OG11][committed][PATCH 01/22] Fortran: delinearize multi-dimensional array accesses
From: Sandra Loosemore The Fortran front end presently linearizes accesses to multi-dimensional arrays by combining the indices for the various dimensions into a series of explicit multiplies and adds with refactoring to allow CSE of invariant parts of the computation. Unfortunately this representation interferes with Graphite-based loop optimizations. It is difficult to recover the original multi-dimensional form of the access by the time loop optimizations run because parts of it have already been optimized away or into a form that is not easily recognizable, so it seems better to have the Fortran front end produce delinearized accesses to begin with, a set of nested ARRAY_REFs similar to the existing behavior of the C and C++ front ends. This is a long-standing problem that has previously been discussed e.g. in PR 14741 and PR61000. This patch is an initial implementation for explicit array accesses only; it doesn't handle the accesses generated during scalarization of whole-array or array-section operations, which follow a different code path. gcc/ * expr.c (get_inner_reference): Handle NOP_EXPR like VIEW_CONVERT_EXPR. gcc/fortran/ * lang.opt (-param=delinearize=): New. * trans-array.c (get_class_array_vptr): New, split from... (build_array_ref): ...here. (get_array_lbound, get_array_ubound): New, split from... (gfc_conv_array_ref): ...here. Additional code refactoring plus support for delinearization of the array access. gcc/testsuite/ * gfortran.dg/assumed_type_2.f90: Adjust patterns. * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise. * gfortran.dg/graphite/block-3.f90: Remove xfails. * gfortran.dg/graphite/block-4.f90: Likewise. * gfortran.dg/inline_matmul_24.f90: Adjust patterns. * gfortran.dg/no_arg_check_2.f90: Likewise. * gfortran.dg/pr32921.f: Likewise. * gfortran.dg/reassoc_4.f: Disable delinearization for this test. 
Co-Authored-By: Tobias Burnus --- gcc/expr.c| 1 + gcc/fortran/lang.opt | 4 + gcc/fortran/trans-array.c | 321 +- gcc/testsuite/gfortran.dg/assumed_type_2.f90 | 6 +- .../gfortran.dg/goacc/kernels-loop-inner.f95 | 2 +- gcc/testsuite/gfortran.dg/graphite/block-2.f | 9 +- .../gfortran.dg/graphite/block-3.f90 | 1 - .../gfortran.dg/graphite/block-4.f90 | 1 - gcc/testsuite/gfortran.dg/graphite/id-9.f | 2 +- .../gfortran.dg/inline_matmul_24.f90 | 2 +- gcc/testsuite/gfortran.dg/no_arg_check_2.f90 | 6 +- gcc/testsuite/gfortran.dg/pr32921.f | 2 +- gcc/testsuite/gfortran.dg/reassoc_4.f | 2 +- 13 files changed, 264 insertions(+), 95 deletions(-) diff --git a/gcc/expr.c b/gcc/expr.c index 21b7e96ed62e..c7ee800c4d4f 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7539,6 +7539,7 @@ get_inner_reference (tree exp, poly_int64_pod *pbitsize, break; case VIEW_CONVERT_EXPR: + case NOP_EXPR: break; case MEM_REF: diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt index dba333448c11..1548d56278a4 100644 --- a/gcc/fortran/lang.opt +++ b/gcc/fortran/lang.opt @@ -521,6 +521,10 @@ fdefault-real-16 Fortran Var(flag_default_real_16) Set the default real kind to an 16 byte wide type. +-param=delinearize= +Common Joined UInteger Var(flag_delinearize_aref) Init(1) IntegerRange(0,1) Param Optimization +Delinearize array references. + fdollar-ok Fortran Var(flag_dollar_ok) Allow dollar signs in entity names. 
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index b7d949929722..3eb9a1778173 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -3747,11 +3747,9 @@ add_to_offset (tree *cst_offset, tree *offset, tree t) } } - static tree -build_array_ref (tree desc, tree offset, tree decl, tree vptr) +get_class_array_vptr (tree desc, tree vptr) { - tree tmp; tree type; tree cdesc; @@ -3775,19 +3773,74 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr) && GFC_CLASS_TYPE_P (TYPE_CANONICAL (type))) vptr = gfc_class_vptr_get (TREE_OPERAND (cdesc, 0)); } + return vptr; +} +static tree +build_array_ref (tree desc, tree offset, tree decl, tree vptr) +{ + tree tmp; + vptr = get_class_array_vptr (desc, vptr); tmp = gfc_conv_array_data (desc); tmp = build_fold_indirect_ref_loc (input_location, tmp); tmp = gfc_build_array_ref (tmp, offset, decl, vptr); return tmp; } +/* Get the declared lower bound for rank N of array DECL which might + be either a bare array or a descriptor. This differs from + gfc_conv_array_lbound because it gets information for temporary array + objects from AR instead of the descriptor (they can differ). */ + +static tree +get_array_lbound (tree decl, int n, gfc_symbol *sym, +
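For readers who do not work on the front end: the difference between the two representations can be sketched in plain C. This is only an illustration, not what gfortran emits; the function names and the fixed extents ROWS and COLS are made up for the example, and C is row-major whereas Fortran is column-major.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 3, COLS = 4 };	/* Hypothetical extents.  */

/* Linearized form: both subscripts are folded into a single offset,
   so the per-dimension structure is no longer visible in the access.  */
int
linearized_get (const int *base, size_t i, size_t j)
{
  assert (i < ROWS && j < COLS);
  return base[i * COLS + j];
}

/* Delinearized form: one array reference per dimension, comparable to
   the nested ARRAY_REFs produced by the C and C++ front ends.  */
int
nested_get (int a[ROWS][COLS], size_t i, size_t j)
{
  assert (i < ROWS && j < COLS);
  return a[i][j];
}
```

Both functions read the same element; only the second keeps the subscripts i and j separate, which is the form a polyhedral analysis would prefer to see.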
[OG11][committed][PATCH 03/22] graphite: Extend SCoP detection dump output
Extend the dump output to make it easier (for GCC developers) to understand why Graphite refuses to include a loop in a SCoP. ChangeLog: * graphite-scop-detection.c (scop_detection::can_represent_loop): Output reason for failure to dump file. (scop_detection::harmful_loop_in_region): Likewise. (scop_detection::graphite_can_represent_expr): Likewise. (scop_detection::stmt_has_simple_data_refs_p): Likewise. (scop_detection::stmt_simple_for_scop_p): Likewise. (print_sese_loop_numbers): New function. (scop_detection::add_scop): Use from here to print loops in rejected SCoP. --- gcc/graphite-scop-detection.c | 188 +- 1 file changed, 165 insertions(+), 23 deletions(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 3e729b159b09..46c470210d05 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -69,12 +69,27 @@ public: fprintf (output.dump_file, "%d", i); return output; } + friend debug_printer & operator<< (debug_printer &output, const char *s) { fprintf (output.dump_file, "%s", s); return output; } + + friend debug_printer & + operator<< (debug_printer &output, gimple* stmt) + { +print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS); +return output; + } + + friend debug_printer & + operator<< (debug_printer &output, tree t) + { +print_generic_expr (output.dump_file, t, TDF_SLIM); +return output; + } } dp; #define DEBUG_PRINT(args) do \ @@ -506,6 +521,24 @@ scop_detection::merge_sese (sese_l first, sese_l second) const return combined; } +/* Print the loop numbers of the loops contained + in SESE to FILE. */ + +static void +print_sese_loop_numbers (FILE *file, sese_l sese) +{ + loop_p loop; + bool printed = false; + FOR_EACH_LOOP (loop, 0) + { +if (loop_in_sese_p (loop, sese)) + fprintf (file, "%d, ", loop->num); +printed = true; + } + if (printed) +fprintf (file, "\b\b"); +} + /* Build scop outer->inner if possible. */ void @@ -519,8 +552,13 @@ scop_detection::build_scop_depth (loop_p loop) if (!
next || harmful_loop_in_region (next)) { - if (s) - add_scop (s); + if (next) +DEBUG_PRINT ( +dp << "[scop-detection] Discarding SCoP on loops "; +print_sese_loop_numbers (dump_file, next); +dp << " because of harmful loops\n";); + if (s) +add_scop (s); build_scop_depth (loop); s = invalid_sese; } @@ -560,14 +598,62 @@ scop_detection::can_represent_loop (loop_p loop, sese_l scop) || !single_pred_p (loop->latch) || exit->src != single_pred (loop->latch) || !empty_block_p (loop->latch)) -return false; +{ + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop shape unsupported.\n"); + return false; +} + + bool edge_irreducible + = loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP; + if (edge_irreducible) +{ + DEBUG_PRINT ( + dp << "[can_represent_loop-fail] Loop is not a natural loop.\n"); + return false; +} + + bool niter_is_unconditional = number_of_iterations_exit (loop, + single_exit (loop), + &niter_desc, false); - return !(loop_preheader_edge (loop)->flags & EDGE_IRREDUCIBLE_LOOP) -&& number_of_iterations_exit (loop, single_exit (loop), &niter_desc, false) -&& niter_desc.control.no_overflow -&& (niter = number_of_latch_executions (loop)) -&& !chrec_contains_undetermined (niter) -&& graphite_can_represent_expr (scop, loop, niter); + if (!niter_is_unconditional) +{ + DEBUG_PRINT ( + dp << "[can_represent_loop-fail] Loop niter not unconditional.\n" + << "Condition: " << niter_desc.assumptions << "\n"); + return false; +} + + niter = number_of_latch_executions (loop); + if (!niter) +{ + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter unknown.\n"); + return false; +} + if (!niter_desc.control.no_overflow) +{ + DEBUG_PRINT (dp << "[can_represent_loop-fail] Loop niter can overflow.\n"); + return false; +} + + bool undetermined_coefficients = chrec_contains_undetermined (niter); + if (undetermined_coefficients) +{ + DEBUG_PRINT (dp << "[can_represent_loop-fail] " + << "Loop niter chrec contains undetermined coefficients.\n"); + return false; +} + + bool 
can_represent_expr = graphite_can_represent_expr (scop, loop, niter); + if (!can_represent_expr) +{ + DEBUG_PRINT (dp << "[can_represent_loop-fail] " + << "Loop niter expression cannot be represented: " + << niter << "\n"); + return false; +} + + ret
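The restructuring in can_represent_loop follows a simple pattern: a single compound boolean return is split into separate checks so that each failure can name its own reason. A minimal sketch of that pattern in plain C, assuming a made-up struct loop_facts and shortened message strings (none of these names exist in GCC):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the properties tested for one loop.  */
struct loop_facts
{
  bool natural;			/* Preheader edge is not irreducible.  */
  bool niter_known;		/* An iteration count could be computed.  */
  bool niter_no_overflow;	/* The iteration count cannot overflow.  */
};

/* Return NULL if the loop is acceptable, otherwise a string naming the
   first check that failed.  The checks run in a fixed order, so at most
   one reason is reported per loop.  */
const char *
rejection_reason (const struct loop_facts *f)
{
  assert (f != NULL);
  if (!f->natural)
    return "Loop is not a natural loop.";
  if (!f->niter_known)
    return "Loop niter unknown.";
  if (!f->niter_no_overflow)
    return "Loop niter can overflow.";
  return NULL;
}
```

The cost of this style is more lines than the original one-expression return; the gain is that the dump file says which condition rejected the loop.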
[OG11][committed][PATCH 02/22] openacc: Move pass_oacc_device_lower after pass_graphite
The OpenACC device lowering pass must run after the Graphite pass to allow for the use of Graphite for automatic parallelization of kernels regions in the future. Experimentation has shown that it is best, performancewise, to run pass_oacc_device_lower together with the related passes pass_oacc_loop_designation and pass_oacc_gimple_workers early after pass_graphite in pass_tree_loop, at least if the other tree loop passes are not adjusted. In particular, to enable vectorization which is crucial for GCN offloading, device lowering should happen before pass_vectorize. To bring the loops contained in the offloading functions into the shape expected by the loop vectorizer, we have to make sure that some passes that previously were executed only once before pass_tree_loop are also executed on the offloading functions. To ensure the execution of pass_oacc_device_lower if pass_tree_loop does not execute (no loops, no optimizations), we introduce two further copies of the pass to the pipeline that run if there are no loops or if no optimization is performed. gcc/ChangeLog: * omp-general.c (oacc_get_fn_dim_size): Return 0 on missing "dims". * omp-offload.c (pass_oacc_loop_designation::clone): New member function. (pass_oacc_gimple_workers::clone): Likewise. (pass_oacc_gimple_device_lower::clone): Likewise. * passes.c (pass_data_no_loop_optimizations): New pass_data. (class pass_no_loop_optimizations): New pass. (make_pass_no_loop_optimizations): New function. * passes.def: Move pass_oacc_{loop_designation, gimple_workers, device_lower} into tree_loop, and add copies to pass_tree_no_loop and to new pass_no_loop_optimizations. Add copies of passes pass_ccp, pass_ipa_warn, pass_complete_unrolli, pass_backprop, pass_phiprop, pass_fix_loops after the OpenACC passes in pass_tree_loop. * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New member function. (pass_complete_unrolli::clone): Likewise. * tree-ssa-loop.c (pass_fix_loops::clone): Likewise. 
(pass_tree_loop_init::clone): Likewise. (pass_tree_loop_done::clone): Likewise. * tree-ssa-phiprop.c (pass_phiprop::clone): Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/goacc/loop-processing-1.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/routine-nohost-1.c: Likewise. * c-c++-common/unroll-1.c: Likewise. * c-c++-common/unroll-4.c: Likewise. * gcc.dg/goacc/loop-processing-1.c: Likewise. * gcc.dg/tree-ssa/backprop-1.c: Likewise. * gcc.dg/tree-ssa/backprop-2.c: Likewise. * gcc.dg/tree-ssa/backprop-3.c: Likewise. * gcc.dg/tree-ssa/backprop-4.c: Likewise. * gcc.dg/tree-ssa/backprop-5.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/cunroll-1.c: Likewise. * gcc.dg/tree-ssa/cunroll-3.c: Likewise. * gcc.dg/tree-ssa/cunroll-9.c: Likewise. * gcc.dg/tree-ssa/ldist-17.c: Likewise. * gcc.dg/tree-ssa/loop-38.c: Likewise. * gcc.dg/tree-ssa/pr21463.c: Likewise. * gcc.dg/tree-ssa/pr45427.c: Likewise. * gcc.dg/tree-ssa/pr61743-1.c: Likewise. * gcc.dg/unroll-2.c: Likewise. * gcc.dg/unroll-3.c: Likewise. * gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise. * gcc.dg/vect/vect-profile-1.c: Likewise. * c-c++-common/goacc/device-lowering-debug-optimization.c: New test. * c-c++-common/goacc/device-lowering-no-loops.c: New test. * c-c++-common/goacc/device-lowering-no-optimization.c: New test. Co-Authored-By: Thomas Schwinge --- gcc/omp-general.c | 8 +- gcc/omp-offload.c | 8 ++ gcc/passes.c | 42 gcc/passes.def
[OG11][committed][PATCH 04/22] graphite: Rename isl_id_for_ssa_name
The SSA names for which this function gets used are always SCoP parameters and hence "isl_id_for_parameter" is a better name. It also explains the prefix "P_" for those names in the ISL representation. gcc/ChangeLog: * graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ... (isl_id_for_parameter): ... this new function name. (build_scop_context): Adjust function use. --- gcc/graphite-sese-to-poly.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index eebf2e02cfca..195851cb540a 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -100,14 +100,15 @@ extract_affine_mul (scop_p s, tree e, __isl_take isl_space *space) return isl_pw_aff_mul (lhs, rhs); } -/* Return an isl identifier from the name of the ssa_name E. */ +/* Return an isl identifier for the parameter P. */ static isl_id * -isl_id_for_ssa_name (scop_p s, tree e) +isl_id_for_parameter (scop_p s, tree p) { - char name1[14]; - snprintf (name1, sizeof (name1), "P_%d", SSA_NAME_VERSION (e)); - return isl_id_alloc (s->isl_context, name1, e); + gcc_checking_assert (TREE_CODE (p) == SSA_NAME); + char name[14]; + snprintf (name, sizeof (name), "P_%d", SSA_NAME_VERSION (p)); + return isl_id_alloc (s->isl_context, name, p); } /* Return an isl identifier for the data reference DR. Data references and @@ -893,15 +894,15 @@ build_scop_context (scop_p scop) isl_space *space = isl_space_set_alloc (scop->isl_context, nbp, 0); unsigned i; - tree e; - FOR_EACH_VEC_ELT (region->params, i, e) + tree p; + FOR_EACH_VEC_ELT (region->params, i, p) space = isl_space_set_dim_id (space, isl_dim_param, i, - isl_id_for_ssa_name (scop, e)); + isl_id_for_parameter (scop, p)); scop->param_context = isl_set_universe (space); - FOR_EACH_VEC_ELT (region->params, i, e) -add_param_constraints (scop, i, e); + FOR_EACH_VEC_ELT (region->params, i, p) +add_param_constraints (scop, i, p); } /* Return true when loop A is nested in loop B. 
*/ -- 2.33.0 - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
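A note on the 14-byte buffer in isl_id_for_parameter: "P_" followed by any 32-bit integer printed with %d needs at most 2 + 11 characters plus the terminating NUL, so 14 bytes suffice. A small stand-alone sketch of the naming scheme, where parameter_name is a made-up helper and not part of Graphite, and a 32-bit int is assumed:

```c
#include <assert.h>
#include <stdio.h>

/* Write "P_<version>" into BUF, which has room for SIZE bytes.  Returns
   the number of characters the full name needs, excluding the NUL, as
   snprintf does.  */
int
parameter_name (char *buf, size_t size, int version)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "P_%d", version);
}
```

Calling it with a 14-byte buffer and version 42 stores "P_42"; a return value of 14 or more would indicate truncation.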
[OG11][committed][PATCH 05/22] graphite: Fix minor mistakes in comments
gcc/ChangeLog: * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and a reference to a variable which does not exist. * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo in comment. --- gcc/graphite-isl-ast-to-gimple.c | 2 +- gcc/graphite-sese-to-poly.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index c202213f39b3..44c06016f1a2 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1018,7 +1018,7 @@ gsi_insert_earliest (gimple_seq seq) basic_block begin_bb = get_entry_bb (codegen_region); /* Inserting the gimple statements in a vector because gimple_seq behave - in strage ways when inserting the stmts from it into different basic + in strange ways when inserting the stmts from it into different basic blocks one at a time. */ auto_vec stmts; for (gimple_stmt_iterator gsi = gsi_start (seq); !gsi_end_p (gsi); diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c index 195851cb540a..12fa2d669b3c 100644 --- a/gcc/graphite-sese-to-poly.c +++ b/gcc/graphite-sese-to-poly.c @@ -644,14 +644,14 @@ build_poly_sr_1 (poly_bb_p pbb, gimple *stmt, tree var, enum poly_dr_type kind, isl_map *acc, isl_set *subscript_sizes) { scop_p scop = PBB_SCOP (pbb); - /* Each scalar variables has a unique alias set number starting from + /* Each scalar variable has a unique alias set number starting from the maximum alias set assigned to a dr. */ int alias_set = scop->max_alias_set + SSA_NAME_VERSION (var); subscript_sizes = isl_set_fix_si (subscript_sizes, isl_dim_set, 0, alias_set); /* Add a constrain to the ACCESSES polyhedron for the alias set of - data reference DR. 
*/ + the reference */ isl_constraint *c = isl_equality_alloc (isl_local_space_from_space (isl_map_get_space (acc))); c = isl_constraint_set_constant_si (c, -alias_set); -- 2.33.0
[OG11][committed][PATCH 07/22] Move compute_alias_check_pairs to tree-data-ref.c
Move this function from tree-loop-distribution.c to tree-data-ref.c and make it non-static to enable its use from other parts of GCC. gcc/ChangeLog: * tree-loop-distribution.c (data_ref_segment_size): Remove function. (latch_dominated_by_data_ref): Likewise. (compute_alias_check_pairs): Likewise. * tree-data-ref.c (data_ref_segment_size): New function, copied from tree-loop-distribution.c (compute_alias_check_pairs): Likewise. (latch_dominated_by_data_ref): Likewise. * tree-data-ref.h (compute_alias_check_pairs): New declaration. --- gcc/tree-data-ref.c | 87 gcc/tree-data-ref.h | 3 ++ gcc/tree-loop-distribution.c | 87 3 files changed, 90 insertions(+), 87 deletions(-) diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index d04e95f7c285..71f8d790e618 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -2645,6 +2645,93 @@ create_intersect_range_checks (class loop *loop, tree *cond_expr, dump_printf (MSG_NOTE, "using an address-based overlap test\n"); } +/* Compute and return an expression whose value is the segment length which + will be accessed by DR in NITERS iterations. */ + +static tree +data_ref_segment_size (struct data_reference *dr, tree niters) +{ + niters = size_binop (MINUS_EXPR, + fold_convert (sizetype, niters), + size_one_node); + return size_binop (MULT_EXPR, +fold_convert (sizetype, DR_STEP (dr)), +fold_convert (sizetype, niters)); +} + +/* Return true if LOOP's latch is dominated by statement for data reference + DR. */ + +static inline bool +latch_dominated_by_data_ref (class loop *loop, data_reference *dr) +{ + return dominated_by_p (CDI_DOMINATORS, single_exit (loop)->src, +gimple_bb (DR_STMT (dr))); +} + +/* Compute alias check pairs and store them in COMP_ALIAS_PAIRS for LOOP's + data dependence relations ALIAS_DDRS. 
*/ + +void +compute_alias_check_pairs (class loop *loop, vec *alias_ddrs, + vec *comp_alias_pairs) +{ + unsigned int i; + unsigned HOST_WIDE_INT factor = 1; + tree niters_plus_one, niters = number_of_latch_executions (loop); + + gcc_assert (niters != NULL_TREE && niters != chrec_dont_know); + niters = fold_convert (sizetype, niters); + niters_plus_one = size_binop (PLUS_EXPR, niters, size_one_node); + + if (dump_file && (dump_flags & TDF_DETAILS)) +fprintf (dump_file, "Creating alias check pairs:\n"); + + /* Iterate all data dependence relations and compute alias check pairs. */ + for (i = 0; i < alias_ddrs->length (); i++) +{ + ddr_p ddr = (*alias_ddrs)[i]; + struct data_reference *dr_a = DDR_A (ddr); + struct data_reference *dr_b = DDR_B (ddr); + tree seg_length_a, seg_length_b; + + if (latch_dominated_by_data_ref (loop, dr_a)) + seg_length_a = data_ref_segment_size (dr_a, niters_plus_one); + else + seg_length_a = data_ref_segment_size (dr_a, niters); + + if (latch_dominated_by_data_ref (loop, dr_b)) + seg_length_b = data_ref_segment_size (dr_b, niters_plus_one); + else + seg_length_b = data_ref_segment_size (dr_b, niters); + + unsigned HOST_WIDE_INT access_size_a + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a; + unsigned HOST_WIDE_INT access_size_b + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_b; + unsigned int align_a = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_a))); + unsigned int align_b = TYPE_ALIGN_UNIT (TREE_TYPE (DR_REF (dr_b))); + + dr_with_seg_len_pair_t dr_with_seg_len_pair + (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a), +dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b), +/* ??? Would WELL_ORDERED be safe? */ +dr_with_seg_len_pair_t::REORDERED); + + comp_alias_pairs->safe_push (dr_with_seg_len_pair); +} + + if (tree_fits_uhwi_p (niters)) +factor = tree_to_uhwi (niters); + + /* Prune alias check pairs. 
*/ + prune_runtime_alias_test_list (comp_alias_pairs, factor); + if (dump_file && (dump_flags & TDF_DETAILS)) +fprintf (dump_file, +"Improved number of alias checks from %d to %d\n", +alias_ddrs->length (), comp_alias_pairs->length ()); +} + /* Create a conditional expression that represents the run-time checks for overlapping of address ranges represented by a list of data references pairs passed in ALIAS_PAIRS. Data references are in LOOP. The returned diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h index 8001cc54f518..5016ec926b1d 100644 --- a/gcc/tree-data-ref.h +++ b/gcc/tree-data-ref.h @@ -577,6 +577,9 @@ extern opt_result runtime_alias_check_p (ddr_p, class loop *, bool); extern int data_ref_compare_tree (tree, tree); extern void prune_runtime_alias_test_list (vec
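The arithmetic in data_ref_segment_size is easy to misread in tree form, so here it is on plain integers. This is a sketch only: segment_size is a made-up name, and size_t stands in for sizetype.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between the first and the last of NITERS accesses
   that are STEP bytes apart, i.e. (NITERS - 1) * STEP.  This mirrors
   the MINUS_EXPR followed by MULT_EXPR in data_ref_segment_size.  */
size_t
segment_size (size_t step, size_t niters)
{
  assert (niters >= 1);
  return (niters - 1) * step;
}
```

For example, ten accesses with a 4-byte step span 36 bytes between the first and the last start address; the access size itself is passed separately to dr_with_seg_len in the patch.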
[OG11][committed][PATCH 08/22] graphite: Add runtime alias checking
Graphite rejects a SCoP if it contains a pair of data references for which it cannot determine statically if they may alias. This happens very often, for instance in C code which does not use explicit "restrict". This commit adds the possibility to analyze a SCoP nevertheless and perform an alias check at runtime. Then, if aliasing is detected, the execution will fall back to the unoptimized SCoP. TODO This needs more testing on non-OpenACC code. gcc/ChangeLog: * common.opt: Add fgraphite-runtime-alias-checks. * graphite-isl-ast-to-gimple.c (generate_alias_cond): New function. (graphite_regenerate_ast_isl): Use from here. * graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ... (free_scop): and release here. * graphite-scop-detection.c (dr_defs_outside_region): New function. (dr_well_analyzed_for_runtime_alias_check_p): New function. (graphite_runtime_alias_check_p): New function. (build_alias_set): Record unhandled alias ddrs for later alias check creation if flag_graphite_runtime_alias_checks is true instead of failing. * graphite.h (struct scop): Add field unhandled_alias_ddrs. * sese.h (has_operands_from_region_p): New function. gcc/testsuite/ChangeLog: * gcc.dg/graphite/alias-1.c: New test. --- gcc/common.opt | 4 + gcc/graphite-isl-ast-to-gimple.c| 60 ++ gcc/graphite-poly.c | 2 + gcc/graphite-scop-detection.c | 239 +--- gcc/graphite.h | 4 + gcc/sese.h | 18 ++ gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 +++ 7 files changed, 326 insertions(+), 23 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c diff --git a/gcc/common.opt b/gcc/common.opt index 771398bc03de..aa695e56dc48 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1636,6 +1636,10 @@ fgraphite-identity Common Var(flag_graphite_identity) Optimization Enable Graphite Identity transformation. 
+fgraphite-runtime-alias-checks +Common Var(flag_graphite_runtime_alias_checks) Optimization Init(1) +Allow Graphite to add runtime alias checks to loop-nests if aliasing cannot be resolved statically. + fhoist-adjacent-loads Common Var(flag_hoist_adjacent_loads) Optimization Enable hoisting adjacent loads to encourage generating conditional move diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index 44c06016f1a2..caa0160b9bce 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -1456,6 +1456,34 @@ generate_entry_out_of_ssa_copies (edge false_entry, } } +/* Create a condition that evaluates to TRUE if all ALIAS_DDRS are free of + aliasing. */ + +static tree +generate_alias_cond (vec &alias_ddrs, loop_p context_loop) +{ + gcc_checking_assert (flag_graphite_runtime_alias_checks + && alias_ddrs.length () > 0); + gcc_checking_assert (context_loop); + + auto_vec check_pairs; + compute_alias_check_pairs (context_loop, &alias_ddrs, &check_pairs); + gcc_checking_assert (check_pairs.length () > 0); + + tree alias_cond = NULL_TREE; + create_runtime_alias_checks (context_loop, &check_pairs, &alias_cond); + gcc_checking_assert (alias_cond); + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "Generated runtime alias check: "); + print_generic_expr (dump_file, alias_cond, dump_flags); + fprintf (dump_file, "\n"); +} + + return alias_cond; +} + /* GIMPLE Loop Generator: generates loops in GIMPLE form for the given SCOP. Return true if code generation succeeded. */ @@ -1496,12 +1524,44 @@ graphite_regenerate_ast_isl (scop_p scop) region->if_region = if_region; loop_p context_loop = region->region.entry->src->loop_father; + gcc_checking_assert (context_loop); edge e = single_succ_edge (if_region->true_region->region.entry->dest); basic_block bb = split_edge (e); /* Update the true_region exit edge. 
*/ region->if_region->true_region->region.exit = single_succ_edge (bb); + if (flag_graphite_runtime_alias_checks + && scop->unhandled_alias_ddrs.length () > 0) +{ + /* SCoP detection has failed to handle the aliasing between some data +references of the SCoP statically. Generate an alias check that selects +the newly generated version of the SCoP in the true-branch of the +conditional if aliasing can be ruled out at runtime and the original +version of the SCoP, otherwise. */ + + loop_p loop + = find_common_loop (scop->scop_info->region.entry->dest->loop_father, + scop->scop_info->region.exit->src->loop_father); + tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop); + tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond); + set_ifsese_condition (re
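The effect of the generated condition is ordinary loop versioning: test for overlap at run time, run the transformed code if there is none, and fall back to the original code otherwise. A hand-written sketch of the idea in C, with made-up names, and with a reversed loop standing in for "some transformation that is only valid without aliasing":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if the N-element ranges starting at A and B do not
   overlap.  */
bool
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t lo_a = (uintptr_t) a, lo_b = (uintptr_t) b;
  uintptr_t len = n * sizeof (int);
  return lo_a + len <= lo_b || lo_b + len <= lo_a;
}

/* Compute dst[i] = src[i] + 1 for all i < N, choosing between two
   versions of the loop at run time.  */
void
add_one (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  if (ranges_disjoint (dst, src, n))
    {
      /* "Optimized" version: reordering the iterations is only safe
	 because the ranges were shown not to overlap.  */
      for (size_t i = n; i-- > 0;)
	dst[i] = src[i] + 1;
    }
  else
    {
      /* Fallback: the original iteration order.  */
      for (size_t i = 0; i < n; i++)
	dst[i] = src[i] + 1;
    }
}
```

With overlapping arguments such as add_one (buf + 1, buf, 3) the fallback branch runs and the result matches the original sequential semantics.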
[OG11][committed][PATCH 10/22] openacc: Add "can_be_parallel" flag info to "graph" dumps
gcc/ChangeLog: * graph.c (oacc_get_fn_attrib): New declaration. (find_loop_location): New declaration. (draw_cfg_nodes_for_loop): Print value of the can_be_parallel flag at the top of loops in OpenACC functions. --- gcc/graph.c | 35 --- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/gcc/graph.c b/gcc/graph.c index ce8de33ffe10..3ad07be3b309 100644 --- a/gcc/graph.c +++ b/gcc/graph.c @@ -191,6 +191,10 @@ draw_cfg_nodes_no_loops (pretty_printer *pp, struct function *fun) } } + +extern tree oacc_get_fn_attrib (tree); +extern dump_user_location_t find_loop_location (class loop *); + /* Draw all the basic blocks in LOOP. Print the blocks in breath-first order to get a good ranking of the nodes. This function is recursive: It first prints inner loops, then the body of LOOP itself. */ @@ -205,17 +209,26 @@ draw_cfg_nodes_for_loop (pretty_printer *pp, int funcdef_no, if (loop->header != NULL && loop->latch != EXIT_BLOCK_PTR_FOR_FN (cfun)) -pp_printf (pp, - "\tsubgraph cluster_%d_%d {\n" - "\tstyle=\"filled\";\n" - "\tcolor=\"darkgreen\";\n" - "\tfillcolor=\"%s\";\n" - "\tlabel=\"loop %d\";\n" - "\tlabeljust=l;\n" - "\tpenwidth=2;\n", - funcdef_no, loop->num, - fillcolors[(loop_depth (loop) - 1) % 3], - loop->num); +{ + pp_printf (pp, + "\tsubgraph cluster_%d_%d {\n" + "\tstyle=\"filled\";\n" + "\tcolor=\"darkgreen\";\n" + "\tfillcolor=\"%s\";\n" + "\tlabel=\"loop %d %s\";\n" + "\tlabeljust=l;\n" + "\tpenwidth=2;\n", + funcdef_no, loop->num, + fillcolors[(loop_depth (loop) - 1) % 3], loop->num, + /* This is only meaningful for loops that have been processed +by Graphite. + +TODO Use can_be_parallel_valid_p? */ + !oacc_get_fn_attrib (cfun->decl) + ? "" + : loop->can_be_parallel ? 
"(can_be_parallel = true)" + : "(can_be_parallel = false)"); +} for (class loop *inner = loop->inner; inner; inner = inner->next) draw_cfg_nodes_for_loop (pp, funcdef_no, inner); -- 2.33.0 - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[OG11][committed][PATCH 12/22] openacc: Remove unused partitioning in "kernels" regions
With the old "kernels" handling, unparallelized regions would get executed with 1x1x1 partitioning even if the user provided explicit num_gangs, num_workers clauses etc. This commit restores this behavior by removing unused partitioning after assigning the parallelism dimensions to loops. gcc/ChangeLog: * omp-offload.c (oacc_remove_unused_partitioning): New function for removing partitioning that is not used by any loop. (oacc_validate_dims): Call oacc_remove_unused_partitioning and enable warnings about unused partitioning. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust expectations. --- gcc/omp-offload.c | 51 +-- .../acc_prof-kernels-1.c | 19 --- 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index f5cb222efd8c..68cc5a9d9e5d 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp-offload.c @@ -1215,6 +1215,39 @@ oacc_parse_default_dims (const char *dims) targetm.goacc.validate_dims (NULL_TREE, oacc_min_dims, -2, 0); } +/* Remove parallelism dimensions below LEVEL which are not set in USED + from DIMS and emit a warning pointing to the location of FN. */ + +static void +oacc_remove_unused_partitioning (tree fn, int *dims, int level, unsigned used) +{ + + bool host_compiler = true; +#ifdef ACCEL_COMPILER + host_compiler = false; +#endif + + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM enumeration. */ + { "gang", "worker", "vector" }; + + char removed_partitions[20] = "\0"; + for (int ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++) +if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] >= 0) + { +if (host_compiler) + { +strcat (removed_partitions, axes[ix]); +strcat (removed_partitions, " "); + } +dims[ix] = -1; + } + if (removed_partitions[0] != '\0') +warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism, +"removed %spartitioning from % region", +removed_partitions); +} + /* Validate and update the dimensions for offloaded FN. 
ATTRS is the raw attribute. DIMS is an array of dimensions, which is filled in. LEVEL is the partitioning level of a routine, or -1 for an offload @@ -1235,6 +1268,7 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) for (ix = 0; ix != GOMP_DIM_MAX; ix++) { purpose[ix] = TREE_PURPOSE (pos); + tree val = TREE_VALUE (pos); dims[ix] = val ? TREE_INT_CST_LOW (val) : -1; pos = TREE_CHAIN (pos); @@ -1244,14 +1278,15 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) #ifdef ACCEL_COMPILER check = false; #endif + + static char const *const axes[] = + /* Must be kept in sync with GOMP_DIM enumeration. */ + { "gang", "worker", "vector" }; + if (check && warn_openacc_parallelism - && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn)) - && !lookup_attribute ("oacc parallel_kernels_graphite", DECL_ATTRIBUTES (fn))) + && !lookup_attribute ("oacc kernels", DECL_ATTRIBUTES (fn))) { - static char const *const axes[] = - /* Must be kept in sync with GOMP_DIM enumeration. */ - { "gang", "worker", "vector" }; for (ix = level >= 0 ? level : 0; ix != GOMP_DIM_MAX; ix++) if (dims[ix] < 0) ; /* Defaulting axis. */ @@ -1262,14 +1297,20 @@ oacc_validate_dims (tree fn, tree attrs, int *dims, int level, unsigned used) "region contains %s partitioned code but" " is not %s partitioned", axes[ix], axes[ix]); else if (!(used & GOMP_DIM_MASK (ix)) && dims[ix] != 1) + { /* The dimension is explicitly partitioned to non-unity, but no use is made within the region. */ warning_at (DECL_SOURCE_LOCATION (fn), OPT_Wopenacc_parallelism, "region is %s partitioned but" " does not contain %s partitioned code", axes[ix], axes[ix]); + } } + if (lookup_attribute ("oacc parallel_kernels_graphite", + DECL_ATTRIBUTES (fn))) +oacc_remove_unused_partitioning (fn, dims, level, used); + bool changed = targetm.goacc.validate_dims (fn, dims, level, used); /* Default anything left to 1 or a partitioned default. 
*/ diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c index 4a9b11a3d3fe..d398b3463617 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c @@ -7,6 +7,8 @@ #include +/* { dg-skip-if "'kernels' not analyzed by Graphite at -O0" { *-*-* } { "-O0" } { "" } } */
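The core of oacc_remove_unused_partitioning is a small mask loop. The following stand-alone sketch shows just that part, without the warning; DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX and DIM_MASK are local stand-ins assumed for this example, not the real GOMP_DIM definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GOMP_DIM enumeration and mask.  */
enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define DIM_MASK(ix) (1u << (ix))

/* Reset to -1 ("not set") every dimension from LEVEL onwards that has
   an explicit size in DIMS but is not marked in USED.  Returns a mask
   of the dimensions that were reset.  */
unsigned
remove_unused_partitioning (int *dims, int level, unsigned used)
{
  assert (dims != NULL);
  unsigned removed = 0;
  for (int ix = level >= 0 ? level : 0; ix != DIM_MAX; ix++)
    if (!(used & DIM_MASK (ix)) && dims[ix] >= 0)
      {
	dims[ix] = -1;
	removed |= DIM_MASK (ix);
      }
  return removed;
}
```

For a region with dims { 32, 4, -1 } in which only gang partitioning is used, this resets the worker dimension and leaves the other two untouched; the later defaulting code then decides what -1 becomes.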
[OG11][committed][PATCH 11/22] openacc: Add further kernels tests
Add some copies of tests to continue covering the old "parloops"-based "kernels" implementation - until it gets removed from GCC - and add further tests for the new Graphite-based implementation. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: New test. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: New test. * c-c++-common/goacc/kernels-decompose-1-parloops.c: New test. * c-c++-common/goacc/kernels-reduction-parloops.c: New test. * c-c++-common/goacc/loop-auto-reductions.c: New test. * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c: New test. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: New test. * gfortran.dg/goacc/kernels-conversion.f95: New test. * gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test. * gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops.f95: New test. * gfortran.dg/goacc/kernels-reductions.f90: New test. 
--- ...classify-kernels-unparallelized-graphite.c | 41 + ...classify-kernels-unparallelized-parloops.c | 47 ++ .../goacc/kernels-decompose-1-parloops.c | 125 ++ .../goacc/kernels-reduction-parloops.c| 36 .../c-c++-common/goacc/loop-auto-reductions.c | 22 +++ ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++ .../goacc/note-parallelism-kernels-loops-1.c | 61 +++ .../note-parallelism-kernels-loops-parloops.c | 53 ++ ...assify-kernels-unparallelized-parloops.f95 | 44 + .../gfortran.dg/goacc/kernels-conversion.f95 | 52 ++ .../goacc/kernels-decompose-1-parloops.f95| 121 ++ .../goacc/kernels-decompose-parloops-2.f95| 154 ++ .../goacc/kernels-loop-data-parloops-2.f95| 52 ++ .../goacc/kernels-loop-parloops-2.f95 | 45 + .../goacc/kernels-loop-parloops.f95 | 39 + .../gfortran.dg/goacc/kernels-reductions.f90 | 37 + .../parallel-loop-auto-reduction-2.f90| 98 +++ 17 files changed, 1155 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95 create mode 100644 
gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90 diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c new file mode 100644 index ..77f4524907a9 --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c @@ -0,0 +1,41 @@ +/* Check offloaded function's attributes and classification for unparallelized + OpenACC 'kernels' with Graphite kernles handling (default). */ + +/* { dg-additional-options "-O2" } + { dg-additional-options "-fno-openacc-kernels-annotate-loops" } + { dg-additional-options "-fopt-info-optimized-omp" } + { dg-additional-options "-fopt-info-note-omp" } + { dg-additional-options "-fdump-tree-ompexp" } + { dg-additional-options "-fdump-tree-graphite-details" } + { dg-additional-options "-fdump-tree-oaccloops1" } +
[OG11][committed][PATCH 13/22] Add function for printing a single OMP_CLAUSE
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump the
whole OMP clause chain") changed the dumping behavior for OMP_CLAUSEs.
The old behavior is required for a follow-up commit ("openacc: Add
data optimization pass") that optimizes single OMP_CLAUSEs.

gcc/ChangeLog:

	* tree-pretty-print.c (print_omp_clause_to_str): Add new function.
	* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
---
 gcc/tree-pretty-print.c | 11 +++
 gcc/tree-pretty-print.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index d769cd8f07c5..2e0255176c76 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1402,6 +1402,17 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
     }
 }
 
+/* Print the single clause at the top of the clause chain C to a string and
+   return it. Note that print_generic_expr_to_str prints the whole clause chain
+   instead. The caller must free the returned memory. */
+
+char *
+print_omp_clause_to_str (tree c)
+{
+  pretty_printer pp;
+  dump_omp_clause (&pp, c, 0, TDF_VOPS|TDF_MEMSYMS);
+  return xstrdup (pp_formatted_text (&pp));
+}
 
 /* Dump chain of OMP clauses.
diff --git a/gcc/tree-pretty-print.h b/gcc/tree-pretty-print.h
index cafe9aa95989..3368cb9f1544 100644
--- a/gcc/tree-pretty-print.h
+++ b/gcc/tree-pretty-print.h
@@ -41,6 +41,7 @@ extern void print_generic_expr (FILE *, tree, dump_flags_t = TDF_NONE);
 extern char *print_generic_expr_to_str (tree);
 extern void dump_omp_clauses (pretty_printer *, tree, int, dump_flags_t,
                               bool = true);
+extern char *print_omp_clause_to_str (tree);
 extern void dump_omp_atomic_memory_order (pretty_printer *,
                                           enum omp_memory_order);
 extern void dump_omp_loop_non_rect_expr (pretty_printer *, tree, int,
--
2.33.0

Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
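For illustration only (not part of the patch): the difference between printing the clause at the head of a chain and printing the whole chain can be sketched with a toy linked list. The `Clause` type and both functions below are hypothetical stand-ins and do not use GCC's `tree` or pretty-printer API.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an OMP clause node. In GCC, real clauses are
// 'tree' nodes linked through OMP_CLAUSE_CHAIN.
struct Clause {
  std::string text;    // printed form of this clause, e.g. "copy(x)"
  const Clause *next;  // next clause in the chain, or nullptr
};

// Rough analogue of print_generic_expr_to_str on a clause:
// walks and prints the whole chain.
std::string print_chain_to_str(const Clause *c) {
  std::string out;
  for (; c != nullptr; c = c->next) {
    if (!out.empty())
      out += ' ';
    out += c->text;
  }
  return out;
}

// Rough analogue of print_omp_clause_to_str:
// prints only the clause at the head of the chain.
std::string print_single_to_str(const Clause *c) {
  return c ? c->text : std::string();
}
```

A pass that rewrites one clause at a time wants the second form, so that a dump message names only the clause being changed.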
[OG11][committed][PATCH 15/22] openacc: Add runtime alias checking for OpenACC kernels
From: Andrew Stubbs This commit adds the code generation for the runtime alias checks for OpenACC loops that have been analyzed by Graphite. The runtime alias check condition gets generated in Graphite. It is evaluated by the code generated for the IFN_GOACC_LOOP internal function calls. If aliasing is detected at runtime, the execution dimensions get adjusted to execute the affected loops sequentially. gcc/ChangeLog: * graphite-isl-ast-to-gimple.c: Include internal-fn.h. (graphite_oacc_analyze_scop): Implement runtime alias checks. * omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter to GOACC_LOOP internal calls, and initialise it to integer_one_node. * omp-offload.c (oacc_xform_loop): Integrate the runtime alias check into the GOACC_LOOP expansion. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test. --- gcc/graphite-isl-ast-to-gimple.c | 122 ++ gcc/graphite-scop-detection.c | 18 +- gcc/omp-expand.c | 37 +- gcc/omp-offload.c | 413 ++ .../runtime-alias-check-1.c | 79 .../runtime-alias-check-2.c | 90 6 files changed, 550 insertions(+), 209 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c index c516170d9493..bdabe588c3d8 100644 --- a/gcc/graphite-isl-ast-to-gimple.c +++ b/gcc/graphite-isl-ast-to-gimple.c @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3. 
If not see #include "graphite.h" #include "graphite-oacc.h" #include "stdlib.h" +#include "internal-fn.h" struct ast_build_info { @@ -1698,6 +1699,127 @@ graphite_oacc_analyze_scop (scop_p scop) print_isl_schedule (dump_file, scop->original_schedule); } + if (flag_graphite_runtime_alias_checks + && scop->unhandled_alias_ddrs.length () > 0) +{ + sese_info_p region = scop->scop_info; + + /* Usually there will be a chunking loop with the actual work loop +inside it. In some corner cases there may only be one loop. */ + loop_p top_loop = region->region.entry->dest->loop_father; + loop_p active_loop = top_loop->inner ? top_loop->inner : top_loop; + tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, active_loop); + + /* Walk back to GOACC_LOOP block. */ + basic_block goacc_loop_block = region->region.entry->src; + + /* Find the GOACC_LOOP calls. If there aren't any then this is not an +OpenACC kernels loop and will need different handling. */ + gimple_stmt_iterator gsitop = gsi_start_bb (goacc_loop_block); + while (!gsi_end_p (gsitop) +&& (!is_gimple_call (gsi_stmt (gsitop)) +|| !gimple_call_internal_p (gsi_stmt (gsitop)) +|| (gimple_call_internal_fn (gsi_stmt (gsitop)) +!= IFN_GOACC_LOOP))) + gsi_next (&gsitop); + + if (!gsi_end_p (gsitop)) + { + /* Move the GOACC_LOOP CHUNK and STEP calls to after any hoisted +statements. There ought not be any problematic dependencies because +the chunk size and step are only computed for very specific purposes. +They may not be at the very top of the block, but they should be +found together (the asserts test this assuption). 
*/ + gimple_stmt_iterator gsibottom = gsi_last_bb (goacc_loop_block); + gsi_move_after (&gsitop, &gsibottom); + gimple_stmt_iterator gsiinsert = gsibottom; + gcc_checking_assert (is_gimple_call (gsi_stmt (gsitop)) + && gimple_call_internal_p (gsi_stmt (gsitop)) + && (gimple_call_internal_fn (gsi_stmt (gsitop)) + == IFN_GOACC_LOOP)); + gsi_move_after (&gsitop, &gsibottom); + + /* Insert "noalias_p = COND" before the GOACC_LOOP statements. +Note that these likely depend on some of the hoisted statements. */ + tree cond_val = force_gimple_operand_gsi (&gsiinsert, cond, true, NULL, + true, GSI_NEW_STMT); + + /* Insert the cond_val into each GOACC_LOOP call in the region. */ + for (int n = -1; n < (int)region->bbs.length (); n++) + { + /* Cover the region plus goacc_loop_block. */ + basic_block bb = n < 0 ? goacc_loop_block : region->bbs[n]; + + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); + !gsi_end_p (gsi); + gsi_next (&gsi)) + { + gimpl
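For illustration only (not part of the patch): the idea of a runtime alias check can be sketched at source level. Note the swapped-in technique: the sketch uses plain loop versioning (two copies of the loop behind a condition), whereas the patch passes the condition into the GOACC_LOOP calls so that the execution dimensions are adjusted. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if [a, a+n) and [b, b+n) do not overlap. Integer addresses
// are compared so that unrelated pointers can be ordered safely.
bool ranges_disjoint(const int *a, const int *b, std::size_t n) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  const std::uintptr_t bytes = n * sizeof(int);
  return pa + bytes <= pb || pb + bytes <= pa;
}

// Computes dst[i] = src[i] + 1. Returns true if the parallel version ran.
bool add_one(int *dst, const int *src, std::size_t n) {
  const bool noalias = ranges_disjoint(dst, src, n);
  if (noalias) {
    // No overlap: iterations are independent and may run in parallel.
#pragma acc parallel loop copyin(src[0:n]) copyout(dst[0:n])
    for (std::size_t i = 0; i < n; ++i)
      dst[i] = src[i] + 1;
  } else {
    // Possible overlap: keep the original sequential order.
    for (std::size_t i = 0; i < n; ++i)
      dst[i] = src[i] + 1;
  }
  return noalias;
}
```

Without `-fopenacc` the pragma is ignored and both branches run on the host, which is enough to exercise the check itself.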
[OG11][committed][PATCH 14/22] openacc: Add data optimization pass
From: Andrew Stubbs

Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.

This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private". The pass is restricted to
"kernels" regions but could be extended to other types of regions.

gcc/ChangeLog:

	* Makefile.in: Add pass.
	* doc/gimple.texi: TODO.
	* gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking.
	* gimple-walk.h (struct walk_stmt_info): Add field.
	* passes.def: Add new pass.
	* tree-pass.h (make_pass_omp_data_optimize): New declaration.
	* omp-data-optimize.cc: New file.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Expect optimization messages.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
	* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
	* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
	* c-c++-common/goacc/omp_data_optimize-1.c: New test.
	* g++.dg/goacc/omp_data_optimize-1.C: New test.
	* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.

Co-Authored-By: Thomas Schwinge --- gcc/Makefile.in | 1 + gcc/doc/gimple.texi | 2 + gcc/gimple-walk.c | 15 +- gcc/gimple-walk.h | 6 + gcc/omp-data-optimize.cc | 951 ++ gcc/passes.def| 1 + .../goacc/note-parallelism-1-kernels-loops.c | 7 +- ...note-parallelism-1-kernels-straight-line.c | 9 +- .../goacc/note-parallelism-kernels-loops.c| 10 +- .../c-c++-common/goacc/omp_data_optimize-1.c | 677 + .../c-c++-common/goacc/uninit-copy-clause.c | 6 + .../g++.dg/goacc/omp_data_optimize-1.C| 169 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 +++ .../gfortran.dg/goacc/uninit-copy-clause.f95 | 2 + gcc/tree-pass.h | 1 + .../kernels-decompose-1.c | 2 + .../libgomp.oacc-fortran/pr94358-1.f90| 4 + 17 files changed, 2444 insertions(+), 7 deletions(-) create mode 100644 gcc/omp-data-optimize.cc create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90 diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 4ebdcdbc5f8c..8c02b85d2a96 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1507,6 +1507,7 @@ OBJS = \ omp-low.o \ omp-oacc-kernels-decompose.o \ omp-simd-clone.o \ + omp-data-optimize.o \ opt-problem.o \ optabs.o \ optabs-libfuncs.o \ diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index 4b3d7d7452e3..a83e17f71a40 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -2778,4 +2778,6 @@ calling @code{walk_gimple_stmt} on each one. @code{WI} is as in @code{walk_gimple_stmt}. If @code{walk_gimple_stmt} returns non-@code{NULL}, the walk is stopped and the value returned. Otherwise, all the statements are walked and @code{NULL_TREE} returned. + +TODO update for forward vs. backward. @end deftypefn diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c index cd287860994e..66fd491844d7 100644 --- a/gcc/gimple-walk.c +++ b/gcc/gimple-walk.c @@ -32,6 +32,8 @@ along with GCC; see the file COPYING3. 
If not see /* Walk all the statements in the sequence *PSEQ calling walk_gimple_stmt on each one. WI is as in walk_gimple_stmt. + TODO update for forward vs. backward. + If walk_gimple_stmt returns non-NULL, the walk is stopped, and the value is stored in WI->CALLBACK_RESULT. Also, the statement that produced the value is returned if this statement has not been @@ -44,9 +46,10 @@ gimple * walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt, walk_tree_fn callback_op, struct walk_stmt_info *wi) { - gimple_stmt_iterator gsi; + bool forward = !(wi && wi->backward); - for (gsi = gsi_start (*pseq); !gsi_end_p (gsi); ) + gimple_stmt_iterator gsi = forward ? gsi_start (*pseq) : gsi_last (*pseq); + for (; !gsi_end_p (gsi); ) { tree ret = walk_gimple_stmt (&gsi, callback_stmt, callback_op, wi); if (ret) @@ -60,7 +63,13 @@ walk_gimple_seq_mod (gimple_seq *pseq, walk_stmt_fn callback_stmt, } if (!wi->removed_stmt) - gsi_next (&gsi); + { + if (forward) + gsi_next (&gs
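For illustration only (not part of the patch): the kind of scalar clauses the data optimization pass targets might look as follows. This is a sketch under assumptions; the commit message does not spell out the exact conditions the pass checks, and the function name is hypothetical.

```cpp
#include <cassert>

// 'factor' is only read inside the region, so its copy clause could in
// principle be weakened to copyin. 'tmp' is written before it is read in
// every iteration and is not used after the region, so it is a candidate
// for "private". 'sum' is read after the region and must be copied out.
int sum_scaled(const int *a, int n, int factor) {
  int tmp = 0;
  int sum = 0;
#pragma acc kernels copyin(a[0:n]) copy(factor, tmp, sum)
  {
    for (int i = 0; i < n; ++i) {
      tmp = a[i] * factor;
      sum += tmp;
    }
  }
  return sum;
}
```

Without `-fopenacc` the pragma is ignored and the function simply runs on the host.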
[OG11][committed][PATCH 16/22] openacc: Warn about "independent" "kernels" loops with data-dependences
This commit concerns loops in OpenACC "kernels" regions that have been
marked up with an explicit "independent" clause by the user, but for
which Graphite found data dependences. A discussion on the private
internal OpenACC mailing list suggested that warning the user about
the dependences would be a more acceptable solution than reverting the
user's decision. This behavior is implemented by the present commit.

gcc/ChangeLog:

	* common.opt: Add flag Wopenacc-false-independent.
	* omp-offload.c (oacc_loop_warn_if_false_independent): New function.
	(oacc_loop_fixed_partitions): Call from here.
---
 gcc/common.opt    |  5 +
 gcc/omp-offload.c | 49 +++
 2 files changed, 54 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index aa695e56dc48..4c38ed5cf9ab 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -838,6 +838,11 @@ Wtsan
 Common Var(warn_tsan) Init(1) Warning
 Warn about unsupported features in ThreadSanitizer.
 
+Wopenacc-false-independent
+Common Var(warn_openacc_false_independent) Init(1) Warning
+Warn in case a loop in an OpenACC \"kernels\" region has an \"independent\"
+clause but analysis shows that it has loop-carried dependences.
+
 Xassembler
 Driver Separate
 
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 94a975a88660..b806e36ef515 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2043,6 +2043,51 @@ oacc_loop_transform_auto_into_independent (oacc_loop *loop)
   return true;
 }
 
+/* Emit a warning if LOOP has an "independent" clause but Graphite's
+   analysis shows that it has data dependences. Note that we respect
+   the user's explicit decision to parallelize the loop but we
+   nevertheless warn that this decision could be wrong. */
+
+static void
+oacc_loop_warn_if_false_independent (oacc_loop *loop)
+{
+  if (!optimize)
+    return;
+
+  if (loop->routine)
+    return;
+
+  /* TODO Warn about "auto" & "independent" in "parallel" regions? */
+  if (!oacc_parallel_kernels_graphite_fun_p ())
+    return;
+
+  if (!(loop->flags & OLF_INDEPENDENT))
+    return;
+
+  bool analyzed = false;
+  bool can_be_parallel = oacc_loop_can_be_parallel_p (loop, analyzed);
+  loop_p cfg_loop = oacc_loop_get_cfg_loop (loop);
+
+  if (cfg_loop && cfg_loop->inner && !analyzed)
+    {
+      if (dump_enabled_p ())
+        {
+          const dump_user_location_t loc
+            = dump_user_location_t::from_location_t (loop->loc);
+          dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+                           "'independent' loop in 'kernels' region has not been "
+                           "analyzed (cf. 'graphite' "
+                           "dumps for more information).\n");
+        }
+      return;
+    }
+
+  if (!can_be_parallel)
+    warning_at (loop->loc, 0,
+                "loop has \"independent\" clause but data dependences were "
+                "found.");
+}
+
 /* Walk the OpenACC loop hierarchy checking and assigning the
    programmer-specified partitionings. OUTER_MASK is the partitioning
    this loop is contained within. Return mask of partitioning
@@ -2094,6 +2139,10 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
         }
     }
 
+  /* TODO Is this flag needed? Perhaps use -Wopenacc-parallelism? */
+  if (warn_openacc_false_independent)
+    oacc_loop_warn_if_false_independent (loop);
+
   if (maybe_auto && (loop->flags & OLF_INDEPENDENT))
     {
       loop->flags |= OLF_AUTO;
--
2.33.0
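For illustration only (not part of the patch): a loop of the kind this warning is aimed at. The function name is hypothetical, and whether the warning is actually emitted depends on Graphite having analyzed the loop, as described in the commit message.

```cpp
#include <cassert>

// Each iteration reads the element written by the previous iteration, so
// the loop carries a dependence. The "independent" clause nevertheless
// asserts that the iterations can run in any order.
void running_count(int *a, int n) {
#pragma acc kernels copy(a[0:n])
  {
#pragma acc loop independent
    for (int i = 1; i < n; ++i)
      a[i] = a[i - 1] + 1;
  }
}
```

Compiled without `-fopenacc`, the pragmas are ignored and the loop runs sequentially, giving the result the author presumably intended. With offloading enabled, the user's "independent" claim is respected, so the result may differ.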
[OG11][committed][PATCH 17/22] openacc: Handle internal function calls in pass_lim
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable.

Until recently, the pass did not need to know how to handle the
OpenACC internal function calls; this changed when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.

This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop. This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process. Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.

Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.

A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.

gcc/ChangeLog:

	* passes.def: Set restrict_oacc_hoisting to true for the early
	pass_lim instance.
	* tree-ssa-loop-im.c (movement_possibility): Add
	restrict_oacc_hoisting flag to function; restrict movement if set.
	(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
	(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE calls.
	(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
	pass it on.
	(pass_lim::execute): Pass on new flags.
	* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust
	declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to loop_invariant_motion_in_fun. --- gcc/gimple-loop-interchange.cc | 2 +- gcc/passes.def | 2 +- gcc/tree-ssa-loop-im.c | 58 -- gcc/tree-ssa-loop-manip.h | 2 +- 4 files changed, 52 insertions(+), 12 deletions(-) diff --git a/gcc/gimple-loop-interchange.cc b/gcc/gimple-loop-interchange.cc index 7b799eca805c..d617438910fd 100644 --- a/gcc/gimple-loop-interchange.cc +++ b/gcc/gimple-loop-interchange.cc @@ -2096,7 +2096,7 @@ pass_linterchange::execute (function *fun) if (changed_p) { unsigned todo = TODO_update_ssa_only_virtuals; - todo |= loop_invariant_motion_in_fun (cfun, false); + todo |= loop_invariant_motion_in_fun (cfun, false, false); scev_reset (); return todo; } diff --git a/gcc/passes.def b/gcc/passes.def index 48c9821011f0..d1dedbc287e2 100644 --- a/gcc/passes.def +++ b/gcc/passes.def @@ -247,7 +247,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_cse_sincos); NEXT_PASS (pass_optimize_bswap); NEXT_PASS (pass_laddress); - NEXT_PASS (pass_lim); + NEXT_PASS (pass_lim, true /* restrict_oacc_hoisting */); NEXT_PASS (pass_walloca, false); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c index 7de47edbcb30..b392ae609aaf 100644 --- a/gcc/tree-ssa-loop-im.c +++ b/gcc/tree-ssa-loop-im.c @@ -47,6 +47,8 @@ along with GCC; see the file COPYING3. If not see #include "builtins.h" #include "tree-dfa.h" #include "dbgcnt.h" +#include "graphite-oacc.h" +#include "internal-fn.h" /* TODO: Support for predicated code motion. I.e. @@ -320,11 +322,23 @@ enum move_pos Otherwise return MOVE_IMPOSSIBLE. 
*/ enum move_pos -movement_possibility (gimple *stmt) +movement_possibility (gimple *stmt, bool restrict_oacc_hoisting) { tree lhs; enum move_pos ret = MOVE_POSSIBLE; + if (restrict_oacc_hoisting && oacc_get_fn_attrib (cfun->decl) + && gimple_code (stmt) == GIMPLE_ASSIGN) +{ + tree rhs = gimple_assign_rhs1 (stmt); + + if (TREE_CODE (rhs) == VIEW_CONVERT_EXPR) + rhs = TREE_OPERAND (rhs, 0); + + if (TREE_CODE (rhs) == ARRAY_REF) + return MOVE_IMPOSSIBLE; +} + if (flag_unswitch_loops && gimple_code (stmt) == GIMPLE_COND) { @@ -974,7 +988,7 @@ rewrite_bittest (gimple_stmt_iterator *bsi) statements. */ static void -compute_invariantness (basic_block bb) +compute_invariantness (basic_block bb, bool restrict_oacc_hoisting) { enum move_pos pos; gimple_stmt_iterator bsi; @@ -1002,7 +1016,7 @@ compute_invariantness (basic_block bb) { stmt = gsi_stmt (bsi); - pos = movement_possibility (stmt); + pos = movement_possibility (stmt, re
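For illustration only (not part of the patch): a source-level sketch of why hoisting invariant struct accesses helps loop analysis. `OmpData` is a hypothetical stand-in for the ".omp_data_i" struct of an outlined function. The real transformation happens on GIMPLE inside pass_lim, not in user code.

```cpp
#include <cassert>

// Hypothetical stand-in for the data struct passed to an outlined region.
struct OmpData {
  int *a;
  int n;
  int scale;
};

// Before hoisting: the loop bound and the scale factor are reloaded from
// memory in each iteration, so the bound is not a plain scalar.
void scale_in_loop(OmpData *d) {
  for (int i = 0; i < d->n; ++i)
    d->a[i] *= d->scale;
}

// After hoisting: the invariant loads happen once before the loop, and the
// loop runs over scalars with an easily analyzable bound.
void scale_hoisted(OmpData *d) {
  int *a = d->a;
  const int n = d->n;
  const int scale = d->scale;
  for (int i = 0; i < n; ++i)
    a[i] *= scale;
}
```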
[OG11][committed][PATCH 18/22] openacc: Disable pass_pre on outlined functions analyzed by Graphite
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.

This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.

gcc/ChangeLog:

	* tree-ssa-pre.c (insert): Skip any insertions in OpenACC
	functions that might be processed by Graphite.
---
 gcc/tree-ssa-pre.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/tree-ssa-pre.c b/gcc/tree-ssa-pre.c
index 2aedc31e1d73..b904354e4c78 100644
--- a/gcc/tree-ssa-pre.c
+++ b/gcc/tree-ssa-pre.c
@@ -51,6 +51,7 @@ along with GCC; see the file COPYING3. If not see
 #include "tree-ssa-dce.h"
 #include "tree-cfgcleanup.h"
 #include "alias.h"
+#include "graphite-oacc.h"
 
 /* Even though this file is called tree-ssa-pre.c, we actually
    implement a bit more than just PRE here. All of them piggy-back
@@ -3736,6 +3737,22 @@ do_hoist_insertion (basic_block block)
 static void
 insert (void)
 {
+
+/* The additional dependences introduced by the code insertions
+   can cause Graphite's dependence analysis to fail . Without
+   special handling of those dependences in Graphite, it seems
+   better to skip this step if OpenACC loops that need to be handled
+   by Graphite are found. Note that the full redundancy elimination
+   step of this pass is useful for the purpose of dependence
+   analysis, for instance, because it can remove definitions from
+   SCoPs that would otherwise prevent the creation of runtime alias
+   checks since those may only use definitions that are available
+   before the SCoP. */
+
+  if (oacc_function_p (cfun)
+      && ::graphite_analyze_oacc_function_p (cfun))
+    return;
+
   basic_block bb;
 
   FOR_ALL_BB_FN (bb, cfun)
--
2.33.0
[OG11][committed][PATCH 19/22] graphite: Tune parameters for OpenACC use
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes. Furthermore,
exceeding the limits does not always lead to user-visible diagnostic
messages.

This commit increases the parameter values on OpenACC functions. The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite. Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed output.
Those warnings are phrased in a uniform way that intentionally refers
to the "data-dependence analysis" of "OpenACC loops" instead of "a
failure in Graphite" to make them easier to understand for users.

gcc/ChangeLog:

	* graphite-optimize-isl.c (optimize_isl): Adjust
	param_max_isl_operations value for OpenACC functions and add
	special warnings if value gets exceeded.
	* graphite-scop-detection.c (build_scops): Likewise for
	param_graphite_max_arrays_per_scop.

gcc/testsuite/ChangeLog:

	* gcc.dg/goacc/graphite-parameter-1.c: New test.
	* gcc.dg/goacc/graphite-parameter-2.c: New test.
---
 gcc/graphite-optimize-isl.c | 35 ---
 gcc/graphite-scop-detection.c | 28 ++-
 .../gcc.dg/goacc/graphite-parameter-1.c | 21 +++
 .../gcc.dg/goacc/graphite-parameter-2.c | 23
 4 files changed, 101 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 019452700a49..4eecbd20b740 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see
 #include "dumpfile.h"
 #include "tree-vectorizer.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
 
 /* get_schedule_for_node_st - Improve schedule for the schedule node.
@@ -115,6 +116,14 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite) int old_err = isl_options_get_on_error (scop->isl_context); int old_max_operations = isl_ctx_get_max_operations (scop->isl_context); int max_operations = param_max_isl_operations; + + /* The default value for param_max_isl_operations is easily exceeded + by "kernels" loops in existing OpenACC codes. Raise the values + significantly since analyzing those loops is crucial. */ + if (param_max_isl_operations == 35 /* default value */ + && oacc_function_p (cfun)) +max_operations = 200; + if (max_operations) isl_ctx_set_max_operations (scop->isl_context, max_operations); isl_options_set_on_error (scop->isl_context, ISL_ON_ERROR_CONTINUE); @@ -164,11 +173,27 @@ optimize_isl (scop_p scop, bool oacc_enabled_graphite) dump_user_location_t loc = find_loop_location (scop->scop_info->region.entry->dest->loop_father); if (isl_ctx_last_error (scop->isl_context) == isl_error_quota) - dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, -"loop nest not optimized, optimization timed out " -"after %d operations [--param max-isl-operations]\n", -max_operations); - else + { + if (oacc_function_p (cfun)) + { + /* Special casing for OpenACC to unify diagnostic messages +here and in graphite-scop-detection.c. 
*/ + dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "data-dependence analysis of OpenACC loop " + "nest " + "failed; try increasing the value of " + "--param=" + "max-isl-operations=%d.\n", + max_operations); +} + else +dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, + "loop nest not optimized, optimization timed " + "out after %d operations [--param " + "max-isl-operations]\n", + max_operations); +} + else dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc, "loop nest not optimized, ISL signalled an error\n"); } diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index 8b41044bce5e..afc955cc97eb 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -2056,6 +2056,9 @@ determine_openacc_reductions (scop_p scop) } } + +extern dump_user_location_t find_loop_location (class loop *); + /* Find Static Control Parts (SCoP) in the current function and pushes them to SCOPS. */ @@ -2109,6 +2112,11 @@ build_scops (vec *scops)
[OG11][committed][PATCH 20/22] graphite: Adjust scop loop-nest choice
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP. The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region. The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop. This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops. This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.

gcc/ChangeLog:

	* graphite-scop-detection.c (scop_context_loop): New function.
	(build_alias_set): Use scop_context_loop instead of find_common_loop.
	* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
	* graphite.h (scop_context_loop): New declaration.
---
 gcc/graphite-isl-ast-to-gimple.c |  4 +---
 gcc/graphite-scop-detection.c    | 21 ++---
 gcc/graphite.h                   |  1 +
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index bdabe588c3d8..ec055a358f39 100644
--- a/gcc/graphite-isl-ast-to-gimple.c
+++ b/gcc/graphite-isl-ast-to-gimple.c
@@ -1543,9 +1543,7 @@ graphite_regenerate_ast_isl (scop_p scop)
          conditional if aliasing can be ruled out at runtime and the original
          version of the SCoP, otherwise. */
 
-      loop_p loop
-        = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                            scop->scop_info->region.exit->src->loop_father);
+      loop_p loop = scop_context_loop (scop);
       tree cond = generate_alias_cond (scop->unhandled_alias_ddrs, loop);
       tree non_alias_cond = build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
       set_ifsese_condition (region->if_region, non_alias_cond);
diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index afc955cc97eb..99e906a5d120 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -297,6 +297,23 @@ single_pred_cond_non_loop_exit (basic_block bb)
   return NULL;
 }
 
+
+/* Return the innermost loop that encloses all loops in SCOP. */
+
+loop_p
+scop_context_loop (scop_p scop)
+{
+  edge scop_entry = scop->scop_info->region.entry;
+  edge scop_exit = scop->scop_info->region.exit;
+  basic_block exit_bb = scop_exit->src;
+
+  while (sese_trivially_empty_bb_p (exit_bb) && single_pred_p (exit_bb))
+    exit_bb = single_pred (exit_bb);
+
+  loop_p entry_loop = scop_entry->dest->loop_father;
+  return find_common_loop (entry_loop, exit_bb->loop_father);
+}
+
 namespace
 {
 
@@ -1776,9 +1793,7 @@ build_alias_set (scop_p scop)
   int i, j;
   int *all_vertices;
 
-  struct loop *nest
-    = find_common_loop (scop->scop_info->region.entry->dest->loop_father,
-                        scop->scop_info->region.exit->src->loop_father);
+  struct loop *nest = scop_context_loop (scop);
 
   gcc_checking_assert (nest);
 
diff --git a/gcc/graphite.h b/gcc/graphite.h
index 9c508f31109f..dacb27a9073c 100644
--- a/gcc/graphite.h
+++ b/gcc/graphite.h
@@ -480,4 +480,5 @@ extern tree cached_scalar_evolution_in_region (const sese_l &, loop_p, tree);
 extern void dot_all_sese (FILE *, vec &);
 extern void dot_sese (sese_l &);
 extern void dot_cfg ();
+extern loop_p scop_context_loop (scop_p);
 #endif
--
2.33.0
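For illustration only (not part of the patch): the effect of skipping trivially empty exit blocks can be sketched on a toy loop tree. The `Loop` and `Block` types and both functions are hypothetical simplifications and do not match GCC's `class loop` or `basic_block`.

```cpp
#include <cassert>

// Toy loop-tree node: 'outer' is the enclosing loop (nullptr for the
// outermost "fake" loop), 'depth' is the nesting depth.
struct Loop {
  const Loop *outer;
  int depth;
};

// Toy basic block with the three properties the patch inspects.
struct Block {
  const Loop *loop_father;
  bool trivially_empty;
  const Block *single_pred;  // nullptr if there is no single predecessor
};

// Innermost loop containing both A and B (same idea as find_common_loop).
const Loop *common_loop(const Loop *a, const Loop *b) {
  while (a->depth > b->depth) a = a->outer;
  while (b->depth > a->depth) b = b->outer;
  while (a != b) {
    a = a->outer;
    b = b->outer;
  }
  return a;
}

// Mirrors the shape of scop_context_loop: walk back from the exit block
// through empty single-predecessor blocks before taking the common loop.
const Loop *context_loop(const Block *entry_dest, const Block *exit_src) {
  while (exit_src->trivially_empty && exit_src->single_pred)
    exit_src = exit_src->single_pred;
  return common_loop(entry_dest->loop_father, exit_src->loop_father);
}
```

In the test below, the empty exit block sits in the outermost loop, so a plain common-loop query returns that loop, while the walk back returns the proper enclosing loop nest.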
[OG11][committed][PATCH 21/22] graphite: Accept loops without data references
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops. Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions. If executing Graphite on loops without data
references leads to noticeable compile time slow-downs for non-OpenACC
users of Graphite, the check can be re-introduced but restricted to
non-OpenACC functions.

gcc/ChangeLog:

	* graphite-scop-detection.c (scop_detection::harmful_loop_in_region):
	Remove check for loops without data references.
---
 gcc/graphite-scop-detection.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index 99e906a5d120..9311a0e42a57 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -851,19 +851,6 @@ scop_detection::harmful_loop_in_region (sese_l scop) const
           return true;
         }
 
-      /* Check if all loop nests have at least one data reference.
-         ??? This check is expensive and loops premature at this point.
-         If important to retain we can pre-compute this for all innermost
-         loops and reject those when we build a SESE region for a loop
-         during SESE discovery. */
-      if (! loop->inner
-          && ! loop_nest_has_data_refs (loop))
-        {
-          DEBUG_PRINT (dp << "[scop-detection-fail] loop_" << loop->num
-                       << " does not have any data reference.\n");
-          return true;
-        }
-
       DEBUG_PRINT (dp << "[scop-detection] loop_" << loop->num
                    << " is harmless.\n");
     }
--
2.33.0
Re: [PATCH 15/40] graphite: Extend SCoP detection dump output
Hi Richard,

On Tue, 2022-05-17 at 08:21 +, Richard Biener wrote:
> On Mon, 16 May 2022, Tobias Burnus wrote:
>
> > As requested by Richard: Rediffed patch.
> >
> > Changes: s/.c/.cc/ + some whitespace changes.
> > (At least in my email reader, some were lost. I also fixed
> > too-long line issues.)
> >
> > In addition, FOR_EACH_LOOP was replaced by 'for (auto loop : ...'
> > (macro was removed late in GCC 12 development, r12-2605-ge41ba804ba5f5c)
> >
> > Otherwise, it should be identical to Frederik's patch, earlier in
> > this thread.
> >
> > On 15.12.21 16:54, Frederik Harwath wrote:
> > > Extend dump output to make understanding why Graphite rejects to
> > > include a loop in a SCoP easier (for GCC developers).
> >
> > OK for mainline?
>
> +  if (printed)
> +    fprintf (file, "\b\b");
>
> please find other means of omitting ", ", like by printing it
> _before_ the number but only for the second and following loop
> number.

Done.

> I'll also note that
>
> +static void
> +print_sese_loop_numbers (FILE *file, sese_l sese)
> +{
> +  bool printed = false;
> +  for (auto loop : loops_list (cfun, 0))
> +    {
> +      if (loop_in_sese_p (loop, sese))
> +        fprintf (file, "%d, ", loop->num);
> +      printed = true;
> +    }
>
> is hardly optimal.  Please instead iterate over
> sese.entry->dest->loop_father and children instead which you can do
> by passing that as extra argument to loops_list.

Done.

This had to be extended a little bit, because a SCoP can consist of
consecutive loop-nests and iterating only over "loops_list (cfun,
LI_INCLUDE_ROOT, sese.entry->dest->loop_father))" would output only the
loops from the first loop-nest in the SCoP (cf. the test file scop-22a.c
that I added).
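The review suggestion above, printing the ", " separator before every number except the first instead of erasing it afterwards with "\b\b", can be sketched as follows. This is only an illustration in plain C: `format_loop_numbers` and its buffer-based interface are hypothetical and are not the code from the revised patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the N numbers in NUMS, separated by ", ",
   into BUF (of size SIZE).  The separator is emitted *before* each item
   except the first, so nothing has to be erased afterwards.  Returns the
   number of characters written, excluding the terminating NUL.  */
static size_t
format_loop_numbers (char *buf, size_t size, const int *nums, size_t n)
{
  assert (buf != NULL && size > 0);
  size_t used = 0;
  buf[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      int len = snprintf (buf + used, size - used, "%s%d",
                          i == 0 ? "" : ", ", nums[i]);
      if (len < 0 || (size_t) len >= size - used)
        {
          buf[used] = '\0';  /* Drop the partially written item.  */
          break;
        }
      used += (size_t) len;
    }
  return used;
}
```

Because the separator depends only on the position of the item, no `printed` flag is needed and the output never contains a trailing ", ".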
> +
> +  if (dump_file && dump_flags & TDF_DETAILS)
> +    {
> +      fprintf (dump_file, "Loops in SCoP: ");
> +      for (auto loop : loops_list (cfun, 0))
> +        if (loop_in_sese_p (loop, s))
> +          fprintf (dump_file, "%d ", loop->num);
> +      fprintf (dump_file, "\n");
> +    }
>
> you are duplicating functionality of the function you just added ...

Fixed.

> Otherwise looks OK to me.

Can I commit the revised patch?

Thanks for your review,
Frederik

From fb268a37704b1598a84051c735514ff38adad038 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Wed, 18 May 2022 07:59:42 +0200
Subject: [PATCH] graphite: Extend SCoP detection dump output

Extend dump output to make understanding why Graphite rejects to
include a loop in a SCoP easier (for GCC developers).

gcc/ChangeLog:

        * graphite-scop-detection.cc (scop_detection::can_represent_loop):
        Output reason for failure to dump file.
        (scop_detection::harmful_loop_in_region): Likewise.
        (scop_detection::graphite_can_represent_expr): Likewise.
        (scop_detection::stmt_has_simple_data_refs_p): Likewise.
        (scop_detection::stmt_simple_for_scop_p): Likewise.
        (print_sese_loop_numbers): New function.
        (scop_detection::add_scop): Use from here.

gcc/testsuite/ChangeLog:

        * gcc.dg/graphite/scop-22a.c: New test.
--- gcc/graphite-scop-detection.cc | 184 --- gcc/testsuite/gcc.dg/graphite/scop-22a.c | 56 +++ 2 files changed, 219 insertions(+), 21 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/graphite/scop-22a.c diff --git a/gcc/graphite-scop-detection.cc b/gcc/graphite-scop-detection.cc index 8c0ee9975579..9792d87ee0ae 100644 --- a/gcc/graphite-scop-detection.cc +++ b/gcc/graphite-scop-detection.cc @@ -69,12 +69,27 @@ public: fprintf (output.dump_file, "%d", i); return output; } + friend debug_printer & operator<< (debug_printer &output, const char *s) { fprintf (output.dump_file, "%s", s); return output; } + + friend debug_printer & + operator<< (debug_printer &output, gimple* stmt) + { +print_gimple_stmt (output.dump_file, stmt, 0, TDF_VOPS | TDF_MEMSYMS); +return output; + } + + friend debug_printer & + operator<< (debug_printer &output, tree t) + { +print_generic_expr (output.dump_file, t, TDF_SLIM); +return output; + } } dp; #define DEBUG_PRINT(args) do \ @@ -506,6 +521,27 @@ scop_detection::merge_sese (sese_l first, sese_l second) const return combined; } +/* Print the loop numbers of the loops contained in SESE to FILE. */ + +static void +p
Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses
On 24.10.19 16:31, Thomas Schwinge wrote:

Hi,

I have attached a revised patch.

> [...]
>> I was wondering if the way in which the patch avoids issuing errors about
>> operator switches more than once by modifying the clauses (cf. the
>> corresponding comment in omp-low.c) could lead to problems [...]
>
> "Patching up" erroneous state or even completely removing OMP clauses is
> -- as far as I understand -- acceptable to avoid "issuing errors about
> operator switches more than once".  This doesn't affect code generation,
> because no code will be generated at all.  (Does that answer your
> question?)

Yes, thank you.

> Regarding my suggestions to "demote error to warning diagnostics", I'd
> suggest that at this point we do *not* try to fix for the user any
> presumed wrong/missing 'reduction' clauses (difficult/impossible to do
> correctly in the general case), but really only diagnose them.

Ok, I have changed the errors into warnings and I have removed the code
for avoiding repeated messages.

> So just C/C++ testing, no Fortran at all.  This is not ideal, but
> probably (hopefully) acceptable given that this is working on the middle
> end representation shared between all front ends.

Thanks to Tobias, we now also have Fortran tests.

> To match the order in 'struct omp_context' (see above), move these new
> initializations before those of 'ctx->depth'.  (Even if that also just
> achieves "some local consistency".)  ;-)

Done.

>> @@ -1131,6 +1141,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>      case OMP_CLAUSE_REDUCTION:
>>      case OMP_CLAUSE_IN_REDUCTION:
>> +      if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +        ctx->local_reduction_clauses
>> +          = tree_cons (NULL, c, ctx->local_reduction_clauses);
>>        decl = OMP_CLAUSE_DECL (c);
>>        if (TREE_CODE (decl) == MEM_REF)
>>          {
>
> I think this should really only apply to 'OMP_CLAUSE_REDUCTION' but not
> 'OMP_CLAUSE_IN_REDUCTION' (please verify)?

Right, I have moved the new code to the OMP_CLAUSE_REDUCTION case above.

> I'm usually the last one to complain about such things ;-) -- but here
> really the indentation of the new code seems to be off?  Please verify.
> Maybe you had set a tab-stop to four spaces instead of eight?

Oh, it should look better now.

>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
>
> Rename to '*-warn.c', and instead of 'dg-error' use 'dg-warning'
> (possibly more than currently).

Ok.

>> --- a/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/reduction-6.c
>> @@ -16,17 +16,6 @@ int foo (int N)
>>        }
>>    }
>>
>> -  #pragma acc parallel
>> -  {
>> -    #pragma acc loop reduction(+:b)
>> -    for (int i = 0; i < N; i++)
>> -      {
>> -        #pragma acc loop
>> -        for (int j = 0; j < N; j++)
>> -          b += 1;
>> -      }
>> -  }
>> -
>>    #pragma acc parallel
>>    {
>>      #pragma acc loop reduction(+:c)
>
> That one stays in, but gets a 'dg-warning'.

What warning would you expect to see here? I do not get any warnings.

Best regards,
Frederik

From 22f45d4c2c11febce171272f9289c487aed4f9d7 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Tue, 29 Oct 2019 12:39:23 +0100
Subject: [PATCH] Warn about inconsistent OpenACC nested reduction clauses
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause";
this was first clarified by OpenACC 2.6) requires that, if a
variable is used in reduction clauses on two nested loops, then
there must be reduction clauses for that variable on all loops
that are nested in between the two loops and all these reduction
clauses must use the same operator.
This commit introduces a check for that property which reports
warnings if it is violated.

In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous
reductions on variable b; adding a reduction clause to make it compile cleanly
would make it a duplicate of the test for variable c.

2019-10-29 Gergö Barany Tobias Burnus Frederik Harwath Thomas Schwinge gcc/ * omp-low.c (struct omp_context): New fields local_reduction_clauses, outer_reduction_clauses. (new_omp_context): Initialize these. (scan_sharing_clauses): Record reduction clauses on OpenACC constructs. (scan_omp_for): Check reduction clauses for incorrect nesting. gcc/testsuite/ * c-c++-common/goacc/nested-reductions-warn.c: New test. * c-c++-common/goacc/nested-reductions.c: New test. * c-c++-common/goacc/reduction-6.c: Adjust. * gfortran.dg/goacc/nested-reductions-warn.f90: New test. * gfortran.dg/goacc/nested-reductions.f90: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: Add missing red
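To illustrate the rule quoted in the commit message, the following sketch shows one loop nest that satisfies it and one that does not. The function names are made up for this example, and the comments describe what the rule requires rather than the exact wording of the warning. When compiled without OpenACC support the pragmas are ignored and both functions simply count loop iterations.

```c
/* Conforming: SUM appears in a reduction clause on the outer and on the
   inner loop, and both clauses use the same operator.  */
int
sum_consistent (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop reduction(+:sum)
      for (int j = 0; j < n; j++)
        sum += 1;
    }
  return sum;
}

/* Not conforming under the quoted rule: SUM has reduction clauses on the
   outermost and on the innermost loop, but the loop nested in between
   has none.  */
int
sum_missing_middle (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop
      for (int j = 0; j < n; j++)
        {
#pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }
  return sum;
}
```

A nest that uses `reduction(+:sum)` on one level and `reduction(*:sum)` on another would violate the second half of the rule (same operator on all levels).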
Re: Add OpenACC 2.6 `acc_get_property' support
onsider (PGI?). The standard does not impose any restrictions on the
format of the string.

> > +default:
> > +  break;
>
> Should this 'GOMP_PLUGIN_error' or even 'GOMP_PLUGIN_fatal'?  (Similar
> then elsewhere.)

Yes, I chose GOMP_PLUGIN_error.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc-get-property.c
> > @@ -0,0 +1,37 @@
> > +/* Test the `acc_get_property' and '`acc_get_property_string' library
> > +   functions. */
> > +/* { dg-do run } */
> > +
> > +#include
> > +#include
> > +#include
> > +#include
> > +
> > +int main ()
> > +{
> > +  const char *s;
> > +  size_t v;
> > +  int r;
> > +
> > +  /* Verify that the vendor is a proper non-empty string.  */
> > +  s = acc_get_property_string (0, acc_device_default, acc_property_vendor);
> > +  r = !s || !strlen (s);
> > +  if (s)
> > +    printf ("OpenACC vendor: %s\n", s);
>
> Should we check the actual string returned, as defined by OpenACC/our
> implementation, as applicable?  Use '#if defined ACC_DEVICE_TYPE_[...]'.
> (See 'libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c',
> for example.)

Yes.

> Isn't this the "Device vendor" instead of the "OpenACC vendor"?  Similar
> for all other 'printf's?

Yes.

> These tests only use 'acc_device_default', should they also check other
> valid as well as invalid values?

That would be better.

Frederik
Re: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses
Hi Thomas,

On 05.11.19 15:22, Thomas Schwinge wrote:
> For your convenience, I'm attaching an incremental patch, to be merged
> into yours.
> [...]
> With that addressed, OK for trunk.

Thank you. I have merged the patches and committed.

> A few more comments to address separately, later on.

I will look into your remaining questions.

Best regards,
Frederik
[PATCH] Add OpenACC 2.6 `serial' construct support
Hi, this patch implements the OpenACC 2.6 "serial" construct. It has been tested by running the testsuite with nvptx-none offloading on x86_64-pc-linux-gnu. Best regards, Frederik 8< --- The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard) is equivalent to a `parallel' construct with clauses `num_gangs(1) num_workers(1) vector_length(1)' implied. These clauses are therefore not supported with the `serial' construct. All the remaining clauses accepted with `parallel' are also accepted with `serial'. The `serial' construct is implemented like `parallel', except for hardcoding dimensions rather than taking them from the relevant clauses, in `expand_omp_target'. Separate codes are used to denote the `serial' construct throughout the middle end, even though the mapping of `serial' to an equivalent `parallel' construct could have been done in the individual language frontends. In particular, this allows to distinguish between `parallel' and `serial' in warnings, error messages, dumps etc. 2019-11-07 Maciej W. Rozycki Tobias Burnus Frederik Harwath gcc/ * gimple.h (gf_mask): Add GF_OMP_TARGET_KIND_OACC_SERIAL enumeration constant. (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_SERIAL. (is_gimple_omp_offloaded): Likewise. * gimplify.c (omp_region_type): Add ORT_ACC_SERIAL enumeration constant. Adjust the value of ORT_NONE accordingly. (is_gimple_stmt): Handle OACC_SERIAL. (oacc_default_clause): Handle ORT_ACC_SERIAL. (gomp_needs_data_present): Likewise. (gimplify_adjust_omp_clauses): Likewise. (gimplify_omp_workshare): Handle OACC_SERIAL. (gimplify_expr): Likewise. * omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter. * omp-expand.c (expand_omp_target): Handle GF_OMP_TARGET_KIND_OACC_SERIAL. (build_omp_regions_1, omp_make_gimple_edges): Likewise. * omp-low.c (is_oacc_parallel): Rename function to... (is_oacc_parallel_or_serial): ... this. Handle GF_OMP_TARGET_KIND_OACC_SERIAL. (scan_sharing_clauses): Adjust accordingly. 
(scan_omp_for): Likewise. (lower_oacc_head_mark): Likewise. (convert_from_firstprivate_int): Likewise. (lower_omp_target): Likewise. (check_omp_nesting_restrictions): Handle GF_OMP_TARGET_KIND_OACC_SERIAL. (lower_oacc_reductions): Likewise. (lower_omp_target): Likewise. * tree.def (OACC_SERIAL): New tree code. * tree-pretty-print.c (dump_generic_node): Handle OACC_SERIAL. * doc/generic.texi (OpenACC): Document OACC_SERIAL. gcc/c-family/ * c-pragma.h (pragma_kind): Add PRAGMA_OACC_SERIAL enumeration constant. * c-pragma.c (oacc_pragmas): Add "serial" entry. gcc/c/ * c-parser.c (OACC_SERIAL_CLAUSE_MASK): New macro. (c_parser_oacc_kernels_parallel): Rename function to... (c_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL. (c_parser_omp_construct): Update accordingly. gcc/cp/ * constexpr.c (potential_constant_expression_1): Handle OACC_SERIAL. * parser.c (OACC_SERIAL_CLAUSE_MASK): New macro. (cp_parser_oacc_kernels_parallel): Rename function to... (cp_parser_oacc_compute): ... this. Handle PRAGMA_OACC_SERIAL. (cp_parser_omp_construct): Update accordingly. (cp_parser_pragma): Handle PRAGMA_OACC_SERIAL. Fix alphabetic order. * pt.c (tsubst_expr): Handle OACC_SERIAL. gcc/fortran/ * gfortran.h (gfc_statement): Add ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL and ST_OACC_END_SERIAL enumeration constants. (gfc_exec_op): Add EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL enumeration constants. * match.h (gfc_match_oacc_serial): New prototype. (gfc_match_oacc_serial_loop): Likewise. * dump-parse-tree.c (show_omp_node, show_code_node): Handle EXEC_OACC_SERIAL_LOOP and EXEC_OACC_SERIAL. * match.c (match_exit_cycle): Handle EXEC_OACC_SERIAL_LOOP. * openmp.c (OACC_SERIAL_CLAUSES): New macro. (gfc_match_oacc_serial_loop): New function. (gfc_match_oacc_serial): Likewise. (oacc_is_loop): Handle EXEC_OACC_SERIAL_LOOP. (resolve_omp_clauses): Handle EXEC_OACC_SERIAL. (oacc_code_to_statement): Handle EXEC_OACC_SERIAL and EXEC_OACC_SERIAL_LOOP. 
(gfc_resolve_oacc_directive): Likewise. * parse.c (decode_oacc_directive) <'s'>: Add case for "serial" and "serial loop". (next_statement): Handle ST_OACC_SERIAL
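The equivalence stated in the patch description, that a `serial` construct behaves like a `parallel` construct with `num_gangs(1) num_workers(1) vector_length(1)` implied, can be illustrated at the source level with the sketch below. The function names are invented for this example. Without OpenACC support the pragmas are ignored and both functions run as ordinary sequential loops.

```c
/* Using the OpenACC 2.6 `serial' construct.  */
void
scale_serial (float *a, int n, float f)
{
#pragma acc serial copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] *= f;
}

/* The `parallel' formulation that the description treats as equivalent:
   one gang, one worker, vector length one.  */
void
scale_parallel_1x1x1 (float *a, int n, float f)
{
#pragma acc parallel num_gangs(1) num_workers(1) vector_length(1) copy(a[0:n])
  for (int i = 0; i < n; i++)
    a[i] *= f;
}
```

As the description notes, the three dimension clauses are not accepted on `serial` itself, since they are already implied there.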
Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses
Hi Jakub,

On 06.11.19 14:00, Jakub Jelinek wrote:
> On Wed, Nov 06, 2019 at 01:41:47PM +0100, frede...@codesourcery.com wrote:
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
>> @@ -128,6 +128,12 @@ struct omp_context
>> [...]
>> +  /* A tree_list of the reduction clauses in this context.  */
>> +  tree local_reduction_clauses;
>> +
>> +  /* A tree_list of the reduction clauses in outer contexts.  */
>> +  tree outer_reduction_clauses;
>
> Could there be acc in the name to make it clear it is OpenACC only?

Yes, will be added.

>> @@ -910,6 +916,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> [...]
>> @@ -925,6 +933,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx)
>> [...]
>> +  ctx->local_reduction_clauses = NULL;
>> +  ctx->outer_reduction_clauses = NULL;
>
> The = NULL assignments are unnecessary in all 3 cases, ctx is allocated with
> XCNEW.

Ok, will be removed.

>> @@ -1139,6 +1149,11 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>>        goto do_private;
>>
>>      case OMP_CLAUSE_REDUCTION:
>> +      if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
>> +        ctx->local_reduction_clauses
>> +          = tree_cons (NULL, c, ctx->local_reduction_clauses);
>
> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
> more natural, wouldn't it.

Yes.

> Or, wouldn't it be better to do this checking in the gimplifier instead of
> omp-low.c?  There we have splay trees with GOVD_REDUCTION etc. for the
> variables, so it wouldn't be O(#reductions^2) compile time.
> It is true that the gimplifier doesn't record the reduction codes (after
> all, OpenMP has UDRs and so there can be fairly arbitrary reductions).

Right, I have considered moving the implementation somewhere else before.
I am going to look into this, but perhaps we will just keep it where it is
if otherwise the implementation becomes more complicated.

> Consider million reduction clauses on nested loops.
> If gimplifier is not the right spot, then use a splay tree + vector instead?
> splay tree for the outer ones, vector for the local ones, and put into both
> the clauses, so you can compare reduction code etc.

Sounds like a good idea. I am going to try that. However, I have not seen
the suboptimal data structure choices of the original patch as a problem,
since the case of million reduction clauses has not occurred to me.

Thank you for your feedback!

Best regards,
Frederik
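The data structure suggestion in this exchange, a keyed lookup structure for the reductions of outer contexts plus a plain vector for the local ones, avoids comparing every local clause against every outer clause. A minimal sketch in plain C follows. It uses a fixed-size table indexed by a small integer variable id in place of a splay tree, and all type and function names are hypothetical; none of them are GCC internals.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_VARS = 64 };

/* Reduction operator; RED_NONE marks "no reduction recorded".  */
enum red_op { RED_NONE = 0, RED_PLUS, RED_MULT, RED_MAX };

/* One reduction clause: variable id and operator.  */
struct reduction { int var; enum red_op op; };

/* Operators of the reductions seen on enclosing loops, indexed by
   variable id.  */
struct outer_reductions { enum red_op op[MAX_VARS]; };

/* Count the local reductions whose operator differs from the one recorded
   for the same variable in OUTER.  Each local clause costs one table
   lookup, so the work is linear in N rather than quadratic.  */
static size_t
count_operator_mismatches (const struct outer_reductions *outer,
                           const struct reduction *local, size_t n)
{
  size_t mismatches = 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (local[i].var >= 0 && local[i].var < MAX_VARS);
      enum red_op seen = outer->op[local[i].var];
      if (seen != RED_NONE && seen != local[i].op)
        mismatches++;
    }
  return mismatches;
}

/* Make the local reductions visible to loops nested further inside.  */
static void
push_to_outer (struct outer_reductions *outer,
               const struct reduction *local, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      assert (local[i].var >= 0 && local[i].var < MAX_VARS);
      outer->op[local[i].var] = local[i].op;
    }
}
```

A real implementation would key on declarations rather than small integers and would also have to detect missing clauses on intermediate loops; the sketch only covers the operator comparison.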
Re: [PATCH 5/7] Remove last leftover usage of params* files.
Hi Martin, On 06.11.19 13:40, Martin Liska wrote: > (finalize_options_struct): Remove. This patch has been committed by now, but it seems that a single use of finalize_options_struct has been overlooked in gcc/tree-streamer-in.c. Best regards, Frederik
Move pass_oacc_device_lower after pass_graphite
Hi,

as a first step towards enabling the use of Graphite for optimizing OpenACC loops, this patch moves the OpenACC device lowering after the Graphite pass. This means that the device lowering now takes place after some crucial optimization passes. Thus new instances of those passes are added inside of a new pass pass_oacc_functions which ensures that they run on OpenACC functions only. The choice of the new position for pass_oacc_device_lower is further constrained by the need to execute it before pass_vectorize. This means that pass_oacc_device_lower now runs inside of pass_tree_loop. A further instance of the pass that handles functions without loops is added inside of pass_tree_no_loop. Yet another pass instance that executes if optimizations are disabled is included inside of a new pass_no_optimizations.

The patch has been bootstrapped on x86_64-linux-gnu and tested with the GCC testsuite and with the libgomp testsuite with nvptx and gcn offloading.

The patch should have no impact on non-OpenACC user code. However, the new pass instances have changed the pass instance numbering and hence the dump scanning commands in several tests had to be adjusted. I hope that I found all that needed adjustment, but it is quite possible that I missed some tests that execute for particular targets or non-default languages only. The resulting UNRESOLVED tests are usually easily fixed by appending a pass number to the name of a pass that previously had no number (e.g. "cunrolli" becomes "cunrolli1") or by incrementing the pass number (e.g. "dce6" becomes "dce7") in a dump scanning command.

The patch leads to several new unresolved tests in the libgomp testsuite which are caused by the combination of torture testing, missing cleanup of the offload dump files, and the new pass numbering.
If a test that uses, for instance, "-foffload=-fdump-tree-oaccdevlow" gets compiled with "-O0" and afterwards with "-O2", each run of the test executes different instances of pass_oacc_device_lower and produces dumps whose names differ only in the pass instance number. The dump scanning command in the second run fails, because the dump files do not get removed after the first run and the command consequently matches two different dump files. This seems to be a known issue. I am going to submit a patch that implements the cleanup of the offload dumps soon.

I have tried to rule out performance regressions by running different benchmark suites with nvptx and gcn offloading. Nevertheless, I think that it makes sense to keep an eye on OpenACC performance in the near future and revisit the optimizations that run on the device lowered function if necessary.

Ok to include the patch in master?

Best regards,
Frederik

- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

From 93fb166876a0540416e19c9428316d1370dd1e1b Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Tue, 3 Nov 2020 12:58:37 +0100
Subject: [PATCH] Move pass_oacc_device_lower after pass_graphite

As a first step towards enabling the use of Graphite for optimizing OpenACC loops, the OpenACC device lowering must be moved after the Graphite pass. This means that the device lowering now takes place after some crucial optimization passes. Thus new instances of those passes are added inside of a new pass pass_oacc_functions which ensures that they execute on OpenACC functions only. The choice of the new position for pass_oacc_device_lower is further constrained by the need to execute it before pass_vectorize. This means that pass_oacc_device_lower now runs inside of pass_tree_loop. A further instance of the pass that handles functions without loops is added inside of pass_tree_no_loop.
Yet another pass instance that executes if optimizations are disabled is included inside of a new pass_no_optimizations. 2020-11-03 Frederik Harwath Thomas Schwinge gcc/ChangeLog: * omp-general.c (oacc_get_fn_dim_size): Adapt. * omp-offload.c (pass_oacc_device_lower::clone) : New method. * passes.c (class pass_no_optimizations): New pass. (make_pass_no_optimizations): New static function. * passes.def: Move pass_oacc_device_lower into pass_tree_loop and add further instances to pass_tree_no_loop and to new pass pass_no_optimizations. Add new instances of pass_lower_complex, pass_ccp, pass_sink_code, pass_complete_unrolli, pass_backprop, pass_phiprop, pass_forwprop, pass_vrp, pass_dce, pass_loop_done, pass_loop_init, pass_fix_loops supporting the pass_oacc_device_lower instance in pass_tree_loop. * tree-pass.h (make_pass_oacc_functions): New static function. (make_pass_oacc_functions): New static function. * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New method. (pass_complete_unrolli::clone): New method. * tree-ssa-loop.c (pass
[PATCH] testsuite: Clean up lto and offload dump files
Hi,

Dump files produced from an offloading compiler through "-foffload=-fdump-..." do not get removed by gcc-dg.exp and other exp-files of the testsuite that use the cleanup code from this file (e.g. libgomp). This can lead to problems if scan-dump detects leftover dumps from previous runs of a test case.

This patch adapts the existing cleanup logic for "-flto" to handle "-flto" and "-foffload" in a uniform way. The glob pattern that is used for matching the "ltrans" files is also changed since the existing pattern failed to remove some LTO ("ltrans0.ltrans.") dump files.

This patch gets rid of at least one unresolved libgomp test result that would otherwise be introduced by the patch discussed in this thread:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557889.html

diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index e8ad3052657..e0560af205f 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -194,31 +194,47 @@ proc schedule-cleanups { opts } {
[...]
-       lappend tfiles "$stem.{$basename_ext,exe}"

I do not understand why "exe" should be included here. I have removed it and I did not notice any files matching the resulting pattern being left behind by "make check-gcc".

Best regards,
Frederik

From 9eb5da60e8822e1f6fa90b32bff6123ed62c146c Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Wed, 4 Nov 2020 14:09:46 +0100
Subject: [PATCH] testsuite: Clean up lto and offload dump files

Dump files produced from an offloading compiler through "-foffload=-fdump-..." do not get removed by gcc-dg.exp and other exp-files of the testsuite that use the cleanup code from this file (e.g. libgomp). This can lead to problems if scan-dump detects leftover dumps from previous runs of a test case.
This patch adapts the existing cleanup logic for "-flto" to handle "-flto" and "-foffload" in a uniform way. The glob pattern that is used for matching the "ltrans" files is also changed since the existing pattern failed to match some dump files. 2020-11-04 Frederik Harwath gcc/testsuite/ChangeLog: * lib/gcc-dg.exp (proc schedule-cleanups): Adapt "-flto" handling, add "-foffload" handling. --- gcc/testsuite/lib/gcc-dg.exp | 50 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp index e8ad3052657..e0560af205f 100644 --- a/gcc/testsuite/lib/gcc-dg.exp +++ b/gcc/testsuite/lib/gcc-dg.exp @@ -194,31 +194,47 @@ proc schedule-cleanups { opts } { # stem.ext.. # (tree)passes can have multiple instances, thus optional trailing * set ptn "\[0-9\]\[0-9\]\[0-9\]$ptn.*" +set ltrans no +set mkoffload no + # Handle ltrans files around -flto if [regexp -- {(^|\s+)-flto(\s+|$)} $opts] { verbose "Cleanup -flto seen" 4 - set ltrans "{ltrans\[0-9\]*.,}" -} else { - set ltrans "" + set ltrans yes +} + +if [regexp -- {(^|\s+)-foffload=} $opts] { + verbose "Cleanup -foffload seen" 4 + set mkoffload yes } -set ptn "$ltrans$ptn" + verbose "Cleanup final ptn: $ptn" 4 set tfiles {} foreach src $testcases { - set basename [file tail $src] - if { $ltrans != "" } { - # ??? should we use upvar 1 output_file instead of this (dup ?) 
- set stem [file rootname $basename] - set basename_ext [file extension $basename] - if {$basename_ext != ""} { - regsub -- {^.*\.} $basename_ext {} basename_ext - } - lappend tfiles "$stem.{$basename_ext,exe}" - unset basename_ext - } else { - lappend tfiles $basename - } +set basename [file tail $src] +set stem [file rootname $basename] +set basename_ext [file extension $basename] +if {$basename_ext != ""} { +regsub -- {^.*\.} $basename_ext {} basename_ext +} +set extensions [list $basename_ext] + +if { $ltrans == yes } { +lappend extensions "ltrans\[0-9\]*.ltrans" +} +if { $mkoffload == yes} { +# The * matches the offloading target's name, e.g. "xnvptx-none". +lappend extensions "*.mkoffload" +} + +set extensions_ptn [join $extensions ","] +if { [llength $extensions] > 1 } { +set extensions_ptn "{$extensions_ptn}" +} + + lappend tfiles "$stem.$extensions_ptn" } + if { [llength $tfiles] > 1 } { set tfiles [join $tfiles ","] set tfiles "{$tfiles}" -- 2.17.1
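The pattern construction in the new `schedule-cleanups` code above (collect the extensions, join them with commas, and wrap them in braces only if there is more than one) is written in Tcl. The same logic is sketched below in C purely for illustration; `build_cleanup_pattern` is a hypothetical name and is not part of the testsuite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build "STEM.EXT" for a single extension, or "STEM.{EXT1,EXT2,...}" for
   several, mirroring the brace handling in the Tcl code above.  Returns 0
   on success and -1 if OUT (of size SIZE) is too small.  */
static int
build_cleanup_pattern (char *out, size_t size, const char *stem,
                       const char *const *exts, size_t n)
{
  assert (out != NULL && stem != NULL && exts != NULL && n > 0);
  int len = snprintf (out, size, "%s.%s", stem, n > 1 ? "{" : "");
  if (len < 0 || (size_t) len >= size)
    return -1;
  size_t used = (size_t) len;
  for (size_t i = 0; i < n; i++)
    {
      /* Comma before every extension except the first.  */
      len = snprintf (out + used, size - used, "%s%s",
                      i == 0 ? "" : ",", exts[i]);
      if (len < 0 || (size_t) len >= size - used)
        return -1;
      used += (size_t) len;
    }
  if (n > 1)
    {
      /* Room is needed for the closing brace and the terminating NUL.  */
      if (used + 2 > size)
        return -1;
      out[used++] = '}';
      out[used] = '\0';
    }
  return 0;
}
```

With the source extension alone the result is a plain file name pattern; once "-flto" or "-foffload" adds further extensions, the brace alternation lets a single glob cover all dump file variants of a test.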
Re: Move pass_oacc_device_lower after pass_graphite
Hi Richard,

Richard Biener writes:
> On Tue, Nov 3, 2020 at 4:31 PM Frederik Harwath
> What's on my TODO list (or on the list of things to explore) is to make
> the dump file names/suffixes explicit in passes.def like via
>
> NEXT_PASS (pass_ccp, true /* nonzero_p */, "oacc")
>
> and we'd get a dump named .ccp_oacc or so.

That would be very helpful for avoiding the drudgery of adapting those
pass numbers!

> Now, what does oacc_device_lower actually do that you need to
> re-run complex lowering?  What does cunrolli do at this point that
> the complete_unroll pass later does not do?

Good spot, "cunrolli" seems to be unnecessary.

The complex lowering is necessary to handle the code that gets created
by the OpenACC reduction lowering during oaccdevlow. I have attached a
test case (a reduced version of
libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt.c) which
shows that the complex instructions are created by
pass_oacc_device_lower and which leads to an ICE if compiled without the
new complex lowering instance ("-foffload=-fdisable-tree-cplxlower2").

The problem is an unlowered addition. This is from a diff of the dump of
the pass following oaccdevlow1 (ccp4) with disabled and with enabled
tree-cplxlower2:

< _91 = VIEW_CONVERT_EXPR(_1);
< _92 = reduction_var_2 + _91;
---
> _104 = REALPART_EXPR (_1)>;
> _105 = IMAGPART_EXPR (_1)>;
> _91 = COMPLEX_EXPR <_104, _105>;
> _106 = reduction_var$real_100 + _104;
> _107 = reduction_var$imag_101 + _105;
> _92 = COMPLEX_EXPR <_106, _107>;

> What's special about oacc_device lower that doesn't also apply
> to omp_device_lower?

The passes do different things. The goal is to optimize OpenACC loops
using Graphite. The relevant lowering of the internal OpenACC function
calls happens in pass_oacc_device_lower.

> Is all this targeted at code compiled exclusively for the offload
> target?  Thus we're in lto1 here?

The OpenACC outlined functions also get compiled for the host.

> Does it make eventually more sense to have a completely custom pass
> pipeline for the offload compilation?  Maybe even per offload target?
> See how we have a custom pipeline for -Og (pass_all_optimizations_g).

What would be the main benefits of a separate pipeline? Avoiding
(re-)running passes unnecessarily, less unwanted interactions in the
test suite (but your suggestion above regarding the fixed pass names
would also solve this)?

>> Ok to include the patch in master?

Best regards,
Frederik

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
new file mode 100644
index 000..6879e5aaf25
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-lowering.c
@@ -0,0 +1,50 @@
+/* { dg-additional-options "-foffload=-fdump-tree-cplxlower2" } */
+/* { dg-additional-options "-foffload=-fdump-tree-oaccdevlow1" } */
+/* { dg-do link } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } {""} } */
+
+#include
+#if !defined(__hppa__) || !defined(__hpux__)
+#include
+#endif
+
+#define N 100
+
+static float _Complex __attribute__ ((noinline))
+sum (float _Complex ary[N])
+{
+  float _Complex reduction_var = 0;
+#pragma acc parallel loop gang reduction(+:reduction_var)
+  for (int ix = 0; ix < N; ix++)
+    reduction_var += ary[ix];
+
+  return reduction_var;
+}
+
+int main (void)
+{
+  float _Complex ary[N];
+  float _Complex result;
+
+  for (int ix = 0; ix < N; ix++)
+    {
+      float frac = ix * (1.0f / 1024) + 1.0f;
+      ary[ix] = frac + frac * 2.0j - 1.0j;
+    }
+
+  result = sum (ary);
+  printf("%.1f%+.1fi\n", creal(result), cimag(result));
+  return 0;
+}
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR" 1 "oaccdevlow1" } }
+
+   There is just one COMPLEX_EXPR right before oaccdevlow1 ...*/
+
+/* { dg-final { scan-offload-tree-dump-times "GOACC_REDUCTION .*?reduction_var.*?;" 4 "oaccdevlow1" } }
+
+   ... but several IFN_GOACC_REDUCTION calls for the reduction variable which are subsequently lowered ... */
+
+/* { dg-final { scan-offload-tree-dump-times "COMPLEX_EXPR " 4 "cplxlower2" } }
+
+   ... which introduces new COMPLEX_EXPRs.  */
[PATCH 0/2] Use Graphite for OpenACC "kernels" regions
Hi,

the two following patches implement a new handling of the loops in OpenACC "kernels" regions which is based on Graphite and which is meant to replace the current handling based on the "parloops" pass. This considerably extends the class of OpenACC codes using "kernels" regions that can be analysed by GCC's OpenACC implementation.

We would like to incorporate this work into master soon, but further work will be necessary in the next weeks to resolve some open questions, clean up the code etc. In particular, the patches cannot be applied on master currently because they rely on other patches which have not been committed to master yet, e.g. the re-ordering of the OpenACC passes to run device lowering after Graphite which has recently been submitted (subject "Move pass_oacc_device_lower after pass_graphite"), and the transformation pass which converts OpenACC kernels regions to parallel regions from OG10 (commit 809ea59722263eb6c2d48402e1eed80727134038).

Best regards,
Frederik

Frederik Harwath (2):
  [WIP] OpenACC: Add Graphite-based handling of "auto" loops
  OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels

 gcc/c-family/c.opt                            |   5 +-
 gcc/common.opt                                |   8 +
 gcc/doc/invoke.texi                           |  10 +-
 gcc/doc/passes.texi                           |   6 +-
 gcc/flag-types.h                              |   1 +
 gcc/gimple-pretty-print.c                     |   3 +
 gcc/gimple.h                                  |   9 +-
 gcc/gimplify.c                                |   1 +
 gcc/graphite-dependences.c                    |  12 +-
 gcc/graphite-isl-ast-to-gimple.c              |  77 +-
 gcc/graphite-oacc.h                           |  90 ++
 gcc/graphite-scop-detection.c                 | 828 ++
 gcc/graphite-sese-to-poly.c                   |  26 +-
 gcc/graphite.c                                | 403 -
 gcc/graphite.h                                |  11 +-
 gcc/internal-fn.h                             |   7 +-
 gcc/omp-expand.c                              |  89 +-
 gcc/omp-general.c                             |  19 +-
 gcc/omp-general.h                             |   1 +
 gcc/omp-low.c                                 |  76 +-
 gcc/omp-oacc-kernels.c                        |  59 +-
 gcc/omp-offload.c                             | 223 -
 gcc/predict.c                                 |   2 +-
 .../goacc/kernels-conversion-parloops.c       |  61 ++
 .../c-c++-common/goacc/kernels-conversion.c   |  12 +-
 .../graphite/alias-0-no-runtime-check.c       |  20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c   |  21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c       |  22 +
 .../gfortran.dg/goacc/kernels-reductions.f90  |  37 +
 gcc/tree-chrec-oacc.h                         |  45 +
 gcc/tree-chrec.c                              |  16 +-
 gcc/tree-data-ref.c                           | 112 ++-
 gcc/tree-data-ref.h                           |   8 +-
 gcc/tree-loop-distribution.c                  |  17 +-
 gcc/tree-parloops.c                           |  16 +-
 gcc/tree-scalar-evolution.c                   | 257 +-
 gcc/tree-ssa-loop-ivcanon.c                   |   9 +-
 gcc/tree-ssa-loop-niter.c                     |  13 +
 gcc/tree-ssa-loop.c                           |  10 +
 39 files changed, 2265 insertions(+), 377 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/tree-chrec-oacc.h

-- 
2.17.1
[PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops
This patch enables the use of Graphite for the analysis of OpenACC "auto" loops. The goal is to decide whether a loop may be parallelized (i.e. converted to an "independent" loop). Graphite and the functionality on which it relies (scalar evolution, data references) are extended to interpret the internal representation of OpenACC loop constructs that is encoded (e.g. through calls to OpenACC-specific internal functions) in the OpenACC outlined functions (".omp_fn"), and to ignore some artifacts of the outlining process that are not relevant for the analysis of the original loops (e.g. pointers introduced for the purpose of offloading are irrelevant to the question of whether the original loops can be parallelized). This is done in a way that does not impact code which does not use OpenACC. Furthermore, Graphite is extended by functionality that extends its applicability to real-world code (e.g. runtime alias checking). The OpenACC lowering is extended to use the result of Graphite's analysis to assign "independent" clauses to loops.
---
 gcc/common.opt | 8 +
 gcc/graphite-dependences.c | 12 +-
 gcc/graphite-isl-ast-to-gimple.c | 77 +-
 gcc/graphite-oacc.h | 90 ++
 gcc/graphite-scop-detection.c | 828 ++
 gcc/graphite-sese-to-poly.c | 26 +-
 gcc/graphite.c | 403 -
 gcc/graphite.h | 11 +-
 gcc/internal-fn.h | 7 +-
 gcc/omp-expand.c | 26 +-
 gcc/omp-offload.c | 173 +++-
 gcc/predict.c | 2 +-
 .../graphite/alias-0-no-runtime-check.c | 20 +
 .../gcc.dg/graphite/alias-0-runtime-check.c | 21 +
 gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 +
 gcc/tree-chrec-oacc.h | 45 +
 gcc/tree-chrec.c | 16 +-
 gcc/tree-data-ref.c | 112 ++-
 gcc/tree-data-ref.h | 8 +-
 gcc/tree-loop-distribution.c | 17 +-
 gcc/tree-scalar-evolution.c | 257 +-
 gcc/tree-ssa-loop-ivcanon.c | 9 +-
 gcc/tree-ssa-loop-niter.c | 13 +
 23 files changed, 1870 insertions(+), 333 deletions(-)
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/tree-chrec-oacc.h

diff --git a/gcc/common.opt b/gcc/common.opt
index dfed6ec76ba..caaeaa1aa6f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1600,6 +1600,14 @@ fgraphite-identity
 Common Report Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation.
+fgraphite-non-affine-accesses
+Common Report Var(flag_graphite_non_affine_accesses) Init(0)
+Allow Graphite to handle non-affine data accesses.
+
+fgraphite-runtime-alias-checks
+Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
+Allow Graphite to add runtime alias checks to loops if aliasing cannot be resolved statically.
+
 fhoist-adjacent-loads
 Common Report Var(flag_hoist_adjacent_loads) Optimization
 Enable hoisting adjacent loads to encourage generating conditional move
diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
index 7078c949800..76ba027cdf3 100644
--- a/gcc/graphite-dependences.c
+++ b/gcc/graphite-dependences.c
@@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
   {
     if (dump_file)
       {
-        fprintf (dump_file, "Adding read to depedence graph: ");
+        fprintf (dump_file, "Adding read to dependence graph: ");
         print_pdr (dump_file, pdr);
       }
     isl_union_map *um
@@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
     reads = isl_union_map_union (reads, um);
     if (dump_file)
       {
-        fprintf (dump_file, "Reads depedence graph: ");
+        fprintf (dump_file, "Reads dependence graph: ");
         print_isl_union_map (dump_file, reads);
       }
   }
@@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
   {
     if (dump_file)
       {
-        fprintf (dump_file, "Adding must write to depedence graph: ");
+        fprintf (dump_file, "Adding must write to dependence graph: ");
         print_pdr (dump_file, pdr);
       }
     isl_union_map *um
@@ -106,7 +106,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map *&reads,
     must_writes = isl_union_map_union (must_writes, um);
     if (dump_file)
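
The runtime alias checking mentioned in the patch description can be pictured as loop versioning: if two memory ranges provably do not overlap at run time, a version of the loop with independent iterations is selected, otherwise the original sequential loop is kept. The following is only a hand-written sketch of that idea, not the code that Graphite generates; the names ranges_overlap and add_one are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if the byte ranges [p, p + len)
   and [q, q + len) overlap.  Address overflow is ignored in this sketch.  */
int
ranges_overlap (const void *p, const void *q, size_t len)
{
  uintptr_t a = (uintptr_t) p;
  uintptr_t b = (uintptr_t) q;
  return a < b + len && b < a + len;
}

/* Computes dst[i] = src[i] + 1 for 0 <= i < n.  Returns 1 if the
   version with independent iterations was selected, 0 otherwise.  */
int
add_one (int *dst, const int *src, size_t n)
{
  assert (dst != NULL && src != NULL);

  if (!ranges_overlap (dst, src, n * sizeof (int)))
    {
      /* No aliasing: the iterations do not depend on each other, so a
         compiler would be free to parallelize this version.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
      return 1;
    }

  /* Possible aliasing: keep the original sequential order.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Both versions contain the same loop body here; the point is only that the check happens once, outside the loop, so the parallelizable version pays no per-iteration cost.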
[PATCH 2/2] OpenACC: Add Graphite-based "kernels" handling to pass_convert_oacc_kernels
This patch changes the "kernels" conversion to route loops in OpenACC "kernels" regions through Graphite. This is done by converting the loops in "kernels" regions which are not yet known to be "independent" to "auto" loops, as in the current (OG10) "parloops"-based "kernels" handling. Afterwards, the "kernels" regions are treated essentially like "parallel" regions. A new internal target kind, however, still makes it possible to distinguish between the types of regions, which is useful for diagnostic messages. The old "parloops"-based "kernels" handling will be deprecated, but is still available through the command line options "-fopenacc-kernels=split-parloops" and "-fopenacc-kernels=parloops".
---
 gcc/c-family/c.opt | 5 +-
 gcc/doc/invoke.texi | 10 ++-
 gcc/doc/passes.texi | 6 +-
 gcc/flag-types.h | 1 +
 gcc/gimple-pretty-print.c | 3 +
 gcc/gimple.h | 9 ++-
 gcc/gimplify.c | 1 +
 gcc/omp-expand.c | 63 +--
 gcc/omp-general.c | 19 -
 gcc/omp-general.h | 1 +
 gcc/omp-low.c | 76 +++
 gcc/omp-oacc-kernels.c | 59 --
 gcc/omp-offload.c | 50 +++-
 .../goacc/kernels-conversion-parloops.c | 61 +++
 .../c-c++-common/goacc/kernels-conversion.c | 12 +--
 .../gfortran.dg/goacc/kernels-reductions.f90 | 37 +
 gcc/tree-parloops.c | 16 +++-
 gcc/tree-ssa-loop.c | 10 +++
 18 files changed, 395 insertions(+), 44 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion-parloops.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4ef7ea76aa1..255ff84ca4b 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1747,7 +1747,7 @@ Specify default OpenACC compute dimensions.
 fopenacc-kernels=
 C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
--fopenacc-kernels=[split|parloops] Configure OpenACC 'kernels' constructs handling.
+-fopenacc-kernels=[split|split-parloops|parloops] Configure OpenACC 'kernels' constructs handling.
 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
@@ -1755,6 +1755,9 @@ Name(openacc_kernels) Type(enum openacc_kernels)
 EnumValue
 Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)
+EnumValue
+Enum(openacc_kernels) String(split-parloops) Value(OPENACC_KERNELS_SPLIT_PARLOOPS)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fe04b4d8e6a..d713d6ae8ab 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2266,12 +2266,20 @@ permitted.
 @opindex fopenacc-kernels
 @cindex OpenACC accelerator programming
 Configure OpenACC 'kernels' constructs handling.
+
 With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
 are split into a sequence of compute constructs, each then handled
-individually.
+individually. The data dependence analysis that is necessary to
+determine if loops can be parallelized is performed by the Graphite
+pass.
 This is the default.
+With @option{-fopenacc-kernels=split-parloops}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+This is deprecated.
 With @option{-fopenacc-kernels=parloops}, the whole OpenACC 'kernels'
 constructs is handled by the @samp{parloops} pass.
+This is deprecated.
 @item -fopenmp
 @opindex fopenmp
diff --git a/gcc/doc/passes.texi b/gcc/doc/passes.texi
index 7424690dac3..5dda056a2bb 100644
--- a/gcc/doc/passes.texi
+++ b/gcc/doc/passes.texi
@@ -248,9 +248,9 @@ constraints in order to generate the points-to sets. It is located in
 This is a pass group for processing OpenACC kernels regions. It is a
 subpass of the IPA OpenACC pass group that runs on offloaded functions
-containing OpenACC kernels loops. It is located in
-@file{tree-ssa-loop.c} and is described by
-@code{pass_ipa_oacc_kernels}.
+containing OpenACC kernels loops if @samp{parloops} based handling of
+kernels regions is used. It is located in @file{tree-ssa-loop.c} and
+is described by @code{pass_ipa_oacc_kernels}.
 @item Target clone
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index e2255a56745..058c4e214af 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -376,6 +376,7 @@ enum cf_protection_level
 enum openacc_kernels
 {
   OPENACC_KERNELS_SPLIT,
+  OPENACC_KERNELS_SPLIT_PARLOOPS,
   OPENACC_KERNELS_PARLOOPS
 };
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 54a6d318dc5..b4a2
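
To illustrate the conversion described in the patch text, here is a small user-level sketch. The first function is what a user might write with a "kernels" region; the second shows, roughly and at source level only, how such a loop is regarded after the conversion: as an "auto" loop inside a region treated like "parallel", leaving the decision about parallelization to the compiler's analysis. The function names are hypothetical, and the real transformation happens on GCC's internal representation, not on source code. Without -fopenacc the pragmas are ignored and both functions run sequentially on the host.

```c
#include <stddef.h>

/* As written by the user: the compiler has to find out on its own
   whether the loop in the "kernels" region can be parallelized.  */
void
scale_kernels (float *restrict out, const float *restrict in, int n)
{
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  for (int i = 0; i < n; i++)
    out[i] = 2.0f * in[i];
}

/* Rough source-level picture of the region after the conversion:
   a "parallel"-like region whose loop is marked "auto", so that the
   dependence analysis decides whether it becomes "independent".  */
void
scale_parallel_auto (float *restrict out, const float *restrict in, int n)
{
#pragma acc parallel copyin(in[0:n]) copyout(out[0:n])
#pragma acc loop auto
  for (int i = 0; i < n; i++)
    out[i] = 2.0f * in[i];
}
```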
Re: [PATCH 1/2] [WIP] OpenACC: Add Graphite-based handling of "auto" loops
Hi Richard,

Richard Biener writes:

> On Thu, Nov 12, 2020 at 11:11 AM Frederik Harwath wrote:
>>
>> This patch enables the use of Graphite for the analysis of OpenACC
>> "auto" loops. [...]
>> Furthermore, Graphite is extended by functionality that extends
>> its applicability to real-world code (e.g. runtime alias checking).
>
> I wonder if this can be split into a refactoring of graphite and adding
> runtime alias capability and a part doing the OpenACC pieces.
>

Yes, I did not remove the runtime alias checking from this WIP-patch, but I planned to submit it separately. I am going to do this soon.

Frederik

> Richard.
>
>> ---
>>  gcc/common.opt | 8 +
>>  gcc/graphite-dependences.c | 12 +-
>>  gcc/graphite-isl-ast-to-gimple.c | 77 +-
>>  gcc/graphite-oacc.h | 90 ++
>>  gcc/graphite-scop-detection.c | 828 ++
>>  gcc/graphite-sese-to-poly.c | 26 +-
>>  gcc/graphite.c | 403 -
>>  gcc/graphite.h | 11 +-
>>  gcc/internal-fn.h | 7 +-
>>  gcc/omp-expand.c | 26 +-
>>  gcc/omp-offload.c | 173 +++-
>>  gcc/predict.c | 2 +-
>>  .../graphite/alias-0-no-runtime-check.c | 20 +
>>  .../gcc.dg/graphite/alias-0-runtime-check.c | 21 +
>>  gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 +
>>  gcc/tree-chrec-oacc.h | 45 +
>>  gcc/tree-chrec.c | 16 +-
>>  gcc/tree-data-ref.c | 112 ++-
>>  gcc/tree-data-ref.h | 8 +-
>>  gcc/tree-loop-distribution.c | 17 +-
>>  gcc/tree-scalar-evolution.c | 257 +-
>>  gcc/tree-ssa-loop-ivcanon.c | 9 +-
>>  gcc/tree-ssa-loop-niter.c | 13 +
>>  23 files changed, 1870 insertions(+), 333 deletions(-)
>>  create mode 100644 gcc/graphite-oacc.h
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-no-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-0-runtime-check.c
>>  create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
>>  create mode 100644 gcc/tree-chrec-oacc.h
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index dfed6ec76ba..caaeaa1aa6f 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1600,6 +1600,14 @@ fgraphite-identity
>>  Common Report Var(flag_graphite_identity) Optimization
>>  Enable Graphite Identity transformation.
>>
>> +fgraphite-non-affine-accesses
>> +Common Report Var(flag_graphite_non_affine_accesses) Init(0)
>> +Allow Graphite to handle non-affine data accesses.
>> +
>> +fgraphite-runtime-alias-checks
>> +Common Report Var(flag_graphite_runtime_alias_checks) Optimization Init(1)
>> +Allow Graphite to add runtime alias checks to loops if aliasing cannot be
>> resolved statically.
>> +
>>  fhoist-adjacent-loads
>>  Common Report Var(flag_hoist_adjacent_loads) Optimization
>>  Enable hoisting adjacent loads to encourage generating conditional move
>> diff --git a/gcc/graphite-dependences.c b/gcc/graphite-dependences.c
>> index 7078c949800..76ba027cdf3 100644
>> --- a/gcc/graphite-dependences.c
>> +++ b/gcc/graphite-dependences.c
>> @@ -82,7 +82,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map
>> *&reads,
>>  {
>>  if (dump_file)
>>  {
>> - fprintf (dump_file, "Adding read to depedence graph: ");
>> + fprintf (dump_file, "Adding read to dependence graph: ");
>>  print_pdr (dump_file, pdr);
>>  }
>>  isl_union_map *um
>> @@ -90,7 +90,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map
>> *&reads,
>>  reads = isl_union_map_union (reads, um);
>>  if (dump_file)
>>  {
>> - fprintf (dump_file, "Reads depedence graph: ");
>> + fprintf (dump_file, "Reads dependence graph: ");
>>  print_isl_union_map (dump_file, reads);
>>  }
>>  }
>> @@ -98,7 +98,7 @@ scop_get_reads_and_writes (scop_p scop, isl_union_map
>> *&reads,
>>  {
>>  if (dump_file)
>>  {
>>
[PATCH][amdgcn] Add runtime ISA check for amdgcn offloading
Hi,

this patch implements a runtime ISA check for amdgcn offloading. The check verifies that the ISA of the GPU to which we try to offload matches the ISA for which the code to be offloaded has been compiled. If it detects a mismatch, it emits an error message which contains a hint about the correct compilation parameters for the GPU. For instance:

"libgomp: GCN fatal error: GCN code object ISA 'gfx906' does not match GPU ISA 'gfx900'. Try to recompile with '-foffload=-march=gfx900'."

or

"libgomp: GCN fatal error: GCN code object ISA 'gfx900' does not match agent ISA 'gfx803'. Try to recompile with '-foffload=-march=fiji'."

(By the way, the names that we use for the ISAs are a bit inconsistent. Perhaps we should just use the gfx-names for all ISAs everywhere?)

Without this patch, the user only gets a confusing error message from the HSA runtime, which fails to load the GCN object code.

I have checked that the code does not lead to any regressions when running the test suite correctly, i.e. with the "-foffload=-march=..." option given to the compiler matching the architecture of the GPU. It seems difficult to implement an automated test that triggers an ISA mismatch. I have tested manually (for different combinations of the compilation flags and offloading GPU ISAs) that the runtime ISA check produces the expected error messages.

Is it ok to commit this patch to the master branch?

Frederik

From 27981f9c93d1efed6d943dae4ea0c52147c02d5b Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must match the ISA for which the code has been compiled. So far, the libgomp amdgcn plugin did not attempt to verify this. In case of a mismatch, the user is confronted with an unhelpful error message produced by the HSA runtime. This commit implements a runtime ISA check.
In the case of an ISA mismatch, the execution is aborted with a clear error message and a hint about the correct compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	(EF_AMDGPU_MACH_MASK): New constant.
	(gcn_isa): New typedef.
	(gcn_gfx801_s): New constant.
	(gcn_gfx803_s): New constant.
	(gcn_gfx900_s): New constant.
	(gcn_gfx906_s): New constant.
	(gcn_isa_name_len): New constant.
	(elf_gcn_isa_field): New function.
	(isa_hsa_name): New function.
	(isa_gcc_name): New function.
	(isa_code): New function.
	(struct agent_info): Add field "device_isa" ...
	(GOMP_OFFLOAD_init_device): ... and init from here, failing if device has unknown ISA; adapt init of "gfx900_p" to use new constants.
	(isa_matches_agent): New function ...
	(create_and_finalize_hsa_program): ... used from here to check that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 127 +++-
 1 file changed, 126 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..14f4a707a7c 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,88 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
+const static char* gcn_gfx801_s = "gfx801";
+const static char* gcn_gfx803_s = "gfx803";
+const static char* gcn_gfx900_s = "gfx900";
+const static char* gcn_gfx906_s = "gfx906";
+const static int gcn_isa_name_len = 6;
+
+static int
+elf_gcn_isa_field (Elf64_Ehdr *image)
+{
+  return image->e_flags & EF_AMDGPU_MACH_MASK;
+}
+
+/* Returns the name that the HSA runtime uses for the ISA or NULL if we do not
+   support the ISA. */
+
+static const char*
+isa_hsa_name (int isa) {
+  switch(isa)
+    {
+    case EF_AMDGPU_MACH_AMDGCN_GFX801:
+      return gcn_gfx801_s;
+    case EF_AMDGPU_MACH_AMDGCN_GFX803:
+      return gcn_gfx803_s;
+    case EF_AMDGPU_MACH_AMDGCN_GFX900:
+      return gcn_gfx900_s;
+    case EF_AMDGPU_MACH_AMDGCN_GFX906:
+      return gcn_gfx906_s;
+    }
+  return NULL;
+}
+
+/* Returns the user-facing name that GCC uses to identify the architecture (e.g.
+   with -march) or NULL if we do not support the ISA.
+   Keep in sync with /gcc/config/gcn/gcn.{c,opt}. */
+
+static const char*
+isa_gcc_name (int isa) {
+  switch(isa)
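
The check described in this mail boils down to two steps: mask the machine field out of the ELF header flags of the code object, then compare the result with the ISA reported for the device. The following stand-alone sketch shows that logic without any HSA or ELF headers. It uses the machine values and the mask listed in the patch, but mach_name and isa_matches are hypothetical names, and comparing by name is a simplification made for this sketch; the patch has its own helpers (isa_code, isa_matches_agent) whose bodies are not shown in this excerpt.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Machine values and mask as listed in the patch.  */
enum
{
  MACH_GFX801 = 0x028,
  MACH_GFX803 = 0x02a,
  MACH_GFX900 = 0x02c,
  MACH_GFX906 = 0x02f
};
#define MACH_MASK 0x00ff

/* Hypothetical helper: maps a machine value to its gfx name, or
   returns NULL if the value is not one of the supported ISAs.  */
const char *
mach_name (unsigned mach)
{
  switch (mach)
    {
    case MACH_GFX801: return "gfx801";
    case MACH_GFX803: return "gfx803";
    case MACH_GFX900: return "gfx900";
    case MACH_GFX906: return "gfx906";
    }
  return NULL;
}

/* Hypothetical helper: returns 1 if the ISA encoded in the ELF header
   flags E_FLAGS equals the ISA named DEVICE_NAME, 0 otherwise.  */
int
isa_matches (unsigned e_flags, const char *device_name)
{
  assert (device_name != NULL);
  const char *code_name = mach_name (e_flags & MACH_MASK);
  return code_name != NULL && strcmp (code_name, device_name) == 0;
}
```

A caller would print the two names and the "-foffload=-march=..." hint when isa_matches returns 0, instead of letting the load fail later inside the HSA runtime.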
Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi Thomas,

I have attached a patch containing the changes that you suggested.

On 16.01.20 17:00, Thomas Schwinge wrote:
> On 2019-12-20T17:46:57+0100, "Harwath, Frederik" wrote:
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c
>
> I suggest to rename this one to 'acc_get_property-nvptx.c'
> [...]
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c
> I suggest to rename this one to 'acc_get_property-host.c'.

I renamed both.

> This assumes that the 'cuda*' interfaces and OpenACC/libgomp interfaces
> handle/order device numbers in the same way -- which it seems they do,
> but just noting this in case this becomes an issue at some point.

Correct, I have added a corresponding comment to acc_get_property-nvptx.c.

> Aside from improper data types being used for storing/printing the memory
> information, we have to expect 'acc_property_free_memory' to change
> between two invocations. ;-)

Right! I have removed the assertion and changed it into ...

>
> Better to just verify that 'free_mem >= 0' (by means of 'size_t' data
> type, I suppose), and 'free_mem <= total_mem'?

... this.

>
> (..., and for avoidance of doubt: I think there's no point in
> special-casing this one for 'acc_device_host' where we know that
> 'free_mem' is always zero -- this may change in the future.)

Sure! But with the new "free_mem <= total_mem" assertion and since we assert total_mem == 0 and since free_mem >= 0, we effectively also assert that in the test right now ;-).

Ok to push the commit to master?

Best regards,
Frederik

From ef5a959bedc3214e86d6a683a02b693d82847ecd Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Mon, 20 Jan 2020 14:07:03 +0100
Subject: [PATCH] Fix expectation and types in acc_get_property tests

* Weaken expectation concerning acc_property_free_memory. Do not expect the value returned by CUDA since that value might have changed in the meantime.
* Use correct type for the results of calls to acc_get_property in tests. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c (expect_device_properties): Remove "expected_free_mem" argument, change "expected_total_mem" argument type to size_t; change types of acc_get_property results to size_t. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and rename to ... * testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and rename to ... * testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this. Reviewed-by: Thomas Schwinge --- .../acc_get_property-aux.c| 28 +-- ...t_property-3.c => acc_get_property-host.c} | 7 ++--- ..._property-2.c => acc_get_property-nvptx.c} | 9 +++--- 3 files changed, 22 insertions(+), 22 deletions(-) rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%) rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%) diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c index 952bdbf6aea..76c29501839 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c @@ -8,9 +8,8 @@ void expect_device_properties (acc_device_t dev_type, int dev_num, - int expected_total_mem, int expected_free_mem, - const char* expected_vendor, const char* expected_name, - const char* expected_driver) + size_t expected_memory, const char* expected_vendor, + const char* expected_name, const char* expected_driver) { const char *vendor = acc_get_property_string (dev_num, dev_type, acc_property_vendor); @@ -21,22 +20,23 @@ void expect_device_properties abort (); } - int total_mem = acc_get_property (dev_num, dev_type, -acc_property_memory); - if (total_mem != 
expected_total_mem) + size_t total_mem = acc_get_property (dev_num, dev_type, + acc_property_memory); + if (total_mem != expected_memory) { - fprintf (stderr, "Expected acc_property_memory to equal %d, " - "but was %d.\n", expected_total_mem, total_mem); + fprintf (stderr, "Expected acc_property_memory to equal %zd, " + "but was %zd.\n", expected_memory, total_mem); abort (); } - int free_mem = acc_get_property (dev_num, dev_type, + size_t free_mem = acc_get_property (dev_num, dev_type, acc_property_free_memory); - if (free_mem != expected_free_mem) + if (free_mem > total_me
Re: [PATCH][amdgcn] Add runtime ISA check for amdgcn offloading
Hi Andrew,

Thanks for the review! I have attached a revised patch containing the changes that you suggested.

On 20.01.20 11:00, Andrew Stubbs wrote:
> On 20/01/2020 06:57, Harwath, Frederik wrote:
>> Is it ok to commit this patch to the master branch?
>
> I can't see anything significantly wrong with the code of the patch, however
> I have some minor issues I'd like fixed in the text.
>
> [...] Please move the functions down into the "Utility functions" group. The
> const static variables should probably go with them.

Done.

>> @@ -3294,7 +3415,11 @@ GOMP_OFFLOAD_init_device (int n)
>>  &buf);
>>  if (status != HSA_STATUS_SUCCESS)
>>  return hsa_error ("Error querying the name of the agent", status);
>> - agent->gfx900_p = (strncmp (buf, "gfx900", 6) == 0);
>> + agent->gfx900_p = (strncmp (buf, gcn_gfx900_s, gcn_isa_name_len) == 0);
>> +
>> + agent->device_isa = isa_code (buf);
>> + if (agent->device_isa < 0)
>> + return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR);
>
> Can device_isa not just replace gfx900_p? I think it's only tested in one
> place, and that would be easily substituted.
>

Yes, I have changed that one place to use agent->device_isa.

I would commit the patch then if nobody objects :-).

The other approaches (fat binaries etc.) that have been discussed in this thread seem to be long-term projects and until something like this gets implemented the early error checking implemented by this patch seems to be better than nothing.

Frederik

From 470892454bf0d67ea71c2399f5819713592e46a0 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Mon, 20 Jan 2020 07:45:43 +0100
Subject: [PATCH] Add runtime ISA check for amdgcn offloading

When executing code that uses amdgcn GPU offloading, the ISA of the GPU must match the ISA for which the code has been compiled. So far, the libgomp amdgcn plugin did not attempt to verify this. In case of a mismatch, the user is confronted with an unhelpful error message produced by the HSA runtime.
This commit implements a runtime ISA check. In the case of an ISA mismatch, the execution is aborted with a clear error message and a hint about the correct compilation parameters for the GPU on which the execution has been attempted.

libgomp/
	* plugin/plugin-gcn.c (EF_AMDGPU_MACH): New enum.
	* (EF_AMDGPU_MACH_MASK): New constant.
	* (gcn_isa): New typedef.
	* (gcn_gfx801_s): New constant.
	* (gcn_gfx803_s): New constant.
	* (gcn_gfx900_s): New constant.
	* (gcn_gfx906_s): New constant.
	* (gcn_isa_name_len): New constant.
	* (elf_gcn_isa_field): New function.
	* (isa_hsa_name): New function.
	* (isa_gcc_name): New function.
	* (isa_code): New function.
	* (struct agent_info): Add field "device_isa" and remove field "gfx900_p".
	* (GOMP_OFFLOAD_init_device): Adapt agent init to "agent_info" field changes, fail if device has unknown ISA.
	* (parse_target_attributes): Replace "gfx900_p" by "device_isa".
	* (isa_matches_agent): New function ...
	* (create_and_finalize_hsa_program): ... used from here to check that the GPU ISA and the code-object ISA match.
---
 libgomp/plugin/plugin-gcn.c | 131 ++--
 1 file changed, 127 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 16ce251f3a5..de470a3dd33 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -396,6 +396,20 @@ struct gcn_image_desc
   struct global_var_info *global_variables;
 };
+/* This enum mirrors the corresponding LLVM enum's values for all ISAs that we
+   support.
+   See https://llvm.org/docs/AMDGPUUsage.html#amdgpu-ef-amdgpu-mach-table */
+
+typedef enum {
+  EF_AMDGPU_MACH_AMDGCN_GFX801 = 0x028,
+  EF_AMDGPU_MACH_AMDGCN_GFX803 = 0x02a,
+  EF_AMDGPU_MACH_AMDGCN_GFX900 = 0x02c,
+  EF_AMDGPU_MACH_AMDGCN_GFX906 = 0x02f,
+} EF_AMDGPU_MACH;
+
+const static int EF_AMDGPU_MACH_MASK = 0x00ff;
+typedef EF_AMDGPU_MACH gcn_isa;
+
 /* Description of an HSA GPU agent (device) and the program associated with
    it. */
@@ -408,8 +422,9 @@ struct agent_info
   /* Whether the agent has been initialized. The fields below are usable only
      if it has been. */
   bool initialized;
-  /* Precomputed check for problem architectures. */
-  bool gfx900_p;
+
+  /* The instruction set architecture of the device. */
+  gcn_isa device_isa;
   /* Command queues of the agent. */
   hsa_queue_t *sync_queue;
@@ -1232,7 +1247,8 @@ parse_target_attributes (void **input,
   if (gcn_dims_found)
     {
-      if (agent->gfx900_p && gcn_threads == 0 && override_z_dim == 0)
+      if (agent->device_isa == EF_AMDGPU_MACH_AMDGCN_GFX900
+          && gcn_threads == 0 && override_z_dim == 0)
Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi Thomas,

On 23.01.20 15:32, Thomas Schwinge wrote:
> On 2020-01-20T15:01:01+0100, "Harwath, Frederik" wrote:
>> On 16.01.20 17:00, Thomas Schwinge wrote:
>>> On 2019-12-20T17:46:57+0100, "Harwath, Frederik" wrote:
>> Ok to push the commit to master?
>
> Thanks, OK. Reviewed-by: Thomas Schwinge

Thank you. Committed as 4bd03ed69bd789278a0286017b692f49052ffe5c, including the changes to the size_t formatting.

Best regards,
Frederik

>
>
> As a low-priority follow-up, please look into:
>
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c: In function 'expect_device_properties':
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:74:24: warning: format '%d' expects argument of type 'int', but argument 3 has type 'const char *' [-Wformat=]
>    74 |       fprintf (stderr, "Expected value of unknown string property to be NULL, "
>       |                        ^~~~
>    75 |                "but was %d.\n", s);
>       |                                 ~
>       |                                 |
>       |                                 const char *
> source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c:75:19: note: format string is defined here
>    75 |                "but was %d.\n", s);
>       |                         ~^
>       |                          |
>       |                          int
>       |                         %s
>
> ..., and (random example):
>
>>    int unknown_property = 16058;
>> -  int v = acc_get_property (dev_num, dev_type, (acc_device_property_t)unknown_property);
>> +  size_t v = acc_get_property (dev_num, dev_type, (acc_device_property_t)unknown_property);
>>    if (v != 0)
>>      {
>>        fprintf (stderr, "Expected value of unknown numeric property to equal 0, "
>> -               "but was %d.\n", v);
>> +               "but was %zd.\n", v);
>>        abort ();
>>      }
>
> ..., shouldn't that be '%zu' given that 'size_t' is 'unsigned'?
> > libgomp.oacc-c-c++-common/acc_get_property-aux.c: fprintf (stderr, > "Expected acc_property_memory to equal %zd, " > libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was > %zd.\n", expected_memory, total_mem); > libgomp.oacc-c-c++-common/acc_get_property-aux.c:", but free > memory was %zd and total memory was %zd.\n", > libgomp.oacc-c-c++-common/acc_get_property-aux.c:"but was > %zd.\n", v); > libgomp.oacc-c-c++-common/acc_get_property.c: printf ("Total > memory: %zd\n", v); > libgomp.oacc-c-c++-common/acc_get_property.c: printf ("Free > memory: %zd\n", v); > > > Grüße > Thomas > From 4bd03ed69bd789278a0286017b692f49052ffe5c Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 20 Jan 2020 14:07:03 +0100 Subject: [PATCH 1/2] Fix expectation and types in acc_get_property tests * Weaken expectation concerning acc_property_free_memory. Do not expect the value returned by CUDA since that value might have changed in the meantime. * Use correct type for the results of calls to acc_get_property in tests. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c (expect_device_properties): Remove "expected_free_mem" argument, change "expected_total_mem" argument type to size_t; change types of acc_get_property results to size_t, adapt format strings. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: Use %zu instead of %zd to print size_t values. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-2.c: Adapt and rename to ... * testsuite/libgomp.oacc-c-c++-common/acc_get_property-nvptx.c: ... this. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-3.c: Adapt and rename to ... * testsuite/libgomp.oacc-c-c++-common/acc_get_property-host.c: ... this. 
Reviewed-by: Thomas Schwinge --- .../acc_get_property-aux.c| 30 +-- ...t_property-3.c => acc_get_property-host.c} | 7 ++--- ..._property-2.c => acc_get_property-nvptx.c} | 9 +++--- .../acc_get_property.c| 4 +-- 4 files changed, 25 insertions(+), 25 deletions(-) rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-3.c => acc_get_property-host.c} (63%) rename libgomp/testsuite/libgomp.oacc-c-c++-common/{acc_get_property-2.c => acc_get_property-nvptx.c} (86%) diff --git a/libg
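
The two points of this commit, using size_t for the memory properties and only checking that the free memory does not exceed the total memory, can be shown in isolation. The sketch below does not call acc_get_property, so that it builds without OpenACC support; the function memory_properties_ok is a hypothetical stand-in for the corresponding part of the test helper, and the values passed to it would come from acc_get_property in the real test.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the memory checks in the test helper.
   Returns 1 if the values are plausible, 0 otherwise.  size_t values
   are printed with %zu, since size_t is an unsigned type.  */
int
memory_properties_ok (size_t total_mem, size_t free_mem,
                      size_t expected_total)
{
  if (total_mem != expected_total)
    {
      fprintf (stderr, "Expected total memory to equal %zu, "
               "but was %zu.\n", expected_total, total_mem);
      return 0;
    }

  /* The free memory can change between two queries, so only an upper
     bound is checked instead of an exact expected value.  */
  if (free_mem > total_mem)
    {
      fprintf (stderr, "Expected free memory <= total memory, "
               "but free memory was %zu and total memory was %zu.\n",
               free_mem, total_mem);
      return 0;
    }

  return 1;
}
```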
[PATCH] Add OpenACC acc_get_property support for AMD GCN
Hi,

this patch adds full support for the OpenACC 2.6 acc_get_property and acc_get_property_string functions to the libgomp GCN plugin. This replaces the existing stub in libgomp/plugin/plugin-gcn.c.

Andrew: The value returned for acc_property_memory ("size of device memory in bytes" according to the spec) is the HSA_REGION_INFO_SIZE of the agent's data_region. This has been adapted from a previous incomplete implementation that we had on the OG9 branch. Does that sound reasonable?

I have tested the patch with amdgcn and nvptx offloading.

Ok to commit this to the main branch?

Best regards,
Frederik

From 6f1855281c38993a088f9b4af020a786f8e05fe9 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Tue, 28 Jan 2020 08:01:00 +0100
Subject: [PATCH] Add OpenACC acc_get_property support for AMD GCN

Add full support for the OpenACC 2.6 acc_get_property and acc_get_property_string functions to the libgomp GCN plugin.

libgomp/
	* plugin/plugin-gcn.c (struct agent_info): Add fields "name" and "vendor_name" ...
	(GOMP_OFFLOAD_init_device): ... and init from here.
	(struct hsa_context_info): Add field "driver_version_s" ...
	(init_hsa_context): ... and init from here.
	(GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper implementation.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: Enable test execution for amdgcn and host offloading targets.
	* testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c (expect_device_properties): Split function into ...
	(expect_device_string_properties): ... this new function ...
	(expect_device_memory): ... and this new function.
	* testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c: Add test.
--- libgomp/plugin/plugin-gcn.c | 63 +++-- .../acc_get_property-aux.c| 60 +--- .../acc_get_property-gcn.c| 132 ++ .../acc_get_property.c| 5 +- .../libgomp.oacc-fortran/acc_get_property.f90 | 2 - 5 files changed, 224 insertions(+), 38 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 7854c142f05..0a09daaa0a4 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -425,7 +425,10 @@ struct agent_info /* The instruction set architecture of the device. */ gcn_isa device_isa; - + /* Name of the agent. */ + char name[64]; + /* Name of the vendor of the agent. */ + char vendor_name[64]; /* Command queues of the agent. */ hsa_queue_t *sync_queue; struct goacc_asyncqueue *async_queues, *omp_async_queue; @@ -544,6 +547,8 @@ struct hsa_context_info int agent_count; /* Array of agent_info structures describing the individual HSA agents. */ struct agent_info *agents; + /* Driver version string. */ + char driver_version_s[30]; }; /* Format of the on-device heap. 
@@ -1513,6 +1518,15 @@ init_hsa_context (void) GOMP_PLUGIN_error ("Failed to list all HSA runtime agents"); } + uint16_t minor, major; + status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, &minor); + if (status != HSA_STATUS_SUCCESS) +GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version"); + status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, &major); + if (status != HSA_STATUS_SUCCESS) +GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version"); + sprintf (hsa_context.driver_version_s, "HSA Runtime %d.%d", major, minor); + hsa_context.initialized = true; return true; } @@ -3410,15 +3424,19 @@ GOMP_OFFLOAD_init_device (int n) return hsa_error ("Error requesting maximum queue size of the GCN agent", status); - char buf[64]; status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME, - &buf); + &agent->name); if (status != HSA_STATUS_SUCCESS) return hsa_error ("Error querying the name of the agent", status); - agent->device_isa = isa_code (buf); + agent->device_isa = isa_code (agent->name); if (agent->device_isa < 0) -return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR); +return hsa_error ("Unknown GCN agent architecture", HSA_STATUS_ERROR); + + status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_VENDOR_NAME, + &agent->vendor_name); + if (status != HSA_STATUS_SUCCESS) +return hsa_error ("Error querying the vendor name of the agent", status); status = hsa_fns.hsa_queue_create_fn (agent->id, queue_size, HSA_QUEUE_TYPE_MULTI, @@ -4115,12 +4133,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, void *dst, const void *src, union goacc_property_value GOMP_OF
Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN
Hi Andrew, On 28.01.20 16:42, Andrew Stubbs wrote: > On 28/01/2020 14:55, Harwath, Frederik wrote: > > If we're going to use a fixed-size buffer then we should use snprintf and > emit GCN_WARNING if the return value is greater than > "sizeof(driver_version_s)", even though that is unlikely. Do the same in the > testcase, but use a bigger buffer so that truncation causes a mismatch and > test failure. Ok. > I realise that an existing function in this testcase uses this layout, but > the code style does not normally have the parameter list on the next line, > and certainly not in column 1. Ok. I have also adjusted the formatting in the other acc_get_property tests to the code style. I have turned this into a separate trivial patch. Ok to commit the revised patch? Best regards, Frederik From fb15cb9058feeda8891d6454d32f43fda885b789 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Wed, 29 Jan 2020 10:19:50 +0100 Subject: [PATCH 1/2] Add OpenACC acc_get_property support for AMD GCN Add full support for the OpenACC 2.6 acc_get_property and acc_get_property_string functions to the libgomp GCN plugin. libgomp/ * plugin/plugin-gcn.c (struct agent_info): Add fields "name" and "vendor_name" ... (GOMP_OFFLOAD_init_device): ... and init from here. (struct hsa_context_info): Add field "driver_version_s" ... (init_hsa_context): ... and init from here. (GOMP_OFFLOAD_openacc_get_property): Replace stub with a proper implementation. * testsuite/libgomp.oacc-c-c++-common/acc_get_property.c: Enable test execution for amdgcn and host offloading targets. * testsuite/libgomp.oacc-fortran/acc_get_property.f90: Likewise. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c (expect_device_properties): Split function into ... (expect_device_string_properties): ... this new function ... (expect_device_memory): ... and this new function. * testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c: Add test. 
--- libgomp/plugin/plugin-gcn.c | 71 -- .../acc_get_property-aux.c| 79 ++- .../acc_get_property-gcn.c| 132 ++ .../acc_get_property.c| 5 +- .../libgomp.oacc-fortran/acc_get_property.f90 | 2 - 5 files changed, 242 insertions(+), 47 deletions(-) create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index 7854c142f05..45c625495b9 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -425,7 +425,10 @@ struct agent_info /* The instruction set architecture of the device. */ gcn_isa device_isa; - + /* Name of the agent. */ + char name[64]; + /* Name of the vendor of the agent. */ + char vendor_name[64]; /* Command queues of the agent. */ hsa_queue_t *sync_queue; struct goacc_asyncqueue *async_queues, *omp_async_queue; @@ -544,6 +547,8 @@ struct hsa_context_info int agent_count; /* Array of agent_info structures describing the individual HSA agents. */ struct agent_info *agents; + /* Driver version string. */ + char driver_version_s[30]; }; /* Format of the on-device heap. @@ -1513,6 +1518,23 @@ init_hsa_context (void) GOMP_PLUGIN_error ("Failed to list all HSA runtime agents"); } + uint16_t minor, major; + status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MINOR, &minor); + if (status != HSA_STATUS_SUCCESS) +GOMP_PLUGIN_error ("Failed to obtain HSA runtime minor version"); + status = hsa_fns.hsa_system_get_info_fn (HSA_SYSTEM_INFO_VERSION_MAJOR, &major); + if (status != HSA_STATUS_SUCCESS) +GOMP_PLUGIN_error ("Failed to obtain HSA runtime major version"); + + size_t len = sizeof hsa_context.driver_version_s; + int printed = snprintf (hsa_context.driver_version_s, len, + "HSA Runtime %hu.%hu", (unsigned short int)major, + (unsigned short int)minor); + if (printed >= len) +GCN_WARNING ("HSA runtime version string was truncated." 
+ "Version %hu.%hu is too long.", (unsigned short int)major, + (unsigned short int)minor); + hsa_context.initialized = true; return true; } @@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n) return hsa_error ("Error requesting maximum queue size of the GCN agent", status); - char buf[64]; status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME, - &buf); + &agent->name); if (status != HSA_STATUS_SUCCESS) return hsa_error ("Error querying the name of the agent", status); - agent->device_isa = isa_code (buf); + agent->device_isa = isa_code (agent->name); if (agent->device_isa < 0) -return hsa_error ("Unknown GCN agent architecture.", HSA_STATUS_ERROR)
Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN
Hi Andrew, On 29.01.20 11:38, Andrew Stubbs wrote: > On 29/01/2020 09:52, Harwath, Frederik wrote: > > Patch 1 is OK with the formatting fixed. > Patch 2 is OK. > > Thanks very much, > Committed as 2e5ea57959183bd5bd0356739bb5167417401a31 and 87c3fcfa6bbb5c372d4e275276d21f601d0b62b0. Thank you for the review, Frederik
[PATCH][OpenACC] Add acc_device_radeon to name_of_acc_device_t function
Hi, we should handle acc_device_radeon in the name_of_acc_device_t function which is used in libgomp/oacc-init.c to display the name of devices in several error messages. Ok to commit this patch to master? Best regards, Frederik From 6aacba3e8123ce5e0961857802fd7d8a103aa96b Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 27 Jan 2020 15:41:26 +0100 Subject: [PATCH] Add acc_device_radeon to name_of_acc_device_t function libgomp/ * oacc-init.c (name_of_acc_device_t): Handle acc_device_radeon. --- libgomp/oacc-init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index 89a30b3e716..ef12b4c16d0 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -115,6 +115,7 @@ name_of_acc_device_t (enum acc_device_t type) case acc_device_host: return "host"; case acc_device_not_host: return "not_host"; case acc_device_nvidia: return "nvidia"; +case acc_device_radeon: return "radeon"; default: unknown_device_type_error (type); } __builtin_unreachable (); -- 2.17.1
Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN
Hi Thomas, On 29.01.20 18:44, Thomas Schwinge wrote: >> + size_t len = sizeof hsa_context.driver_version_s; >> + int printed = snprintf (hsa_context.driver_version_s, len, >> + "HSA Runtime %hu.%hu", (unsigned short int)major, >> + (unsigned short int)minor); >> + if (printed >= len) >> +GCN_WARNING ("HSA runtime version string was truncated." >> + "Version %hu.%hu is too long.", (unsigned short int)major, >> + (unsigned short int)minor); > > (Can it actually happen that 'snprintf' returns 'printed > len' -- > meaning that it's written into random memory? I thought 'snprintf' has a > hard stop at 'len'? Or does this indicate the amount of memory it > would've written? I should re-read the manpage at some point...) ;-) > Yes, "printed > len" can happen. Seems that I have chosen a bad variable name. "actual_len" (of the formatted string that should have been written - excluding the terminating '\0') would have been more appropriate. > For 'printed = len' does or doesn't 'snprintf' store the terminating > 'NUL' character, or do we manually have to set: > > hsa_context.driver_version_s[len - 1] = '\0'; > > ... in that case? No, in this case, the printed string is missing the last character, but the terminating '\0' has been written. Consider:

  #include <stdio.h>

  int main ()
  {
    char s[] = "foo";
    char buf[3]; // buf is too short to hold "foo" plus the terminating '\0'
    int actual_len = snprintf (buf, 3, "%s", s);
    printf ("buf: %s\n", buf);
    printf ("actual_len: %d\n", actual_len);
  }

Output:

  buf: fo
  actual_len: 3

> >> @@ -3410,15 +3432,19 @@ GOMP_OFFLOAD_init_device (int n) > >> - char buf[64]; >>status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AGENT_INFO_NAME, >> - &buf); >> + &agent->name); >>if (status != HSA_STATUS_SUCCESS) >> return hsa_error ("Error querying the name of the agent", status); > > (That's of course pre-existing, but) this looks like a dangerous API, > given that 'hsa_agent_get_info_fn' doesn't know 'sizeof agent->name' (or > 'sizeof buf' before)... The API documentation (cf. 
https://rocm-documentation.readthedocs.io/en/latest/ROCm_API_References/ROCr-API.html) states that "the type of this attribute is a NUL-terminated char[64]". But, right, should this ever change, we might not notice it. Best regards, Frederik
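The truncation check discussed in this message can be sketched as a small stand-alone helper. This is only an illustration and not code from the patch: the name format_version and its parameters are made up, and the helper returns a bool instead of calling GCN_WARNING. It relies on the standard snprintf contract, namely that the return value is the length the complete string would have had (excluding the terminating '\0'), and it checks for a negative return value before comparing against the size_t buffer length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of libgomp: format a version string into
   BUF (of size LEN).  Returns true if the complete string was written and
   false if it was truncated or an output error occurred.  */
static bool
format_version (char *buf, size_t len, unsigned int major, unsigned int minor)
{
  assert (buf != NULL && len > 0);

  /* snprintf returns the length the full string would have had, excluding
     the terminating '\0'; a negative value indicates an output error.  */
  int needed = snprintf (buf, len, "HSA Runtime %u.%u", major, minor);

  /* Handle errors first so that the comparison below never converts a
     negative int to size_t.  */
  if (needed < 0)
    return false;

  /* The string fits only if needed < len.  Otherwise it was cut off, but
     BUF is still '\0'-terminated.  */
  return (size_t) needed < len;
}
```

With an 8-byte buffer the helper reports truncation and leaves "HSA Run" in the buffer; with the 30-byte buffer used in the patch, ordinary version numbers fit.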
Re: [PATCH] Add OpenACC acc_get_property support for AMD GCN
Hi Thomas, On 30.01.20 17:08, Thomas Schwinge wrote: > I understand correctly that the only reason for: > > On 2020-01-29T10:52:57+0100, "Harwath, Frederik" > wrote: >> * testsuite/libgomp.oacc-c-c++-common/acc_get_property-aux.c >> (expect_device_properties): Split function into ... >> (expect_device_string_properties): ... this new function ... >> (expect_device_memory): ... and this new function. > > ... this split is that we can't test 'expect_device_memory' here: > [...] > ..., because that one doesn't (re-)implement the 'acc_property_memory' > interface? Correct. But why "re-"? It has not been implemented before. >> --- a/libgomp/plugin/plugin-gcn.c >> +++ b/libgomp/plugin/plugin-gcn.c > >> @@ -4115,12 +4141,37 @@ GOMP_OFFLOAD_openacc_async_dev2host (int device, >> void *dst, const void *src, >> union goacc_property_value >> GOMP_OFFLOAD_openacc_get_property (int device, enum goacc_property prop) >> { >> [...] >> + switch (prop) >> +{ >> +case GOACC_PROPERTY_FREE_MEMORY: >> + /* Not supported. */ >> + break; > > (OK, can be added later when somebody feels like doing that.) Well, "not supported" means that there seems to be no (reasonable) way to obtain the necessary information from the runtime - in contrast to the nvptx plugin where it can be obtained easily through the CUDA API. > >> +case GOACC_PROPERTY_MEMORY: >> + { >> +size_t size; >> +hsa_region_t region = agent->data_region; >> +hsa_status_t status = >> + hsa_fns.hsa_region_get_info_fn (region, HSA_REGION_INFO_SIZE, &size); >> +if (status == HSA_STATUS_SUCCESS) >> + propval.val = size; >> +break; >> + } >> [...] >> } > > Here we got 'acc_property_memory' implemented, but not here: > >> --- /dev/null >> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_get_property-gcn.c Yes, there seems to be no straightforward way to determine the expected value through the runtime API. We might of course try to replicate the logic that is used in plugin-gcn.c. Best regards, Frederik
Re: Make OpenACC 'acc_get_property' with 'acc_device_current' work (was: [PATCH] Add OpenACC 2.6 `acc_get_property' support)
Hi Thomas, On 30.01.20 16:54, Thomas Schwinge wrote: > > [...] the 'acc_device_current' interface should work already now. > > [...] Please review > the attached (Tobias the Fortran test cases, please), and test with AMD > GCN offloading. If approving this patch, please respond with I have tested the patch with AMD GCN offloading and I have observed no regressions. The new tests pass as expected and print the correct output. Great that you have extended the Fortran tests! > diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c > index ef12b4c16d01..c28c0f689ba2 100644 > --- a/libgomp/oacc-init.c > +++ b/libgomp/oacc-init.c > @@ -796,7 +796,9 @@ get_property_any (int ord, acc_device_t d, > acc_device_property_t prop) > size_t > acc_get_property (int ord, acc_device_t d, acc_device_property_t prop) > { > - if (!known_device_type_p (d)) > + if (d == acc_device_current) > +; /* Allowed only for 'acc_get_property', 'acc_get_property_string'. */ > + else if (!known_device_type_p (d)) > unknown_device_type_error(d); I don't like the empty if branch very much. Introducing a variable (for instance, "bool allowed_device_type = d == acc_device_current || known_device_type_p (d);") would also provide a place for your comment. You could also extract a function to avoid duplicating the explanation in acc_get_property_string. The patch looks good to me. Reviewed-by: Frederik Harwath Best regards, Frederik
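The suggestion to extract a function for the device type check could look roughly like the sketch below. Everything in it is a simplified stand-in: sketch_device_t and its enumerators only mimic the shape of acc_device_t and do not reproduce the real values from openacc.h, and this known_device_type_p is a guess at what such a range check might do, not a copy of the libgomp function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for acc_device_t; names and values are assumptions
   made for this sketch only.  */
typedef enum
{
  sketch_device_none,
  sketch_device_default,
  sketch_device_host,
  sketch_device_not_host,
  sketch_device_nvidia,
  sketch_device_radeon,
  sketch_device_current
} sketch_device_t;

/* The range check below relies on 'current' being the last enumerator.  */
static_assert ((int) sketch_device_current > (int) sketch_device_radeon,
	       "sketch_device_current must follow all concrete device types");

/* Assumed behavior: true for every concrete device type, false for
   'current' and for values outside the enumeration.  */
static bool
known_device_type_p (sketch_device_t d)
{
  return (int) d >= (int) sketch_device_none
	 && (int) d < (int) sketch_device_current;
}

/* 'current' is allowed only for the property queries, so the exception is
   documented once here instead of in each caller.  */
static bool
property_device_type_allowed_p (sketch_device_t d)
{
  return d == sketch_device_current || known_device_type_p (d);
}
```

Both property entry points could then share one "if (!property_device_type_allowed_p (d))" guard in front of the existing error call.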
[PATCH] xfail and improve some failing libgomp tests
Hi, the libgomp testsuite contains some test cases (all in /libgomp/testsuite/libgomp.c/) which fail with nvptx offloading because of some long standing issues: * {target-32.c, thread-limit-2.c}: no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690 * target-{33,34}.c: no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. https://gcc.gnu.org/PR81688 * target-link-1.c: omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689 All these issues have been known, at least, since 2016: https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00972.html As suggested in this mail: "Short term, it should be possible to implement something like -foffload=^nvptx to skip PTX (and only PTX) offloading on those tests." Well, we can now skip/xfail tests for nvptx offloading using the effective target "offload_target_nvptx" and the present patch uses this to xfail the tests for which no short-term solution is in sight, i.e. the GOMP_OFFLOAD_async_run and the "target link" related failures. Regarding the "usleep" issue, I have decided to follow Jakub's suggestion (cf. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01026.html) to replace usleep by busy waiting. As noted by Tobias (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81690#c4), this involves creating separate test files for the cases with and without usleep. This solution is a bit cumbersome but I think we can live with it, in particular, since the actual test case implementations do not get duplicated (they have been moved into auxiliary header files which are shared by both variants of the corresponding tests). Since the "usleep" issue also concerns amdgcn, I have introduced an effective target "offload_target_amdgcn" to add xfails for this offloading target, too. This behaves like "offload_target_nvptx" but for amdgcn. Note that the existing amdgcn effective targets cannot be used for our purpose since they are OpenACC-specific. 
The new thread-limit-2-nosleep.c should now pass for both nvptx and amdgcn offloading whereas thread-limit-2.c should xfail. The new target-32-nosleep.c passes with amdgcn offloading, but xfails with nvptx offloading, because it also needs the unimplemented GOMP_OFFLOAD_async_run. With the patch, the detailed test summary now looks as follows for me: nvptx offloading: // Expected execution failures due to missing usleep PASS: libgomp.c/target-32-nosleep.c (test for excess errors) XFAIL: libgomp.c/target-32-nosleep.c execution test// missing GOMP_OFFLOAD_async_run XFAIL: libgomp.c/target-32.c (test for excess errors) UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors) PASS: libgomp.c/thread-limit-2-nosleep.c execution test XFAIL: libgomp.c/thread-limit-2.c (test for excess errors) UNRESOLVED: libgomp.c/thread-limit-2.c compilation failed to produce executable // Expected execution failures due to missing GOMP_OFFLOAD_async_run PASS: libgomp.c/target-33.c (test for excess errors) XFAIL: libgomp.c/target-33.c execution test PASS: libgomp.c/target-34.c (test for excess errors) XFAIL: libgomp.c/target-34.c execution test // Expected compilation failures due to missing target link XFAIL: libgomp.c/target-link-1.c (test for excess errors) UNRESOLVED: libgomp.c/target-link-1.c compilation failed to produce executable amdgcn offloading: // Tests using usleep PASS: libgomp.c/target-32-nosleep.c (test for excess errors) PASS: libgomp.c/target-32-nosleep.c execution test XFAIL: libgomp.c/target-32.c 7 blank line(s) in output XFAIL: libgomp.c/target-32.c (test for excess errors) UNRESOLVED: libgomp.c/target-32.c compilation failed to produce executable PASS: libgomp.c/thread-limit-2-nosleep.c (test for excess errors) PASS: libgomp.c/thread-limit-2-nosleep.c execution test XFAIL: libgomp.c/thread-limit-2.c 1 blank line(s) in output XFAIL: libgomp.c/thread-limit-2.c (test for excess errors) 
// No failures since GOMP_OFFLOAD_async_run works on amdgcn PASS: libgomp.c/target-33.c (test for excess errors) PASS: libgomp.c/target-33.c execution test PASS: libgomp.c/target-34.c (test for excess errors) PASS: libgomp.c/target-34.c execution test // No xfail here PASS: libgomp.c/target-link-1.c (test for excess errors) FAIL: libgomp.c/target-link-1.c execution test Note that target-link-1.c execution does also fail on amdgcn. Since - in contrast to nvptx - it seems that the cause of this failure has not yet been investigated and discussed, I have not added an xfail for amdgcn to this test. All testing has been done with a x86_64-linux-gnu host and target. Ok to commit this patch? Best regards, Frederik From 6e5e2d45f02235a0f72e6130dcd8d52f88f7b126 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Fri, 7 Feb 2020 08:03:00 +0100 Subject: [PATCH] xfail and improve some failing libgomp
Re: [PATCH] xfail and improve some failing libgomp tests
Hi Jakub, On 07.02.20 16:29, Jakub Jelinek wrote: > On Fri, Feb 07, 2020 at 09:56:38AM +0100, Harwath, Frederik wrote: >> * {target-32.c, thread-limit-2.c}: >> no "usleep" implemented for nvptx. Cf. https://gcc.gnu.org/PR81690 > > Please don't, I want to deal with that using declare variant, just didn't > get yet around to finishing the last patch needed for that. Will try next > week. Ok, great! Looking forward to seeing a better solution. >> * target-{33,34}.c: >> no "GOMP_OFFLOAD_async_run" implemented in plugin-nvptx.c. Cf. >> https://gcc.gnu.org/PR81688 >> >> * target-link-1.c: >> omp "target link" not implemented for nvptx. Cf. https://gcc.gnu.org/PR81689 > > I guess this is ok, though of course the right thing would be to implement > both Ok, this means that I can commit the attached patch which contains only the changes to target-{33,34}.c and target-link-1.c? Of course, I agree that those features should be implemented. > There has been even in some PR a suggestion that instead of failing > in nvptx async_run we should just ignore the nowait clause if the plugin > doesn't implement it properly. This must be https://gcc.gnu.org/PR93481. Best regards, Frederik From e5165ccb143022614920dbd208f6f368b84b4382 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 10 Feb 2020 08:08:00 +0100 Subject: [PATCH] Add xfails to libgomp tests target-{33,34}.c, target-link-1.c Add xfails for nvptx offloading because "no GOMP_OFFLOAD_async_run implemented in plugin-nvptx.c" (https://gcc.gnu.org/PR81688) and because "omp target link not implemented for nvptx" (https://gcc.gnu.org/PR81689). libgomp/ * testsuite/libgomp.c/target-33.c: Add xfail for execution on offload_target_nvptx, cf. https://gcc.gnu.org/PR81688. * testsuite/libgomp.c/target-34.c: Likewise. * testsuite/libgomp.c/target-link-1.c: Add xfail for offload_target_nvptx, cf. https://gcc.gnu.org/PR81689. 
--- libgomp/testsuite/libgomp.c/target-33.c | 3 +++ libgomp/testsuite/libgomp.c/target-34.c | 3 +++ libgomp/testsuite/libgomp.c/target-link-1.c | 3 +++ 3 files changed, 9 insertions(+) diff --git a/libgomp/testsuite/libgomp.c/target-33.c b/libgomp/testsuite/libgomp.c/target-33.c index 1bed4b6bc67..15d2d7e38ab 100644 --- a/libgomp/testsuite/libgomp.c/target-33.c +++ b/libgomp/testsuite/libgomp.c/target-33.c @@ -1,3 +1,6 @@ +/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } } + Cf. https://gcc.gnu.org/PR81688. */ + extern void abort (void); int diff --git a/libgomp/testsuite/libgomp.c/target-34.c b/libgomp/testsuite/libgomp.c/target-34.c index 66d9f54202b..5a3596424d8 100644 --- a/libgomp/testsuite/libgomp.c/target-34.c +++ b/libgomp/testsuite/libgomp.c/target-34.c @@ -1,3 +1,6 @@ +/* { dg-xfail-run-if "GOMP_OFFLOAD_async_run not implemented" { offload_target_nvptx } } + Cf. https://gcc.gnu.org/PR81688. */ + extern void abort (void); int diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c index 681677cc2aa..99ce33bc9b4 100644 --- a/libgomp/testsuite/libgomp.c/target-link-1.c +++ b/libgomp/testsuite/libgomp.c/target-link-1.c @@ -1,3 +1,6 @@ +/* { dg-xfail-if "#pragma omp target link not implemented" { offload_target_nvptx } } + Cf. https://gcc.gnu.org/PR81689. */ + struct S { int s, t; }; int a = 1, b = 1; -- 2.17.1
[PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]
Hi Jakub, On 10.02.20 08:49, Harwath, Frederik wrote: >> There has been even in some PR a suggestion that instead of failing >> in nvptx async_run we should just ignore the nowait clause if the plugin >> doesn't implement it properly. > > This must be https://gcc.gnu.org/PR93481. The attached patch implements the behavior that has been suggested in the PR. It makes GOMP_OFFLOAD_async_run optional, removes the stub which produces the error described in the PR from the nvptx plugin, and changes the nowait-handling to ignore the clause if GOMP_OFFLOAD_async_run is not available for the executing device's plugin. I have tested the patch by running the full libgomp testsuite with nvptx-none offloading on x86_64-linux-gnu. I have observed no regressions. Ok to push the commit to master? For the record: Someone should implement GOMP_OFFLOAD_async_run properly in the nvptx plugin. Best regards, Frederik From 1258f713be317870e9171281e3f7c3a174773aa1 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Thu, 13 Feb 2020 07:30:16 +0100 Subject: [PATCH] openmp: ignore nowait if async execution is unsupported [PR93481] An OpenMP "nowait" clause on a target construct currently leads to a call to GOMP_OFFLOAD_async_run in the plugin that is used for offloading at execution time. The nvptx plugin contains only a stub of this function that always produces a fatal error if called. This commit changes the "nowait" implementation to ignore the clause if the executing device's plugin does not implement GOMP_OFFLOAD_async_run. The stub in the nvptx plugin is removed which effectively means that programs containing "nowait" can now be executed with nvptx offloading as if the clause had not been used. This behavior is consistent with the OpenMP specification which says that "[...] execution of the target task *may* be deferred" (emphasis added), cf. OpenMP 5.0, page 172. libgomp/ * plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub. 
* target.c (gomp_load_plugin_for_device): Make "async_run" loading optional. (gomp_target_task_fn): Assert "devicep->async_run_func". (clear_unsupported_flags): New function to remove unsupported flags (right now only GOMP_TARGET_FLAG_NOWAIT) that can be ignored. (GOMP_target_ext): Apply clear_unsupported_flags to flags. (GOMP_target_update_ext): Likewise. (GOMP_target_enter_exit_data): Likewise. * testsuite/libgomp.c/target-33.c: Remove xfail for offload_target_nvptx. * testsuite/libgomp.c/target-34.c: Likewise. --- libgomp/plugin/plugin-nvptx.c | 7 +-- libgomp/target.c| 19 ++- libgomp/testsuite/libgomp.c/target-33.c | 3 --- libgomp/testsuite/libgomp.c/target-34.c | 3 --- 4 files changed, 19 insertions(+), 13 deletions(-) diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 6033c71a9db..ec103a2f40b 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args) nvptx_stacks_free (stacks, teams * threads); } -void -GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args, - void *async_data) -{ - GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented"); -} +/* TODO: Implement GOMP_OFFLOAD_async_run. */ diff --git a/libgomp/target.c b/libgomp/target.c index 3df007283f4..4fbf963f305 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, gomp_unmap_vars (tgt_vars, true); } +static unsigned int +clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags) +{ + /* If we cannot run asynchronously, simply ignore nowait. */ + if (devicep != NULL && devicep->async_run_func == NULL) +flags &= ~GOMP_TARGET_FLAG_NOWAIT; + + return flags; +} + /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present, and several arguments have been added: FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. 
@@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, size_t tgt_align = 0, tgt_size = 0; bool fpc_done = false; + flags = clear_unsupported_flags (devicep, flags); + if (flags & GOMP_TARGET_FLAG_NOWAIT) { struct gomp_thread *thr = gomp_thread (); @@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs, { struct gomp_device_descr *devicep = resolve_device (device); + flags = clear_unsupported_flags (devicep, flags); + /* If there are depend clauses, but nowait is not present, block the parent task until the dependencies are resolved and then just continue with the rest of the function as if it @@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void *
Re: [PATCH] openmp: ignore nowait if async execution is unsupported [PR93481]
Hi Jakub, On 13.02.20 09:30, Jakub Jelinek wrote: > On Thu, Feb 13, 2020 at 09:04:36AM +0100, Harwath, Frederik wrote: >> --- a/libgomp/target.c >> +++ b/libgomp/target.c >> @@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const >> void *unused, >>gomp_unmap_vars (tgt_vars, true); >> } >> >> +static unsigned int > > Add inline? > Added. >> @@ -2257,6 +2269,8 @@ GOMP_target_update_ext (int device, size_t mapnum, >> void **hostaddrs, >> { >>struct gomp_device_descr *devicep = resolve_device (device); >> >> + flags = clear_unsupported_flags (devicep, flags); >> @@ -2398,6 +2412,8 @@ GOMP_target_enter_exit_data (int device, size_t >> mapnum, void **hostaddrs, >> { >>struct gomp_device_descr *devicep = resolve_device (device); >> >> + flags = clear_unsupported_flags (devicep, flags); > I don't see why you need to do the above two. GOMP_TARGET_TASK_DATA > is done on the host side, async_run callback isn't called in that case > and while we create a task, all we do is wait for the (host) dependencies > in there and then perform the data transfer we need. > I think it is perfectly fine to ignore nowait on target but honor it > on target update or target {enter,exit} data. I see. Removed. > Otherwise LGTM. Thanks for the review! I have committed the patch with those changes. I forgot to include the ChangeLog entry which I had to add in a separate commit. Sorry for that! It seems that I have to adapt my workflow - perhaps some pre-push hook ;-). Best regards, Frederik From 001ab12e620c6f117b2e93c77d188bd62fe7ba03 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Thu, 13 Feb 2020 07:30:16 +0100 Subject: [PATCH 1/2] openmp: ignore nowait if async execution is unsupported [PR93481] An OpenMP "nowait" clause on a target construct currently leads to a call to GOMP_OFFLOAD_async_run in the plugin that is used for offloading at execution time. The nvptx plugin contains only a stub of this function that always produces a fatal error if called. 
This commit changes the "nowait" implementation to ignore the clause if the executing device's plugin does not implement GOMP_OFFLOAD_async_run. The stub in the nvptx plugin is removed which effectively means that programs containing "nowait" can now be executed with nvptx offloading as if the clause had not been used. This behavior is consistent with the OpenMP specification which says that "[...] execution of the target task *may* be deferred" (emphasis added), cf. OpenMP 5.0, page 172. libgomp/ * plugin/plugin-nvptx.c: Remove GOMP_OFFLOAD_async_run stub. * target.c (gomp_load_plugin_for_device): Make "async_run" loading optional. (gomp_target_task_fn): Assert "devicep->async_run_func". (clear_unsupported_flags): New function to remove unsupported flags (right now only GOMP_TARGET_FLAG_NOWAIT) that can be ignored. (GOMP_target_ext): Apply clear_unsupported_flags to flags. * testsuite/libgomp.c/target-33.c: Remove xfail for offload_target_nvptx. * testsuite/libgomp.c/target-34.c: Likewise. --- libgomp/plugin/plugin-nvptx.c | 7 +-- libgomp/target.c| 15 ++- libgomp/testsuite/libgomp.c/target-33.c | 3 --- libgomp/testsuite/libgomp.c/target-34.c | 3 --- 4 files changed, 15 insertions(+), 13 deletions(-) diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c index 6033c71a9db..ec103a2f40b 100644 --- a/libgomp/plugin/plugin-nvptx.c +++ b/libgomp/plugin/plugin-nvptx.c @@ -1931,9 +1931,4 @@ GOMP_OFFLOAD_run (int ord, void *tgt_fn, void *tgt_vars, void **args) nvptx_stacks_free (stacks, teams * threads); } -void -GOMP_OFFLOAD_async_run (int ord, void *tgt_fn, void *tgt_vars, void **args, - void *async_data) -{ - GOMP_PLUGIN_fatal ("GOMP_OFFLOAD_async_run unimplemented"); -} +/* TODO: Implement GOMP_OFFLOAD_async_run. 
*/ diff --git a/libgomp/target.c b/libgomp/target.c index 3df007283f4..0ff727de47d 100644 --- a/libgomp/target.c +++ b/libgomp/target.c @@ -2022,6 +2022,16 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, gomp_unmap_vars (tgt_vars, true); } +static inline unsigned int +clear_unsupported_flags (struct gomp_device_descr *devicep, unsigned int flags) +{ + /* If we cannot run asynchronously, simply ignore nowait. */ + if (devicep != NULL && devicep->async_run_func == NULL) +flags &= ~GOMP_TARGET_FLAG_NOWAIT; + + return flags; +} + /* Like GOMP_target, but KINDS is 16-bit, UNUSED is no longer present, and several arguments have been added: FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. @@ -2054,6 +2064,8 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum, size_t tgt_align = 0, tgt_size
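The effect of clear_unsupported_flags in the committed version can be illustrated outside of libgomp with the following sketch. The struct, the flag values and the sketch_ names are placeholders invented for this example (the real code uses struct gomp_device_descr and GOMP_TARGET_FLAG_NOWAIT from gomp-constants.h); only the masking logic mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values; the real constants live in gomp-constants.h.  */
#define SKETCH_FLAG_NOWAIT (1u << 0)
#define SKETCH_FLAG_OTHER  (1u << 1)

/* Minimal stand-in for a device descriptor: a NULL async_run_func means
   that the plugin does not provide asynchronous execution.  */
struct sketch_device
{
  void (*async_run_func) (void *fn, void *vars);
};

/* Dummy asynchronous entry point for a device that supports nowait.  */
static void
sketch_async_run (void *fn, void *vars)
{
  (void) fn;
  (void) vars;
}

/* Same logic as in the patch: drop the nowait bit if the device cannot
   run asynchronously and leave all other flags untouched.  */
static inline unsigned int
sketch_clear_unsupported_flags (const struct sketch_device *dev,
				unsigned int flags)
{
  if (dev != NULL && dev->async_run_func == NULL)
    flags &= ~SKETCH_FLAG_NOWAIT;

  return flags;
}

/* Usage example, written as assertions.  */
static void
sketch_self_check (void)
{
  struct sketch_device sync_only = { NULL };
  struct sketch_device async_ok = { sketch_async_run };
  unsigned int flags = SKETCH_FLAG_NOWAIT | SKETCH_FLAG_OTHER;

  /* No async support: nowait is silently dropped.  */
  assert (sketch_clear_unsupported_flags (&sync_only, flags)
	  == SKETCH_FLAG_OTHER);
  /* Async support, or no device at all: flags pass through unchanged.  */
  assert (sketch_clear_unsupported_flags (&async_ok, flags) == flags);
  assert (sketch_clear_unsupported_flags (NULL, flags) == flags);
}
```

As discussed in the review, the committed patch applies this masking only in GOMP_target_ext, since target update and target enter/exit data handle nowait on the host side without the async_run callback.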
Re: [Patch, Fortran] PR 92793 - fix column used for error diagnostic
Hi Tobias, On 04.12.19 14:37, Tobias Burnus wrote: > As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN > == 0 to the middle end. The reason for that is how parsing works – gfortran > reads the input line by line. > > For internal error diagnostic (fortran/error.c), the column location was > corrected – but not for locations passed to the middle end. Hence, the > diagnostic there wasn't optimal. I am not sure if those changes have any impact on existing diagnostics - probably not or you would have needed to change some tests in your patch. Thus, I want to confirm that this fixes the problems that I had when trying to emit warnings that referenced the location of OpenACC reduction clauses from pass_lower_omp when compiling Fortran code. Where previously inform (OMP_CLAUSE_LOCATION (some_omp_clause), "Some message."); would produce [...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:0: note: Some message. I now get the expected result: [...] /gcc/testsuite/gfortran.dg/goacc/nested-reductions-warn.f90:19:27: note: Some message. (Well, not completely as expected. In this case where the clause is an OpenACC reduction clause, the location of the clause is a bit off because it points to the reduction variable and not to the beginning of the clause, but that's another issue which is not related to this patch ;-) ) The existing translation of the reduction clauses has another bug. It uses the location of the first clause from the reduction list for all clauses. 
This could be fixed by changing the patch as follows: > @@ -1854,7 +1854,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist > *namelist, tree list, > tree t = gfc_trans_omp_variable (namelist->sym, false); > if (t != error_mark_node) > { > - tree node = build_omp_clause (where.lb->location, > + tree node = build_omp_clause (gfc_get_location (&where), > OMP_CLAUSE_REDUCTION); > OMP_CLAUSE_DECL (node) = t; > if (mark_addressable) Here "&where" should be "&namelist->where" to use the location of the current clause. I have verified that this yields the correct locations for all clauses using the nested-reductions-warn.f90 test. Thank you for fixing this! Best regards, Frederik
[PATCH][AMDGCN] Skip test gcc/testsuite/gcc.dg/asm-4.c
Hi, the inline assembly "p" modifier ("An operand that is a valid memory address is allowed", cf. https://gcc.gnu.org/onlinedocs/gcc/Simple-Constraints.html#Simple-Constraints) is not supported on AMD GCN. This causes an ICE during the compilation of gcc.dg/asm-4.c. We should skip the test for the amdgcn-*-* target. Can I merge the patch below into trunk? Best regards, Frederik 2019-12-05 Frederik Harwath gcc/testsuite/ * gcc.dg/asm-4.c: Skip on target amdgcn-*-*. Index: gcc/testsuite/gcc.dg/asm-4.c === --- gcc/testsuite/gcc.dg/asm-4.c(revision 278932) +++ gcc/testsuite/gcc.dg/asm-4.c(working copy) @@ -3,6 +3,7 @@ /* "p" modifier can't be used to generate a valid memory address with ILP32. */ /* { dg-skip-if "" { aarch64*-*-* && ilp32 } } */ +/* { dg-skip-if "'p' is not supported for GCN" { amdgcn-*-* } } */ int main() {
[PATCH] Fix column information for omp_clauses in Fortran code
Hi, Tobias has recently fixed a problem with the column information in gfortran locations ("PR 92793 - fix column used for error diagnostic"). Diagnostic messages for OpenMP/OpenACC clauses do not contain the right column information yet. The reason is that the location information of the first clause is used for all clauses on a line and hence the columns are wrong for all but the first clause. The attached patch fixes this problem. I have tested the patch manually by adapting the validity check for nested OpenACC reductions (see omp-low.c) to include the location of clauses in warnings instead of the location of the loop to which the clause belongs. I can add a regression test based on this later on after adapting the code in omp-low.c. Is it ok to include the patch in trunk? Best regards, Frederik On 04.12.19 14:37, Tobias Burnus wrote: > As reported internally by Frederik, gfortran currently passes LOCATION_COLUMN > == 0 to the middle end. The reason for that is how parsing works – gfortran > reads the input line by line. > > For internal error diagnostic (fortran/error.c), the column location was > corrected – but not for locations passed to the middle end. Hence, the > diagnostic there wasn't optimal. > > Fixed by introducing a new function; now one only needs to make sure that no > new code will re-introduce "lb->location" :-) > > Build and regtested on x86-64-gnu-linux. > OK for the trunk? > > Tobias From af3a63b64f38d522b0091a123a919d1f20f5a8b1 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 9 Dec 2019 15:07:53 +0100 Subject: [PATCH] Fix column information for omp_clauses in Fortran code The location of all OpenMP/OpenACC clauses on any given line in Fortran code always points to the first clause on that line. Hence, the column information is wrong for all clauses but the first one. Use the correct location for each clause instead. 
2019-12-09 Frederik Harwath /gcc/fortran/ * trans-openmp (gfc_trans_omp_reduction_list): Pass correct location for each clause to build_omp_clause. --- gcc/fortran/trans-openmp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index d07ff86fc0b..356fd04e6c3 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1982,7 +1982,7 @@ gfc_trans_omp_reduction_list (gfc_omp_namelist *namelist, tree list, tree t = gfc_trans_omp_variable (namelist->sym, false); if (t != error_mark_node) { - tree node = build_omp_clause (gfc_get_location (&where), + tree node = build_omp_clause (gfc_get_location (&namelist->where), OMP_CLAUSE_REDUCTION); OMP_CLAUSE_DECL (node) = t; if (mark_addressable) -- 2.17.1
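The difference between the old and the patched call can be modelled outside of gfortran with a few lines of C. This is a deliberately simplified sketch: struct name_item is a hypothetical stand-in for gfc_omp_namelist, a plain column number stands in for a full location, and neither function exists in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for gfc_omp_namelist: one entry per list item,
   each remembering the column at which it was parsed.  */
struct name_item
{
  const char *sym;
  int column;
  struct name_item *next;
};

/* Models the old call: every generated clause receives the single
   location WHERE that was passed in for the whole list.  */
void
columns_shared (const struct name_item *list, int where, int *out)
{
  assert (out != NULL);
  for (size_t n = 0; list != NULL; list = list->next)
    out[n++] = where;
}

/* Models the patched call: every generated clause receives the location
   stored in its own list entry.  */
void
columns_per_item (const struct name_item *list, int *out)
{
  assert (out != NULL);
  for (size_t n = 0; list != NULL; list = list->next)
    out[n++] = list->column;
}
```

For a two-entry list parsed at columns 27 and 47 (values chosen for illustration), columns_shared reports 27 twice, while columns_per_item reports 27 and 47.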
[PATCH 1/2] Use clause locations in OpenACC nested reduction warnings
Since the Fortran front-end now sets the clause locations correctly, we can emit warnings with more precise locations if we encounter conflicting operations for a variable in reduction clauses. 2019-12-10 Frederik Harwath gcc/ * omp-low.c (scan_omp_for): Use clause location in warning. --- gcc/omp-low.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index ad26f7918a5..d422c205836 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -2473,7 +2473,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx) tree_code outer_op = OMP_CLAUSE_REDUCTION_CODE (outer_clause); if (outer_var == local_var && outer_op != local_op) { - warning_at (gimple_location (stmt), 0, + warning_at (OMP_CLAUSE_LOCATION (local_clause), 0, "conflicting reduction operations for %qE", local_var); inform (OMP_CLAUSE_LOCATION (outer_clause), -- 2.17.1
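The comparison in the hunk above (same variable, different operator) and the effect of switching to the clause location can be sketched as a small standalone model. The struct and function below are hypothetical and only mimic the shape of the check; the real code in scan_omp_for works on OMP_CLAUSE trees, not on strings and column numbers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a reduction clause: variable name, reduction
   operator, and the source column of the clause.  */
struct red_clause
{
  const char *var;
  char op;
  int column;
};

/* Return the column of the first local clause whose operator conflicts
   with an outer clause on the same variable, or 0 if there is none.
   Reporting the column of the local clause mirrors the switch from the
   statement location to OMP_CLAUSE_LOCATION (local_clause).  */
int
first_conflict_column (const struct red_clause *outer, int n_outer,
                       const struct red_clause *local, int n_local)
{
  assert (n_outer >= 0 && n_local >= 0);
  for (int l = 0; l < n_local; l++)
    for (int o = 0; o < n_outer; o++)
      if (strcmp (outer[o].var, local[l].var) == 0
          && outer[o].op != local[l].op)
        return local[l].column;
  return 0;
}
```

With an outer reduction(+:sum) and inner clauses reduction(-:diff) reduction(-:sum), the model reports the column of the inner clause on sum rather than a position for the whole loop.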
[PATCH 0/2] Add tests to verify OpenACC clause locations
Hi, On 09.12.19 16:58, Harwath, Frederik wrote: > Tobias has recently fixed a problem with the column information in gfortran > locations > [...] > I have tested the patch manually by adapting the validity check for nested > OpenACC reductions (see omp-low.c) > to include the location of clauses in warnings instead of the location of the > loop to which the clause belongs. > I can add a regression test based on this later on after adapting the code in > omp-low.c. here are patches adding the promised test for Fortran and a corresponding test for C. Is it ok to include them in trunk? Best regards, Frederik Frederik Harwath (2): Use clause locations in OpenACC nested reduction warnings Add tests to verify OpenACC clause locations gcc/omp-low.c | 2 +- gcc/testsuite/gcc.dg/goacc/clause-locations.c | 17 + .../gfortran.dg/goacc/clause-locations.f90 | 18 ++ 3 files changed, 36 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 -- 2.17.1
[PATCH 2/2] Add tests to verify OpenACC clause locations
Check that the column information for OpenACC clauses is communicated correctly to the middle-end, in particular by the Fortran front-end (cf. PR 92793). 2019-12-10 Frederik Harwath gcc/testsuite/ * gcc.dg/goacc/clause-locations.c: New test. * gfortran.dg/goacc/clause-locations.f90: New test. --- gcc/testsuite/gcc.dg/goacc/clause-locations.c | 17 + .../gfortran.dg/goacc/clause-locations.f90 | 18 ++ 2 files changed, 35 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/goacc/clause-locations.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 diff --git a/gcc/testsuite/gcc.dg/goacc/clause-locations.c b/gcc/testsuite/gcc.dg/goacc/clause-locations.c new file mode 100644 index 000..51184e3517b --- /dev/null +++ b/gcc/testsuite/gcc.dg/goacc/clause-locations.c @@ -0,0 +1,17 @@ +/* Verify that the location information for clauses is correct. */ + +void +check_clause_columns() { + int i, j, sum, diff; + + #pragma acc parallel + { +#pragma acc loop reduction(+:sum) +for (i = 1; i <= 10; i++) + { +#pragma acc loop reduction(-:diff) reduction(-:sum) /* { dg-warning "53: conflicting reduction operations for .sum." } */ + for (j = 1; j <= 10; j++) + sum = 1; + } + } +} diff --git a/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 new file mode 100644 index 000..29798d31542 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/clause-locations.f90 @@ -0,0 +1,18 @@ +! Verify that the location information for clauses is correct. +! See also PR 92793. + +subroutine check_clause_columns () + implicit none (type, external) + integer :: i, j, sum, diff + + !$acc parallel +!$acc loop reduction(+:sum) +do i = 1, 10 + !$acc loop reduction(-:diff) reduction(-:sum) ! { dg-warning "47: conflicting reduction operations for .sum." } + do j = 1, 10 +sum = 1 + end do +end do + !$acc end parallel +end subroutine check_clause_columns + -- 2.17.1
Re: [PATCH 0/2] Add tests to verify OpenACC clause locations
Hi Thomas, On 10.12.19 15:44, Thomas Schwinge wrote: >> Frederik Harwath (2): >> Use clause locations in OpenACC nested reduction warnings >> Add tests to verify OpenACC clause locations > > I won't insist, but suggest (common practice) to merge that into one > patch: bug fix plus test cases, using the summary line of your first > patch. > [...] > It's of course always OK to add new test cases, but wouldn't the same > test coverage be reached by just adding such checking to the existing > test cases in 'c-c++-common/goacc/nested-reductions-warn.c', > 'gfortran.dg/goacc/nested-reductions-warn.f90'? Sure, we could have everything in one patch and one test. The rationale for splitting the patches and for splitting the tests is that the tests do not try to verify the nested reductions validation code. They try to verify that the language front-ends set the correct locations for clauses. Without a possibility to do proper unit testing, I just had to find some way to check the clauses. I had no immediate success triggering one of the very few other warnings that use the location of omp_clauses from both Fortran and C code and hence I went with the nested reductions code. Thanks for your review! Best regards, Frederik
Re: [PATCH 0/2] Add tests to verify OpenACC clause locations
Hi Thomas, On 10.12.19 15:44, Thomas Schwinge wrote: > Thanks, yes, with my following remarks considered, and acted on per your > preference. To record the review effort, please include "Reviewed-by: > Thomas Schwinge " in the commit log, see > <https://gcc.gnu.org/wiki/Reviewed-by>. Committed as r279168 and r279169. Frederik
[PATCH, committed] Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c
Hi, I have committed the attached trivial patch to trunk as r279215. The columns of the clause locations are reported differently by the C and C++ front-end and hence we need different test expectations for both languages. Best regards, Frederik r279215 | frederik | 2019-12-11 09:26:18 +0100 (Mi, 11 Dez 2019) | 12 lines Fix PR92901: Change test expectation for C++ in OpenACC test clause-locations.c The columns of the clause locations that are reported for C and C++ are different and hence we need separate test expectations for both languages. 2019-12-11 Frederik Harwath PR other/92901 /gcc/testsuite/ * c-c++-common/clause-locations.c: Adjust test expectation for C++. Index: gcc/testsuite/c-c++-common/goacc/clause-locations.c === --- gcc/testsuite/c-c++-common/goacc/clause-locations.c (revision 279214) +++ gcc/testsuite/c-c++-common/goacc/clause-locations.c (working copy) @@ -9,7 +9,9 @@ #pragma acc loop reduction(+:sum) for (i = 1; i <= 10; i++) { -#pragma acc loop reduction(-:diff) reduction(-:sum) /* { dg-warning "53: conflicting reduction operations for .sum." } */ +#pragma acc loop reduction(-:diff) reduction(-:sum) + /* { dg-warning "53: conflicting reduction operations for .sum." "" { target c } .-1 } */ + /* { dg-warning "56: conflicting reduction operations for .sum." "" { target c++ } .-2 } */ for (j = 1; j <= 10; j++) sum = 1; }
Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi Thomas, thanks for the review! I have attached a revised patch. > > There is no AMD GCN support yet. This will be added later on. > > ACK, just to note that there now is a 'libgomp/plugin/plugin-gcn.c' that > at least needs to get a stub implementation (can mostly copy from > 'libgomp/plugin/plugin-hsa.c'?) as otherwise the build will fail. Yes, I have added a stub. A full implementation will follow soon. The implementation in the OG9 branch that Andrew mentioned will need a bit of polishing. > Tobias has generally reviewed the Fortran bits, correct? Yes, he has done that internally. > | Before Frederik starts working on integrating this into GCC trunk, do you > | (Jakub) agree with the libgomp plugin interface changes as implemented by > | Maciej? For example, top-level 'GOMP_OFFLOAD_get_property' function in > | 'struct gomp_device_descr' instead of stuffing this into its > | 'acc_dispatch_t openacc'. (I never understood why the OpenACC functions > | need to be segregated like they are.) > > Jakub didn't answer, but I now myself decided that we should group this > with the other OpenACC libgomp-plugin functions, as this interface is > defined in terms of OpenACC-specific stuff such as 'acc_device_t'. > Frederik, please work on that, also try to move function definitions etc. > into appropriate places in case they aren't; ask if you need help. > That needs to be updated. Is it ok to do this in a separate follow-up patch? > > .../acc-get-property-2.c | 68 + > > .../acc-get-property-3.c | 19 +++ > > .../acc-get-property-aux.c| 60 > > .../acc-get-property.c| 75 ++ > > .../libgomp.oacc-fortran/acc-get-property.f90 | 80 ++ > > Please name all these 'acc_get_property*', which is the name of the > interface tested. Ok. > > --- a/include/gomp-constants.h > > +++ b/include/gomp-constants.h > > @@ -178,6 +178,20 @@ enum gomp_map_kind > > > > #define GOMP_DEVICE_ICV -1 > > #define GOMP_DEVICE_HOST_FALLBACK -2 > > +#define GOMP_DEVICE_CURRENT -3 > [...] 
> > Not sure if this should be grouped with 'GOMP_DEVICE_ICV', > 'GOMP_DEVICE_HOST_FALLBACK', for it is not related to there. > > [...] > > Should this actually get value '-1' instead of '-3'? Or, is the OpenACC > 'acc_device_t' code already paying special attention to negative values > '-1', '-2'? (I don't think so.) > | Also, 'acc_device_current' is a libgomp-internal thing (doesn't interface > | with the compiler proper), so strictly speaking 'GOMP_DEVICE_CURRENT' > | isn't needed in 'include/gomp-constants.h'. But probably still a good > | idea to list it there, in this canonical place, to keep the several lists > | of device types coherent. > still wonder about that... ;-) I have removed GOMP_DEVICE_CURRENT from gomp-constants.h. Changing the value of GOMP_DEVICE_ICV violates the following static asserts in oacc-parallel.c: /* In the ABI, the GOACC_FLAGs are encoded as an inverted bitmask, so that we continue to support the following two legacy values. */ _Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_ICV) == 0, "legacy GOMP_DEVICE_ICV broken"); _Static_assert (GOACC_FLAGS_UNMARSHAL (GOMP_DEVICE_HOST_FALLBACK) == GOACC_FLAG_HOST_FALLBACK, "legacy GOMP_DEVICE_HOST_FALLBACK broken"); > > +/* Device property codes. Keep in sync with > > + libgomp/{openacc.h,openacc.f90,openacc_lib.h}:acc_device_property_t > > | Same thing, libgomp-internal, not sure whether to list these here? > > > + as well as libgomp/libgomp-plugin.h. */ > > (Not sure why 'libgomp/libgomp-plugin.h' is relevant here?) It does not seem to be relevant. Right now, openacc_lib.h is also not relevant. I have removed both file names from the comment. > > +#define GOMP_DEVICE_PROPERTY_MEMORY1 > > +#define GOMP_DEVICE_PROPERTY_FREE_MEMORY 2 > > +#define GOMP_DEVICE_PROPERTY_NAME 0x10001 > > +#define GOMP_DEVICE_PROPERTY_VENDOR0x10002 > > +#define GOMP_DEVICE_PROPERTY_DRIVER0x10003 > > + > > +/* Internal property mask to tell numeric and string values apart.
*/ > > +#define GOMP_DEVICE_PROPERTY_STRING_MASK 0x1 > > (Maybe should use an 'enum'?) I have changed this to an enum. However, this does not improve the code much, since we cannot use the enum for the function argume
Re: [PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi Thomas, >> Is it ok to commit the patch to trunk? > > OK, thanks. And then some follow-up/clean-up next year, also including > some of the open questions that I've snipped off here. Right, thanks for the review! I have committed the patch as r279710 with a minor change: I have disabled the new acc_get_property.{c,f90} tests for the amdgcn offload target for now. Best regards, Frederik
*ping* - Re: [Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)
PING Hi Jakub, I have attached a version of the patch that has been rebased on the current trunk. Frederik On 03.12.19 12:16, Harwath, Frederik wrote: > On 08.11.19 07:41, Harwath, Frederik wrote: >> On 06.11.19 14:00, Jakub Jelinek wrote: >> [...] >>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be >>> more natural, wouldn't it. >> [...] >>> If gimplifier is not the right spot, then use a splay tree + vector instead? >>> splay tree for the outer ones, vector for the local ones, and put into both >>> the clauses, so you can compare reduction code etc. >> >> Sounds like a good idea. I am going to try that. > > Below you can find a patch that reimplements the nested reductions check using > more appropriate data structures. [...] From b08855328c52e36143770e442e50ba87f25c14b3 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Wed, 8 Jan 2020 14:00:44 +0100 Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking Revision 277875 of trunk introduced a consistency check for nested OpenACC reduction clauses. The implementation has two drawbacks: 1) It uses suboptimal data structures for storing information about the reduction clauses. 2) The warnings issued for *repeated* inconsistent use of reduction operators are confusing. For instance, on three nested loops that use the reduction operators +, -, + on the same variable, we obtain a warning at the switch from + to - (as desired) and another warning about the switch from - to +. It would be preferable to avoid the second warning since + is consistent with the first reduction operator. This commit attempts to fix both problems by using more appropriate data structures (splay trees and vectors instead of tree lists) for keeping track of the information about the reduction clauses. 2020-01-08 Frederik Harwath gcc/ * omp-low.c (omp_context): Removed fields local_reduction_clauses, outer_reduction_clauses; added fields oacc_reduction_clauses, oacc_reductions_stack. 
(oacc_reduction_clause_location): New struct. (oacc_reduction_var_occ): New struct. (new_omp_context): Adjust omp_context initialization to new fields. (delete_omp_context): Adjust omp_context deletion to new fields. (rewind_oacc_reductions_stack): New function. (check_oacc_reduction_clause): New function. (check_oacc_reduction_clauses): New function. (scan_sharing_clauses): Call check_oacc_reduction_clause for reduction clauses (this handles clauses on compute regions) if a new optional flag is enabled. (scan_omp_for): Remove old nested reduction check, call check_oacc_reduction_clauses instead. (scan_omp_target): Adapt call to scan_sharing_clauses to enable the new flag. gcc/testsuite/ * c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to ignore warnings that are not relevant to the test. (acc_parallel): Stop expecting pruned warnings, adjust expected warnings to changes in omp-low.c, add checks for info messages about the location of clauses. (acc_parallel_loop): Likewise. (acc_parallel_reduction): Likewise. (acc_parallel_loop_reduction): Likewise. (acc_routine): Likewise. (acc_kernels): Likewise. * gfortran.dg/goacc/nested-reductions-warn.f90: Likewise. --- gcc/omp-low.c | 306 -- .../goacc/nested-reductions-warn.c| 81 ++--- .../goacc/nested-reductions-warn.f90 | 83 ++--- 3 files changed, 271 insertions(+), 199 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index e692a53a3de..6026b7aff89 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -73,6 +73,9 @@ along with GCC; see the file COPYING3. If not see scanned for regions which are then moved to a new function, to be invoked by the thread library, or offloaded. */ + +struct oacc_reduction_var_occ; + /* Context structure. Used to store information about each parallel directive in the code. */ @@ -128,12 +131,6 @@ struct omp_context corresponding tracking loop iteration variables. */ hash_map *lastprivate_conditional_map; - /* A tree_list of the reduction clauses in this context. 
*/ - tree local_reduction_clauses; - - /* A tree_list of the reduction clauses in outer contexts. */ - tree outer_reduction_clauses; - /* Nesting depth of this context. Used to beautify error messages re invalid gotos. The outermost ctx is depth 1, with depth 0 being reserved for the main body of the function. */ @@ -163,8 +160,52 @@ struct omp_context /* True if there is bind clause on the construct (i.e. a loop construct). */ bool loop_p; + + /* A mapping that maps a variable to information about the last OpenACC + reduction clause that used the variable above the current context. + This information is used for checking the nesting restrictions for + reduction clauses by the function
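The second drawback named in the commit message above (three nested loops using +, -, + on the same variable) can be made concrete with a small counting model. This is only a sketch of the two reporting strategies under the assumption that a single variable is involved; the function names are hypothetical and the code does not reflect how omp-low.c stores or compares clauses.

```c
#include <assert.h>
#include <stddef.h>

/* OPS lists the reduction operators used on one variable, outermost loop
   first.  Count the warnings if each loop is compared with the loop that
   directly encloses it.  */
int
warnings_vs_enclosing (const char *ops, int depth)
{
  assert (ops != NULL);
  int warnings = 0;
  for (int i = 1; i < depth; i++)
    if (ops[i] != ops[i - 1])
      warnings++;
  return warnings;
}

/* Count the warnings if each loop is compared with the operator that was
   used first, i.e. on the outermost loop.  */
int
warnings_vs_first (const char *ops, int depth)
{
  assert (ops != NULL);
  int warnings = 0;
  for (int i = 1; i < depth; i++)
    if (ops[i] != ops[0])
      warnings++;
  return warnings;
}
```

For the sequence +, -, + the first strategy counts two warnings (the switch to - and the switch back to +), the second only one, which matches the behaviour the commit message describes as preferable.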
Re: [PATCH] OpenMP: warn about iteration var modifications in loop body
Ping. The Linaro CI has kindly pointed me to two test regressions that I had missed. I have adjusted the test expectations in the updated patch, which I have attached. Frederik On 28.02.24 8:32 PM, Frederik Harwath wrote: Hi, this patch implements a warning about (some simple cases of direct) modifications of iteration variables in OpenMP loops which are forbidden according to the OpenMP specification. I think this can be helpful, especially for new OpenMP users. I have implemented this after I observed some confusion concerning this topic recently. The check is implemented during gimplification. It reuses the "loop_iter_var" vector in the "gimplify_omp_ctx" which was previously only used for "doacross" handling to identify the loop iteration variables during the gimplification of MODIFY_EXPRs in omp_for bodies. I have only added a common C/C++ test because I don't see any special C++ constructs for which a warning *should* be emitted and Fortran rejects modifications of iteration variables in do loops in general. I have run "make check" on x86_64-linux-gnu and not observed any regressions. Is it ok to commit this? Best regards, Frederik From d4fb1710bfa1d5b66979db1f0aea2d5c68ab2264 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Tue, 27 Feb 2024 21:07:00 + Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body OpenMP loop iteration variables may not be changed by user code in the loop body according to the OpenMP specification. In general, the compiler cannot enforce this, but nevertheless simple cases in which the user modifies the iteration variable directly in the loop body (in contrast to, e.g., modifications through a pointer) can be recognized. A warning should be useful, for instance, to new users of OpenMP. This commit implements a warning about forbidden iteration var modifications during gimplification. 
It reuses the "loop_iter_var" vector in the "gimplify_omp_ctx" which was previously only used for "doacross" handling to identify the loop iteration variables during the gimplification of MODIFY_EXPRs in omp_for bodies. gcc/ChangeLog: * gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to recognize the gimplification state during which the new warning should be emitted. Add field "is_doacross" to distinguish the original use of "loop_iter_var" from its new use. (new_omp_context): Initialize new gimplify_omp_ctx fields. (gimplify_modify_expr): Emit warning if iter var is modified. (gimplify_omp_for): Make initialization and filling of loop_iter_var vector unconditional and adjust new gimplify_omp_ctx fields before gimplifying the omp_for body. (gimplify_omp_ordered): Check for do_across field in addition to emptiness check on loop_iter_var vector since the vector is now always being filled. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr92347.c: Adjust. * gcc.target/aarch64/sve/pr96195.c: Adjust. * c-c++-common/gomp/iter-var-modification.c: New test. 
Signed-off-by: Frederik Harwath --- gcc/gimplify.cc | 54 +++--- .../c-c++-common/gomp/iter-var-modification.c | 100 ++ gcc/testsuite/gcc.dg/vect/pr92347.c | 2 +- .../gcc.target/aarch64/sve/pr96195.c | 2 +- 4 files changed, 140 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 7f79b3cc7e6..a74ad987cf7 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -235,6 +235,8 @@ struct gimplify_omp_ctx bool order_concurrent; bool has_depend; bool in_for_exprs; + bool in_omp_for_body; + bool is_doacross; int defaultmap[5]; }; @@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type) c->privatized_types = new hash_set; c->location = input_location; c->region_type = region_type; + c->loop_iter_var.create (0); + c->in_omp_for_body = false; + c->is_doacross = false; + if ((region_type & ORT_TASK) == 0) c->default_kind = OMP_CLAUSE_DEFAULT_SHARED; else @@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR || TREE_CODE (*expr_p) == INIT_EXPR); + if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body) +{ + size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2; + for (size_t i = 0; i < num_vars; i++) + { + if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1]) + warning_at (input_location, OPT_Wopenmp, + "forbidden modification of iteration variable %qE in " + "OpenMP loop", *to_p); + } +} + /* Trying to simplify a clobber using normal logic doesn't work, so handle it here. */ if (TREE_CLOBBER_P (*from_p)) @@ -15334
[PATCH] OpenMP: warn about iteration var modifications in loop body
Hi, this patch implements a warning about (some simple cases of direct) modifications of iteration variables in OpenMP loops which are forbidden according to the OpenMP specification. I think this can be helpful, especially for new OpenMP users. I have implemented this after I observed some confusion concerning this topic recently. The check is implemented during gimplification. It reuses the "loop_iter_var" vector in the "gimplify_omp_ctx" which was previously only used for "doacross" handling to identify the loop iteration variables during the gimplification of MODIFY_EXPRs in omp_for bodies. I have only added a common C/C++ test because I don't see any special C++ constructs for which a warning *should* be emitted and Fortran rejects modifications of iteration variables in do loops in general. I have run "make check" on x86_64-linux-gnu and not observed any regressions. Is it ok to commit this? Best regards, Frederik From 4944a9f94bcda9907e0118e71137ee7e192657c2 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Tue, 27 Feb 2024 21:07:00 + Subject: [PATCH] OpenMP: warn about iteration var modifications in loop body OpenMP loop iteration variables may not be changed by user code in the loop body according to the OpenMP specification. In general, the compiler cannot enforce this, but nevertheless simple cases in which the user modifies the iteration variable directly in the loop body (in contrast to, e.g., modifications through a pointer) can be recognized. A warning should be useful, for instance, to new users of OpenMP. This commit implements a warning about forbidden iteration var modifications during gimplification. It reuses the "loop_iter_var" vector in the "gimplify_omp_ctx" which was previously only used for "doacross" handling to identify the loop iteration variables during the gimplification of MODIFY_EXPRs in omp_for bodies. 
gcc/ChangeLog: * gimplify.cc (struct gimplify_omp_ctx): Add field "in_omp_for_body" to recognize the gimplification state during which the new warning should be emitted. Add field "is_doacross" to distinguish the original use of "loop_iter_var" from its new use. (new_omp_context): Initialize new gimplify_omp_ctx fields. (gimplify_modify_expr): Emit warning if iter var is modified. (gimplify_omp_for): Make initialization and filling of loop_iter_var vector unconditional and adjust new gimplify_omp_ctx fields before gimplifying the omp_for body. (gimplify_omp_ordered): Check for do_across field in addition to emptiness check on loop_iter_var vector since the vector is now always being filled. gcc/testsuite/ChangeLog: * c-c++-common/gomp/iter-var-modification.c: New test. Signed-off-by: Frederik Harwath --- gcc/gimplify.cc | 54 +++--- .../c-c++-common/gomp/iter-var-modification.c | 100 ++ 2 files changed, 138 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/iter-var-modification.c diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 7f79b3cc7e6..a74ad987cf7 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -235,6 +235,8 @@ struct gimplify_omp_ctx bool order_concurrent; bool has_depend; bool in_for_exprs; + bool in_omp_for_body; + bool is_doacross; int defaultmap[5]; }; @@ -456,6 +458,10 @@ new_omp_context (enum omp_region_type region_type) c->privatized_types = new hash_set; c->location = input_location; c->region_type = region_type; + c->loop_iter_var.create (0); + c->in_omp_for_body = false; + c->is_doacross = false; + if ((region_type & ORT_TASK) == 0) c->default_kind = OMP_CLAUSE_DEFAULT_SHARED; else @@ -6312,6 +6318,18 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, gcc_assert (TREE_CODE (*expr_p) == MODIFY_EXPR || TREE_CODE (*expr_p) == INIT_EXPR); + if (gimplify_omp_ctxp && gimplify_omp_ctxp->in_omp_for_body) +{ + size_t num_vars = gimplify_omp_ctxp->loop_iter_var.length () / 2; + for (size_t i 
= 0; i < num_vars; i++) + { + if (*to_p == gimplify_omp_ctxp->loop_iter_var[2 * i + 1]) + warning_at (input_location, OPT_Wopenmp, + "forbidden modification of iteration variable %qE in " + "OpenMP loop", *to_p); + } +} + /* Trying to simplify a clobber using normal logic doesn't work, so handle it here. */ if (TREE_CLOBBER_P (*from_p)) @@ -15334,6 +15352,8 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) == TREE_VEC_LENGTH (OMP_FOR_COND (for_stmt))); gcc_assert (TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)) == TREE_VEC_LENGTH (OMP_FOR_INCR (for_stmt))); + int len = TREE_VEC_LENGTH (OMP_FOR_INIT (for_stmt)); + gimplify_omp_ctxp->loop_iter_var.create (len * 2); tree c = omp_find_clause (OMP_FOR_CLAUSES (for_stm
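As an illustration of the kind of user code this warning is aimed at, consider the following pair of functions. Both are hypothetical and are not taken from the new iter-var-modification.c test. The first assigns to the iteration variable in the loop body, which is the "direct modification" case described above; the second expresses the same stride in the loop header and is conforming.

```c
#include <assert.h>

/* Non-conforming: "i" is the iteration variable of an OpenMP loop and is
   assigned to in the loop body.  */
int
sum_even_slots_bad (const int *a, int n)
{
  int sum = 0;
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      sum += a[i];
      i = i + 1;   /* modifies the iteration variable */
    }
  return sum;
}

/* Conforming: the step of two is part of the loop header, so the body
   never writes to "i".  */
int
sum_even_slots (const int *a, int n)
{
  assert (a != 0 || n <= 0);
  int sum = 0;
#pragma omp parallel for reduction(+:sum)
  for (int i = 0; i < n; i += 2)
    sum += a[i];
  return sum;
}
```

Built without -fopenmp the pragmas are ignored and both functions are ordinary serial C; with OpenMP enabled only the second one has a well-defined result.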
[PATCH] amdgcn: Add gfx90c target
Hi Andrew, this patch adds support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation (https://llvm.org/docs/AMDGPUUsage.html) lists those devices as unsupported by rocm-amdhsa. As we have discussed elsewhere, I have tested the patch on an AMD Ryzen 5 5500U (also with different xnack settings) that I have and it passes most libgomp offloading tests. Although those APUs are very constrained compared to dGPUs, I think they might be interesting for learning, experimentation, and testing. Can I commit the patch to the master branch? Best regards, Frederik From 809e2a0248e6fad1e8336b4a883a729017cc62e5 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Wed, 24 Apr 2024 20:29:14 +0200 Subject: [PATCH] amdgcn: Add gfx90c target Add support for gfx90c GCN5 APU integrated graphics devices. The LLVM AMDGPU documentation does not list those devices as supported by rocm-amdhsa, but it passes most libgomp offloading tests. Although they are constrained compared to dGPUs, they might be interesting for learning, experimentation, and testing. gcc/ChangeLog: * config.gcc: Add gfx90c. * config/gcn/gcn-hsa.h (NO_SRAM_ECC): Likewise. * config/gcn/gcn-opts.h (enum processor_type): Likewise. (TARGET_GFX90c): New macro. * config/gcn/gcn.cc (gcn_option_override): Handle gfx90c. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h: Add gfx90c. * config/gcn/gcn.opt: Likewise. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90c): New macro. (get_arch): Handle gfx90c. (main): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. * config/gcn/t-omp-device: Add gfx90c. * doc/install.texi: Likewise. * doc/invoke.texi: Likewise. libgomp/ChangeLog: * plugin/plugin-gcn.c (isa_hsa_name): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. (isa_code): Handle gfx90c. (max_isa_vgprs): Handle EF_AMDGPU_MACH_AMDGCN_GFX90c. 
Signed-off-by: Frederik Harwath --- gcc/config.gcc | 4 ++-- gcc/config/gcn/gcn-hsa.h| 2 +- gcc/config/gcn/gcn-opts.h | 2 ++ gcc/config/gcn/gcn.cc | 8 gcc/config/gcn/gcn.h| 2 ++ gcc/config/gcn/gcn.opt | 3 +++ gcc/config/gcn/mkoffload.cc | 9 + gcc/config/gcn/t-omp-device | 2 +- gcc/doc/install.texi| 4 ++-- gcc/doc/invoke.texi | 3 +++ libgomp/plugin/plugin-gcn.c | 9 + 11 files changed, 42 insertions(+), 6 deletions(-) diff --git a/gcc/config.gcc b/gcc/config.gcc index 5df3c52f8e9..1bf07b6eece 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -4569,7 +4569,7 @@ case "${target}" in for which in arch tune; do eval "val=\$with_$which" case ${val} in - "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx1030 | gfx1036 | gfx1100 | gfx1103) + "" | fiji | gfx900 | gfx906 | gfx908 | gfx90a | gfx90c | gfx1030 | gfx1036 | gfx1100 | gfx1103) # OK ;; *) @@ -4585,7 +4585,7 @@ case "${target}" in TM_MULTILIB_CONFIG= ;; xdefault | xyes) - TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"` + TM_MULTILIB_CONFIG=`echo "gfx900,gfx906,gfx908,gfx90a,gfx90c,gfx1030,gfx1036,gfx1100,gfx1103" | sed "s/${with_arch},\?//;s/,$//"` ;; *) TM_MULTILIB_CONFIG="${with_multilib_list}" diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index 7d6e3141cea..4611bc55392 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -93,7 +93,7 @@ extern unsigned int gcn_local_sym_hash (const char *name); #define NO_XNACK "march=fiji:;march=gfx1030:;march=gfx1036:;march=gfx1100:;march=gfx1103:;" \ /* These match the defaults set in gcn.cc. */ \ "!mxnack*|mxnack=default:%{march=gfx900|march=gfx906|march=gfx908:-mattr=-xnack};" -#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;" +#define NO_SRAM_ECC "!march=*:;march=fiji:;march=gfx900:;march=gfx906:;march=gfx90c:;" /* In HSACOv4 no attribute setting means the binary supports "any" hardware configuration. The name of the attribute also changed. 
*/ diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index 49099bad7e7..1091035a69a 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -25,6 +25,7 @@ enum processor_type PROCESSOR_VEGA20, // gfx906 PROCESSOR_GFX908, PROCESSOR_GFX90a, + PROCESSOR_GFX90c, PROCESSOR_GFX1030, PROCESSOR_GFX1036, PROCESSOR_GFX1100, @@ -36,6 +37,7 @@ enum processor_type #define TARGET_VEGA20 (gcn_arch == PROCESSOR_VEGA20) #define TARGET_GFX908 (gcn_arch == PROCESSOR_GFX908) #define TARGET_GFX90a (gcn_arch == PROCESSOR_GFX90a) +#define TARGET_GFX90c (gcn_arch == PROCESSOR_GFX90c) #define TARGET_GFX1030 (gcn_arch == PROCESSOR_GFX1030) #define TARGET_GFX1036 (gcn_arch == PROCESSOR_GFX1036) #define TARGET_GFX1100 (gcn_arch == PROCESSOR_GFX1
[PATCH, committed][OpenACC] Adapt libgomp acc_get_property.f90 test
Hi, The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed the name of the type that is used for the return value of the Fortran acc_get_property function without adapting the test acc_get_property.f90. This obvious patch fixes that problem. Committed as r10-6782-g83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a. Best regards, Frederik From 83d45e1d7155a5a600d8a4aa01aca00d3c6c2d3a Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Fri, 21 Feb 2020 15:26:02 +0100 Subject: [PATCH] Adapt libgomp acc_get_property.f90 test The commit r10-6721-g8d1a1cb1b816381bf60cb1211c93b8eba1fe1472 has changed the name of the type that is used for the return value of the Fortran acc_get_property function without adapting the test acc_get_property.f90. 2020-02-21 Frederik Harwath * testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to changes from 2020-02-19, i.e. use integer(c_size_t) instead of integer(acc_device_property) for the type of the return value of acc_get_property. --- libgomp/ChangeLog | 7 +++ .../testsuite/libgomp.oacc-fortran/acc_get_property.f90| 3 ++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog index 3c640c7350b..bff3ae58c9a 100644 --- a/libgomp/ChangeLog +++ b/libgomp/ChangeLog @@ -1,3 +1,10 @@ +2020-02-21 Frederik Harwath + + * testsuite/libgomp.oacc-fortran/acc_get_property.f90: Adapt to + changes from 2020-02-19, i.e. use integer(c_size_t) instead of + integer(acc_device_property) for the type of the return value of + acc_get_property. + 2020-02-19 Tobias Burnus * .gitattributes: New; whitespace handling for Fortran's openacc_lib.h. diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 index 80ae292f41f..1af7cc3b988 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_get_property.f90 @@ -26,13 +26,14 @@ end program test ! 
and do basic device independent validation. subroutine print_device_properties (device_type) use openacc + use iso_c_binding, only: c_size_t implicit none integer, intent(in) :: device_type integer :: device_count integer :: device - integer(acc_device_property) :: v + integer(c_size_t) :: v character*256 :: s device_count = acc_get_num_devices(device_type) -- 2.17.1
Re: [C/C++, OpenACC] Reject vars of different scope in acc declare (PR94120)
Tobias Burnus writes: Hi Tobias, > Fortran patch: https://gcc.gnu.org/pipermail/gcc-patches/current/541774.html > > "A declare directive must be in the same scope > as the declaration of any var that appears in > the data clauses of the directive." > > ("A declare directive is used […] following a variable >declaration in C or C++".) > > NOTE for C++: This patch assumes that variables in a namespace > are handled in the same way as those which are at > global (namespace) scope; however, the OpenACC specification's > wording currently is "In C or C++ global scope, only …". > Hence, one can argue about this part of the patch; but as > it fixes an ICE and is a very sensible extension – the other > option is to reject it – I believe it is fine. > (On the OpenACC side, this is now Issue 288.) Sounds reasonable to me. > +bool > +c_check_oacc_same_scope (tree decl) > +{ > + struct c_binding *b = I_SYMBOL_BINDING (DECL_NAME (decl)); > + return b != NULL && B_IN_CURRENT_SCOPE (b); > +} Is the function really specific to OpenACC? If not, then "_oacc" could be dropped from its name. How about "c_check_current_scope"? > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c > index 24f71671469..8f09eb0d375 100644 > --- a/gcc/cp/parser.c > +++ b/gcc/cp/parser.c > [...] > - if (global_bindings_p ()) > + if (current_binding_level->kind == sk_namespace) > [...] > - if (error || global_bindings_p ()) > + if (error || current_binding_level->kind == sk_namespace) > return NULL_TREE; So - just to be sure - the new namespace condition subsumes the old "global_bindings_p" condition because the global scope is also a namespace, right? Yes, now I see that you have a test case that demonstrates that the declare directive still works for global variables with those changes. 
> diff --git a/gcc/testsuite/g++.dg/declare-pr94120.C > b/gcc/testsuite/g++.dg/declare-pr94120.C > new file mode 100644 > index 000..8515c4ff875 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/declare-pr94120.C > @@ -0,0 +1,30 @@ > +/* { dg-do compile } */ > + > +/* PR middle-end/94120 */ > + > +int b[8]; > +#pragma acc declare create (b) Looks good to me. Frederik - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
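For readers who want to see the scope rule from the quoted standard text in code: the following is only a small sketch, not part of the patch. The first two lines mirror the start of the new test case; the helper `sum_b` is made up for illustration. Without -fopenacc the pragma is simply ignored, so the example also builds as plain C.

```c
#include <stddef.h>

/* File-scope variable with its declare directive in the same scope,
   mirroring the start of declare-pr94120.C.  */
int b[8];
#pragma acc declare create (b)

/* Hypothetical helper (not from the patch): sums the array on the host.  */
int
sum_b (void)
{
  int s = 0;
  for (size_t i = 0; i < sizeof b / sizeof b[0]; i++)
    s += b[i];
  return s;
}

/* By contrast, a directive such as

     void f (void)
     {
     #pragma acc declare create (b)
     }

   names a variable that was declared in a different scope.  Going by
   the subject line, this is the kind of use the patch is meant to
   reject; the exact diagnostic is not shown in this thread.  */
```

The array has static storage duration, so `sum_b` sees zero-initialized elements unless something else writes to `b`.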
[PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message
Hi, This patch fixes the check for reductions on orphaned gang loops in the Fortran frontend which (in contrast to the C, C++ frontends) erroneously rejects reductions on gang loops that are contained in "kernels" constructs and which hence are not orphaned. According to the OpenACC standard version 2.5 and later, reductions on orphaned gang loops are explicitly disallowed (cf. section "Changes from Version 2.0 to 2.5"). Remember that a loop is "orphaned" if it is not lexically contained in a compute construct (cf. section "Loop construct" of the OpenACC standard), i.e. in either a "parallel", a "serial", or a "kernels" construct. The patch has been tested by running the GCC and libgomp testsuites. The latter tests ran with offloading to nvptx although that should not be important here unless there was some very subtle reason for forbidding the gang reductions on kernels loops. As expected, there seems to be no such reason, i.e. I observed no regressions with the patch. Can I include the patch in OG10? Best regards, Frederik From 7320635211fff3a773beb0de1914dbfcc317ab37 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Tue, 7 Jul 2020 10:41:21 +0200 Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message According to the OpenACC standard version 2.5 and later, reductions on orphaned gang loops are explicitly disallowed (cf. section "Changes from Version 2.0 to 2.5"). A loop is "orphaned" if it is not lexically contained in a compute construct (cf. section "Loop construct" of the OpenACC standard), i.e. in either a "parallel", a "serial", or a "kernels" construct. 
This commit fixes the check for reductions on orphaned gang loops in the Fortran frontend which (in contrast to the C, C++ frontends) erroneously rejects reductions on gang loops that are contained in "kernels" constructs. 2020-07-07 Frederik Harwath gcc/fortran/ * openmp.c (oacc_is_parallel_or_serial): Removed function. (oacc_is_kernels): New function. (oacc_is_compute_construct): New function. (resolve_oacc_loop_blocks): Use "oacc_is_compute_construct" instead of "oacc_is_parallel_or_serial" for checking that a loop is not orphaned. gcc/testsuite/ * gfortran.dg/goacc/orphan-reductions-2.f90: New test verifying that the error message is not emitted for non-orphaned loops. * c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++. --- gcc/fortran/openmp.c | 13 +++- .../c-c++-common/goacc/orphan-reductions-2.c | 69 +++ .../gfortran.dg/goacc/orphan-reductions-2.f90 | 58 3 files changed, 137 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90 diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 28408c4c99a..83c498112a8 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -5926,9 +5926,16 @@ oacc_is_serial (gfc_code *code) } static bool -oacc_is_parallel_or_serial (gfc_code *code) +oacc_is_kernels (gfc_code *code) { - return oacc_is_parallel (code) || oacc_is_serial (code); + return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP; +} + +static bool +oacc_is_compute_construct (gfc_code *code) +{ + return oacc_is_parallel (code) || oacc_is_serial (code) +|| oacc_is_kernels (code); } static gfc_statement @@ -6222,7 +6229,7 @@ resolve_oacc_loop_blocks (gfc_code *code) for (c = omp_current_ctx; c; c = c->previous) if (!oacc_is_loop (c->code)) break; - if (c == NULL || !oacc_is_parallel_or_serial (c->code)) + if (c == NULL || !oacc_is_compute_construct (c->code)) gfc_error ("gang reduction on an 
orphan loop at %L", &code->loc); } diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c new file mode 100644 index 000..2b651fd2b9f --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c @@ -0,0 +1,69 @@ +/* Verify that the error message for gang reduction on orphaned OpenACC loops + is not reported for non-orphaned loops. */ + +#include + +int +kernels (int n) +{ + int i, s1 = 0, s2 = 0; +#pragma acc kernels + { +#pragma acc loop gang reduction(+:s1) /* { dg-bogus "gang reduction on an orphan loop" } */ + for (i = 0; i < n; i++) +s1 = s1 + 2; + +#pragma acc loop gang reduction(+:s2) /* { dg-bogus "gang reduction on an orphan loop" } */ + for (i = 0; i < n; i++) +s2 = s2 + 2; + } +
Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message
Thomas Schwinge writes: Hi Thomas, > (CC added, for everything touching gfortran.) Thanks! > On 2020-07-07T10:52:08+0200, Frederik Harwath > wrote: >> This patch fixes the check for reductions on orphaned gang loops > > This is the "Make OpenACC orphan gang reductions errors" functionality > originally added in gomp-4_0-branch r247461. > >> the Fortran frontend which (in contrast to the C, C++ frontends) >> erroneously rejects reductions on gang loops that are contained in >> "kernels" constructs and which hence are not orphaned. >> >> According to the OpenACC standard version 2.5 and later, reductions on >> orphaned gang loops are explicitly disallowed (cf. section "Changes >> from Version 2.0 to 2.5"). Remember that a loop is "orphaned" if it is >> not lexically contained in a compute construct (cf. section "Loop >> construct" of the OpenACC standard), i.e. in either a "parallel", a >> "serial", or a "kernels" construct. > > Or the other way round: a 'loop' construct is orphaned if it appears > inside a 'routine' region, right? The "not lexically contained in a compute construct" definition is from the standard. Assuming that the frontend's parser rejects "loop" directives if they do not occur inside of either the "serial", "parallel", "kernels" compute constructs or in a function with a "routine" directive, both definitions should be indeed equivalent ;-). > Unless Julian/Kwok speak up soon: OK, thanks. > > Reviewed-by: Thomas Schwinge > > May want to remove "libgomp" from the first line of the commit log -- > this commit doesn't relate to libgomp specifically. Right. > (Ideally, we'd also test 'serial' construct in addition to 'kernels', > 'parallel', but we can add that later. I anyway have a WIP patch > waiting, adding more 'serial' construct testing, for a different reason, > so I'll include it there.) 
I had left this out intentionally, because having the gang reduction in the serial construct leads to a "region contains gang partitioned code but is not gang partitioned" error. Of course, we might still add a test case with that expectation. Thanks for the review! Frederik
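To make the "orphaned" definition discussed in this thread concrete, here is a minimal C sketch modelled on the kernels case from orphan-reductions-2.c. The function name `sum_in_kernels` is an assumption for illustration only. The loop is lexically inside a compute construct, so it is not orphaned and the gang reduction should be accepted; without -fopenacc the pragmas are ignored and the loop just runs serially.

```c
/* Non-orphaned loop: it is lexically contained in a "kernels"
   construct, so the gang reduction is not diagnosed.  */
int
sum_in_kernels (int n)
{
  int s = 0;
#pragma acc kernels
  {
#pragma acc loop gang reduction(+:s)
    for (int i = 0; i < n; i++)
      s = s + 2;
  }
  return s;
}

/* An orphaned loop would be the same "loop gang reduction(+:s)"
   directive placed in a function that carries only a "routine"
   directive and no enclosing "parallel", "serial" or "kernels"
   construct.  That is the case for which the "gang reduction on an
   orphan loop" error is intended.  */
```

Each iteration adds 2, so the function is expected to return `2 * n` regardless of how the loop is partitioned.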
Re: [PATCH] [og10] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message
Thomas Schwinge writes: Hi Thomas, >> Can I include the patch in OG10? > > Unless Julian/Kwok speak up soon: OK, thanks. This has been delayed a bit by my vacation, but I have now committed the patch. > May want to remove "libgomp" from the first line of the commit log -- > this commit doesn't relate to libgomp specifically. > > (Ideally, we'd also test 'serial' construct in addition to 'kernels', > 'parallel', but we can add that later. I anyway have a WIP patch > waiting, adding more 'serial' construct testing, for a different reason, > so I'll include it there.) I forgot to remove "libgomp" from the commit message, sorry, but I have included the test cases for the "serial construct". Best regards, Frederik From 7c10ae450b95495dda362cb66770bb78b546592e Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 20 Jul 2020 11:24:21 +0200 Subject: [PATCH] libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message According to the OpenACC standard version 2.5 and later, reductions on orphaned gang loops are explicitly disallowed (cf. section "Changes from Version 2.0 to 2.5"). A loop is "orphaned" if it is not lexically contained in a compute construct (cf. section "Loop construct" of the OpenACC standard), i.e. in either a "parallel", a "serial", or a "kernels" construct. This commit fixes the check for reductions on orphaned gang loops in the Fortran frontend which (in contrast to the C, C++ frontends) erroneously rejects reductions on gang loops that are contained in "kernels" constructs. 2020-07-20 Frederik Harwath gcc/fortran/ * openmp.c (oacc_is_parallel_or_serial): Removed function. (oacc_is_kernels): New function. (oacc_is_compute_construct): New function. 
(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct" instead of "oacc_is_parallel_or_serial" for checking that a loop is not orphaned. gcc/testsuite/ * gfortran.dg/goacc/orphan-reductions-2.f90: New test verifying that the "gang reduction on an orphan loop" error message is not emitted for non-orphaned loops. * c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++. --- gcc/fortran/ChangeLog | 9 ++ gcc/fortran/openmp.c | 13 ++- gcc/testsuite/ChangeLog | 7 ++ .../c-c++-common/goacc/orphan-reductions-2.c | 103 ++ .../gfortran.dg/goacc/orphan-reductions-2.f90 | 87 +++ 5 files changed, 216 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90 diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog index e86279cb647..5a1f81c286e 100644 --- a/gcc/fortran/ChangeLog +++ b/gcc/fortran/ChangeLog @@ -1,3 +1,12 @@ +2020-07-20 Frederik Harwath + + * openmp.c (oacc_is_parallel_or_serial): Removed function. + (oacc_is_kernels): New function. + (oacc_is_compute_construct): New function. + (resolve_oacc_loop_blocks): Use "oacc_is_compute_construct" + instead of "oacc_is_parallel_or_serial" for checking that a + loop is not orphaned. 
+ 2020-07-08 Harald Anlauf Backported from master: diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index ab68e9f2173..706933c869a 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -5927,9 +5927,16 @@ oacc_is_serial (gfc_code *code) } static bool -oacc_is_parallel_or_serial (gfc_code *code) +oacc_is_kernels (gfc_code *code) { - return oacc_is_parallel (code) || oacc_is_serial (code); + return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP; +} + +static bool +oacc_is_compute_construct (gfc_code *code) +{ + return oacc_is_parallel (code) || oacc_is_serial (code) +|| oacc_is_kernels (code); } static gfc_statement @@ -6223,7 +6230,7 @@ resolve_oacc_loop_blocks (gfc_code *code) for (c = omp_current_ctx; c; c = c->previous) if (!oacc_is_loop (c->code)) break; - if (c == NULL || !oacc_is_parallel_or_serial (c->code)) + if (c == NULL || !oacc_is_compute_construct (c->code)) gfc_error ("gang reduction on an orphan loop at %L", &code->loc); } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 59e6c93b07a..fa1937a4ea2 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,10 @@ +2020-07-20 Frederik Harwath + + * gfortran.dg/goacc/orphan-reductions-2.f90: New test + verifying that the "gang reduction on an orphan loop&quo
Re: [og9] Really fix og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof"
Hi Thomas, Thomas Schwinge writes: > On 2020-03-25T18:09:25+0100, I wrote: >> On 2018-02-22T12:23:25+0100, Tom de Vries wrote: >>> when using cuda 9 nvprof with an openacc executable, the executable hangs. > >> What Frederik has discovered today in the hard way... [...] >> -- the hang was back. [...] > ..., and now the attached patch to devel/omp/gcc-9 in commit > 775f1686a3df68bd20370f1fabc6273883e2c5d2 'Really fix og9 "Fix hang when > running oacc exec with CUDA 9.0 nvprof"'. Thanks for fixing this issue! I can confirm that nvprof now works on code compiled from devel/omp/gcc-9. I have used nvprof 9.1.85 on Ubuntu 18.04 for testing. Best regards, Frederik
Re: [og8] Report errors on missing OpenACC reduction clauses in nested reductions
Thomas Schwinge writes: Hi Thomas, > Via <https://gcc.gnu.org/PR94629> "10 issues located by the PVS-studio > static analyzer" (so please reference that one on any patch submission), > on <https://habr.com/en/company/pvs-studio/blog/497640/> in "Fragment N3, > Assigning a variable to itself", we find this latter assignment qualified > as "very strange to assign a variable to itself". > > Probably that should've been 'outer_ctx' instead of 'ctx'? I agree that the original intention must have been to assign the outer_ctx's "outer_reduction_clauses" to the corresponding field of the inner "ctx". This would make sense, semantically. But this field is meant to be used by the function "scan_omp_for" only and ... > then does the current algorithm still work despite this error? ... this function never requires the struct field to be initialized in that way. Before the field is used, it always fills ctx->outer_reduction_clauses from the outer context's local_reduction_clauses and outer_reduction_clauses: >> + if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL) >> +ctx->outer_reduction_clauses >> + = chainon (unshare_expr (ctx->outer->local_reduction_clauses), >> + ctx->outer->outer_reduction_clauses); Hence I found it preferable to remove the assignment to the "outer_reduction_clauses" field and the "local_reduction_clauses" field from "new_omp_context" completely. (The fields are still zero initialized by the allocation of the struct which uses XCNEW.) That way the whole logic regarding the fields is now contained in "scan_omp_for". I have executed "make check" (on x86_64-linux-gnu) to verify that the change causes no regressions. Ok to push the commit to master? 
Best regards, Frederik From 2d60b374a44b212ff97c8b1fd6f8c39e478dc70f Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Tue, 21 Apr 2020 12:36:14 +0200 Subject: [PATCH] Remove fishy self-assignment in omp-low.c [PR94629] The PR noticed that omp-low.c contains a self-assignment in the function new_omp_context: if (outer_ctx) { ... ctx->outer_reduction_clauses = ctx->outer_reduction_clauses; This is obviously useless. The original intention might have been to copy the field from the outer_ctx to ctx. Since this is done (properly) in the only function where this field is actually used (in function scan_omp_for) and the field is being initialized to zero during the struct allocation, there is no need to attempt to do anything to this field in new_omp_context. Thus this commit removes any assignment to the field from new_omp_context. 2020-04-21 Frederik Harwath PR other/94629 * gcc/omp-low.c (new_omp_context): Remove assignments to ctx->outer_reduction_clauses and ctx->local_reduction_clauses. --- gcc/omp-low.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 67565d61400..88f23e60d34 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -128,10 +128,16 @@ struct omp_context corresponding tracking loop iteration variables. */ hash_map *lastprivate_conditional_map; - /* A tree_list of the reduction clauses in this context. */ + /* A tree_list of the reduction clauses in this context. This is +only used for checking the consistency of OpenACC reduction +clauses in scan_omp_for and is not guaranteed to contain a valid +value outside of this function. */ tree local_reduction_clauses; - /* A tree_list of the reduction clauses in outer contexts. 
This is +only used for checking the consistency of OpenACC reduction +clauses in scan_omp_for and is not guaranteed to contain a valid +value outside of this function. */ tree outer_reduction_clauses; /* Nesting depth of this context. Used to beautify error messages re @@ -931,8 +937,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->outer = outer_ctx; ctx->cb = outer_ctx->cb; ctx->cb.block = NULL; - ctx->local_reduction_clauses = NULL; - ctx->outer_reduction_clauses = ctx->outer_reduction_clauses; ctx->depth = outer_ctx->depth + 1; } else @@ -948,8 +952,6 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb.transform_call_graph_edges = CB_CGE_MOVE; ctx->cb.adjust_array_error_bounds = true; ctx->cb.dont_remap_vla_if_no_change = true; - ctx->local_reduction_clauses = NULL; - ctx->outer_reduction_clauses = NULL; ctx->depth = 1; } -- 2.17.1
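The reasoning behind removing the assignments can be sketched outside of GCC as follows. This is a simplified model under stated assumptions, not the omp-low.c code: `struct clause`, `struct context`, `new_context`, `copy_and_chain` and `outer_clauses` are made-up names, `calloc` stands in for XCNEW, and the list copy only loosely imitates `chainon (unshare_expr (...), ...)`.

```c
#include <stdlib.h>

/* Simplified stand-in for a chain of reduction clauses.  */
struct clause
{
  const char *var;
  struct clause *next;
};

/* Simplified stand-in for struct omp_context.  */
struct context
{
  struct context *outer;
  struct clause *local_reduction_clauses;
  struct clause *outer_reduction_clauses;
  int depth;
};

/* Like new_omp_context after the patch: the clause fields are not
   assigned here, the zeroing allocation already leaves them NULL.  */
struct context *
new_context (struct context *outer)
{
  struct context *ctx = calloc (1, sizeof *ctx);
  if (ctx == NULL)
    return NULL;
  ctx->outer = outer;
  ctx->depth = outer ? outer->depth + 1 : 1;
  return ctx;
}

/* Copy LIST and append TAIL to the copy.  */
static struct clause *
copy_and_chain (const struct clause *list, struct clause *tail)
{
  struct clause *head = NULL, **link = &head;
  for (; list != NULL; list = list->next)
    {
      struct clause *c = malloc (sizeof *c);
      if (c == NULL)
        break;
      c->var = list->var;
      c->next = NULL;
      *link = c;
      link = &c->next;
    }
  *link = tail;
  return head;
}

/* Like the start of the check in scan_omp_for: compute the outer
   clauses lazily, right before they are needed.  */
struct clause *
outer_clauses (struct context *ctx)
{
  if (ctx->outer_reduction_clauses == NULL && ctx->outer != NULL)
    ctx->outer_reduction_clauses
      = copy_and_chain (ctx->outer->local_reduction_clauses,
                        ctx->outer->outer_reduction_clauses);
  return ctx->outer_reduction_clauses;
}
```

In this model the constructor never touches the two clause fields, and the only reader fills in the outer list itself, which is the property the commit message relies on.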
[PATCH] libgomp_g.h: Include stdint.h instead of gstdint.h
Hi, I am a new member of Mentor's Sourcery Tools Services group and this is the first patch that I am submitting here. I do not have write access to the svn repository yet, hence someone would have to merge this patch for me if it gets accepted. But I intend to apply for an account soon. The patch changes libgomp/libgomp_g.h to include stdint.h instead of the internal gstdint.h. The inclusion of gstdint.h was introduced by GCC trunk r265930, presumably because this introduced uses of uintptr_t. Since gstdint.h is not part of GCC's installation, several libgomp test cases fail to compile when running the tests with the installed GCC. I have tested the patch with "make check" on x86_64 GNU/Linux. Best regards, Frederik libgomp/ChangeLog: 2019-09-25 Kwok Cheung Yeung * libgomp_g.h: Include stdint.h instead of gstdint.h. diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h index 32a9d8a..dfb55fb 100644 --- a/libgomp/libgomp_g.h +++ b/libgomp/libgomp_g.h @@ -31,7 +31,7 @@ #include #include -#include "gstdint.h" +#include <stdint.h> /* barrier.c */
Re: [PATCH] libgomp_g.h: Include stdint.h instead of gstdint.h
Hi Jakub, On 30.09.2019 at 09:25, Jakub Jelinek wrote: > On Mon, Sep 30, 2019 at 12:03:00AM -0700, Frederik Harwath wrote: >> The patch changes libgomp/libgomp_g.h to include stdint.h instead of the >> internal gstdint.h. [...] > > That looks wrong, will make libgomp less portable. [...] > Jakub We have discussed this issue with Joseph Myers. Let me quote what Joseph wrote: "I think including <stdint.h> is appropriate (and, more generally, removing the special configure support for GCC_HEADER_STDINT for anything built only for the target - note that libgcc/gstdint.h has a comment saying it's about libdecnumber portability to *hosts*, not targets, without stdint.h). On any target without stdint.h, GCC should be providing its own; the only targets where GCC does not yet know about target stdint.h types are SymbianOS, LynxOS, QNX, TPF (see GCC bug 448), and I think it's pretty unlikely libgomp would do anything useful for those (and if in fact they do provide stdint.h, there wouldn't be an issue anyway)." Hence, I think the change will not affect portability negatively. Best regards, Frederik
Add myself to MAINTAINERS files
2019-10-01 Frederik Harwath * MAINTAINERS: Add myself to Write After Approval Index: ChangeLog === --- ChangeLog (revision 276390) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2019-10-01 Frederik Harwath + + * MAINTAINERS: Add myself to Write After Approval + 2019-09-26 Richard Sandiford * MAINTAINERS: Add myself as an aarch64 maintainer. Index: MAINTAINERS === --- MAINTAINERS (revision 276390) +++ MAINTAINERS (working copy) @@ -409,6 +409,7 @@ Wei Guozhi Mostafa Hagog Andrew Haley +Frederik Harwath Stuart Hastings Michael Haubenwallner Pat Haugen
[PATCH] Report errors on inconsistent OpenACC nested reduction clauses
Hi, OpenACC requires that, if a variable is used in reduction clauses on two nested loops, then there must be reduction clauses for that variable on all loops that are nested in between the two loops and all these reduction clauses must use the same operator; this was first clarified by OpenACC 2.6. This commit introduces a check for that property which reports errors if the property is violated. I have tested the patch by comparing "make check" results and I am not aware of any regressions. Gergö has implemented the check and it works, but I was wondering if the way in which the patch avoids issuing errors about operator switches more than once by modifying the clauses (cf. the corresponding comment in omp-low.c) could lead to problems - the processing might still continue after the error on the modified tree, right? I was also wondering about the best place for such checks. Should this be a part of "pass_lower_omp" (as in the patch) or should it run earlier like, for instance, "pass_diagnose_omp_blocks"? Can the patch be included in trunk? Frederik From 99796969c1bf91048c6383dfb1b8576bdd9efd7d Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Mon, 21 Oct 2019 08:27:58 +0200 Subject: [PATCH] Report errors on inconsistent OpenACC nested reduction clauses MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OpenACC (cf. OpenACC 2.7, section 2.9.11. "reduction clause"; this was first clarified by OpenACC 2.6) requires that, if a variable is used in reduction clauses on two nested loops, then there must be reduction clauses for that variable on all loops that are nested in between the two loops and all these reduction clauses must use the same operator. This commit introduces a check for that property which reports errors if it is violated. 
In gcc/testsuite/c-c++-common/goacc/reduction-6.c, we remove the erroneous reductions on variable b; adding a reduction clause to make it compile cleanly would make it a duplicate of the test for variable c. 2010-10-21 Gergö Barany Frederik Harwath gcc/ * omp-low.c (struct omp_context): New fields local_reduction_clauses, outer_reduction_clauses. (new_omp_context): Initialize these. (scan_sharing_clauses): Record reduction clauses on OpenACC constructs. (scan_omp_for): Check reduction clauses for incorrect nesting. gcc/testsuite/ * c-c++-common/goacc/nested-reductions-fail.c: New test. * c-c++-common/goacc/nested-reductions.c: New test. * c-c++-common/goacc/reduction-6.c: Adjust. libgomp/ * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: Add missing reduction clauses. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Likewise. --- gcc/omp-low.c | 107 +++- .../goacc/nested-reductions-fail.c| 492 ++ .../c-c++-common/goacc/nested-reductions.c| 420 +++ .../c-c++-common/goacc/reduction-6.c | 11 - .../par-loop-comb-reduction-1.c | 2 +- .../par-loop-comb-reduction-2.c | 2 +- .../par-loop-comb-reduction-3.c | 2 +- .../par-loop-comb-reduction-4.c | 2 +- 8 files changed, 1022 insertions(+), 16 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 279b6ef893a..a2212274685 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -127,6 +127,12 @@ struct omp_context corresponding tracking loop iteration variables. */ hash_map *lastprivate_conditional_map; + /* A tree_list of the reduction clauses in this context. */ + tree local_reduction_clauses; + + /* A tree_list of the reduction clauses in outer contexts. 
*/ + tree outer_reduction_clauses; + /* Nesting depth of this context. Used to beautify error messages re invalid gotos. The outermost ctx is depth 1, with depth 0 being reserved for the main body of the function. */ @@ -902,6 +908,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb = outer_ctx->cb; ctx->cb.block = NULL; ctx->depth = outer_ctx->depth + 1; + ctx->local_reduction_clauses = NULL; + ctx->outer_reduction_clauses = ctx->outer_reduction_clauses; } else { @@ -917,6 +925,8 @@ new_omp_context (gimple *stmt, omp_context *outer_ctx) ctx->cb.adjust_array_error_bounds = true; ctx->cb.dont_remap_vla_if_no_change = true; ctx->depth = 1; + ctx->local_reduction_clauses = NULL; + ctx->outer_reduction_clauses = NULL;
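As an illustration of the property that the check enforces, here is a small C sketch of a loop nest that satisfies it. The function name `count_iterations` is made up for this example and the code is not taken from the new test files. The outermost and innermost loops both reduce into `sum`, so the loop in between carries a reduction clause for `sum` as well, and all three use the same operator. Without -fopenacc the pragmas are ignored and the nest runs serially.

```c
/* Consistent nested reductions: every loop between the outermost
   and innermost reduction on "sum" has a reduction clause for
   "sum", and all of them use "+".  */
int
count_iterations (int n)
{
  int sum = 0;
#pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
#pragma acc loop reduction(+:sum)
      for (int j = 0; j < n; j++)
        {
#pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }
  return sum;
}

/* Two variations that the check is meant to flag:
   - dropping "reduction(+:sum)" from the middle loop, and
   - writing "reduction(*:sum)" on the middle loop while the other
     two loops keep "+".  */
```

The innermost statement runs n * n * n times, which gives an easy value to compare against when trying the sketch with and without offloading.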
testsuite: clarify scan-dump file globbing behavior
Hi, The test commands for scanning optimization dump files perform globbing on the argument that specifies the suffix of the dump files to be scanned. This behavior is currently undocumented. Furthermore, the current implementation of "scan-dump" and related procedures yields an error whenever the file name globbing matches more than one file (due to an attempt to call "open" on multiple files) while a failure to match any file at all results in an unresolved test. This patch documents the globbing behavior. The dump scanning procedures are changed to make the test unresolved if globbing matches more than one file. The procedures in scandump.exp all perform the file name expansion in essentially the same way and I have extracted this into a new procedure. But there is one very minor exception: > @@ -67,10 +95,10 @@ proc scan-dump { args } { > set dumpbase [dump-base $src [lindex $args 3]] > -set output_file "[glob -nocomplain $dumpbase.[lindex $args 2]]" > + > +set pattern "$dumpbase.[lindex $args 2]" > +set output_file "[glob-dump-file $testcase $pattern]" > if { $output_file == "" } { > - verbose -log "$testcase: dump file does not exist" > - verbose -log "dump file: $dumpbase.$suf" "scan-dump" is the only procedure that prints the "dump file: ..." line. Should this be kept or is it ok to remove this as I have done in the patch? $dumpbase.$suf does not emit the correct file name anyway (a random example from my testing: "dump file: stdatomic-init.c.dce*") and the name of the files can be inferred from the test name easily. I have tested the changes by running "make check" (with a --enable-languages=C only build, but this covers lots of uses of the affected test procedures) and observed no regressions. Ok to commit this to master? 
Best regards, Frederik - Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter >From 6912e03d51d360dbbcf7eb1dc8d77d08c2a6e54c Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Fri, 15 May 2020 10:35:48 +0200 Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior The test commands for scanning optimization dump files perform globbing on the argument that specifies the suffix of the dump files to be scanned. This behavior is currently undocumented. Furthermore, the current implementation of "scan-dump" and similar procedures yields an error whenever the file name globbing matches more than one file (due to an attempt to call "open" on multiple files) while a failure to match any file results in an unresolved test. This commit documents the globbing behavior. The dump scanning procedures are changed to make the test unresolved if globbing matches more than one file. gcc/ChangeLog: 2020-05-15 Frederik Harwath * doc/sourcebuild.texi: Describe globbing of the dump file scanning commands "suffix" argument. gcc/testsuite/ChangeLog: 2020-05-15 Frederik Harwath * lib/scandump.exp (glob-dump-file): New proc. (scan-dump): Use glob-dump-file for file name expansion. (scan-dump-times): Likewise. (scan-dump-dem): Likewise. (scan-dump-dem-not): Likewise. --- gcc/doc/sourcebuild.texi | 4 ++- gcc/testsuite/lib/scandump.exp | 54 +++--- 2 files changed, 46 insertions(+), 12 deletions(-) diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 240d6e4b08e..b6c5a21cb71 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2888,7 +2888,9 @@ stands for zero or more unmatched lines; the whitespace after These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree}, @code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa}, and -@code{wpa-ipa}. +@code{wpa-ipa}. 
The @var{suffix} argument which describes the dump file +to be scanned may contain a glob pattern that must expand to exactly one +file name. @table @code @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail @var{selector} @}] diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp index d6ba350acc8..f3a991b590a 100644 --- a/gcc/testsuite/lib/scandump.exp +++ b/gcc/testsuite/lib/scandump.exp @@ -39,6 +39,34 @@ proc dump-base { args } { return $dumpbase } +# Expand dump file name pattern to exactly one file. +# Return a single dump file name or an empty string +# if the pattern matches no file or more than one file. +# +# Argument 0 is the testcase name +# Argument 1 is the dump file glob pattern +proc glob-dump-file { args } { + +set pattern [lindex $args 1] +set dump_file "[glob -nocomplain $pattern]" +set num_files [llength $dump_file] + +if { $num_files != 1 } { + set testcase
Re: testsuite: clarify scan-dump file globbing behavior
Hi Thomas, Thomas Schwinge writes: > I can't formally approve testsuite patches, but did a review anyway: Thanks for the review! > On 2020-05-15T12:31:54+0200, Frederik Harwath > wrote: >> The dump >> scanning procedures are changed to make the test unresolved >> if globbing matches more than one file. > > (The code changes look good, but I have not tested that specific aspect.) We do not have automated tests for the testsuite commands :-), but I have of course tested this manually. > As I said, not an approval, and minor comments (see below), but still: > > Reviewed-by: Thomas Schwinge > > Do we have to similarly also audit/alter other testsuite infrastructure > files, anything that uses '[glob [...]]'? (..., and then generalize > 'glob-dump-file' into 'glob-one-file', or similar.) That can be done > incrementally, as far as I'm concerned. I also think it would make sense to adapt similar test commands as well. > May also make this more useful/explicit: > > This is useful if, for example, if a pass has several static > instances [correct terminology?], and depending on torture testing > command-line flags, a different instance executes and produces a dump > file, and so in the test case you can use a generic [put example > here] to scan the varying dump files names. > > (Or similar.) I have moved the explanation below the description of the individual commands and added an example. See the attached revised patch. Best regards, Frederik >From 2a17749d6dbcac690d698323240438722d6119ef Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Fri, 15 May 2020 10:35:48 +0200 Subject: [PATCH] testsuite: clarify scan-dump file globbing behavior The test commands for scanning optimization dump files perform globbing on the argument that specifies the suffix of the dump files to be scanned. 
This behavior is currently undocumented. Furthermore, the current implementation of "scan-dump" and similar procedures yields an error whenever the file name globbing matches more than one file (due to an attempt to call "open" on multiple files) while a failure to match any file results in an unresolved test. This commit documents the globbing behavior. The dump scanning procedures are changed to make the test unresolved if globbing matches more than one file. gcc/ChangeLog: 2020-05-19 Frederik Harwath * doc/sourcebuild.texi: Describe globbing of the dump file scanning commands "suffix" argument. gcc/testsuite/ChangeLog: 2020-05-19 Frederik Harwath * lib/scandump.exp (glob-dump-file): New proc. (scan-dump): Use glob-dump-file for file name expansion. (scan-dump-times): Likewise. (scan-dump-dem): Likewise. (scan-dump-dem-not): Likewise. Reviewed-by: Thomas Schwinge --- gcc/doc/sourcebuild.texi | 13 gcc/testsuite/lib/scandump.exp | 54 +++--- 2 files changed, 56 insertions(+), 11 deletions(-) diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 240d6e4b08e..9df4b06d460 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2911,6 +2911,19 @@ Passes if @var{regex} does not match demangled text in the dump file with suffix @var{suffix}. @end table +The @var{suffix} argument which describes the dump file to be scanned +may contain a glob pattern that must expand to exactly one file +name. This is useful if, e.g., different pass instances are executed +depending on torture testing command-line flags, producing dump files +whose names differ only in their pass instance number suffix. 
For +example, to scan instances 1, 2, 3 of a tree pass ``mypass'' for +occurrences of the string ``code has been optimized'', use: +@smallexample +/* @{ dg-options "-fdump-tree-mypass" @} */ +/* @{ dg-final @{ scan-tree-dump "code has been optimized" "mypass\[1-3\]" @} @} */ +@end smallexample + + @subsubsection Check for output files @table @code diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp index d6ba350acc8..f3a991b590a 100644 --- a/gcc/testsuite/lib/scandump.exp +++ b/gcc/testsuite/lib/scandump.exp @@ -39,6 +39,34 @@ proc dump-base { args } { return $dumpbase } +# Expand dump file name pattern to exactly one file. +# Return a single dump file name or an empty string +# if the pattern matches no file or more than one file. +# +# Argument 0 is the testcase name +# Argument 1 is the dump file glob pattern +proc glob-dump-file { args } { + +set pattern [lindex $args 1] +set dump_file "[glob -nocomplain $pattern]" +set num_files [llength $dump_file] + +
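The "exactly one file" behaviour described above can be sketched outside of Tcl as follows. This is a rough Python analogue of what glob-dump-file is documented to do, not a translation of scandump.exp; the function name `glob_one_file`, the `demo` helper, and the dump file names are made up for illustration.

```python
import glob
import os
import tempfile


def glob_one_file(pattern):
    """Return the single path matching pattern.

    Returns "" if the pattern matches no file or more than one file, so
    the caller can treat both cases as an unresolved test.
    """
    matches = glob.glob(pattern)
    if len(matches) != 1:
        return ""
    return matches[0]


def demo():
    """Create two hypothetical dump files and try three patterns."""
    with tempfile.TemporaryDirectory() as d:
        for name in ("t.c.123t.mypass1", "t.c.124t.mypass2"):
            open(os.path.join(d, name), "w").close()
        one = glob_one_file(os.path.join(d, "t.c.*.mypass1"))       # one match
        many = glob_one_file(os.path.join(d, "t.c.*.mypass[1-3]"))  # two matches
        none = glob_one_file(os.path.join(d, "t.c.*.nopass"))       # no match
        return os.path.basename(one), many, none
```

Returning an empty string for both failure modes mirrors the comment on the new proc; a caller that wants to distinguish "no file" from "several files" would need to report that before returning.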
[PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}
Hi, the new contrib/gcc-changelog/git_check_commit.py script (which, by the way, is very useful!) does not handle "Reviewed-by" and "Reviewed-on" lines yet and hence it expects those lines to be indented by a tab although those lines are usually not indented. The script already knows about "Co-Authored-By" lines and I have extended it to handle the "Reviewed-{by,on}" lines in a similar way. The information from those lines is not processed further since the review information apparently does not get included in the ChangeLogs. Ok to commit the patch? Best regards, Frederik >From 0dc9b201bc1607de36cb9b3604a87cc3646292e3 Mon Sep 17 00:00:00 2001 From: Frederik Harwath Date: Tue, 19 May 2020 11:15:28 +0200 Subject: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on} git_check_commit.py does not know about "Reviewed-by" and "Reviewed-on" lines and hence it expects those lines which follow the ChangeLog entries to be indented by a tab. This commit makes the script skip those lines. No further processing is attempted because the review information is not part of the ChangeLogs. contrib/ 2020-05-19 Frederik Harwath * gcc-changelog/git_commit.py: Skip over lines starting with "Reviewed-by: " or "Reviewed-on: ". 
--- contrib/gcc-changelog/git_commit.py | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py index 5214cc36538..ebcf853f02f 100755 --- a/contrib/gcc-changelog/git_commit.py +++ b/contrib/gcc-changelog/git_commit.py @@ -150,6 +150,8 @@ star_prefix_regex = re.compile(r'\t\*(?P<spaces>\ *)(?P<content>.*)') LINE_LIMIT = 100 TAB_WIDTH = 8 CO_AUTHORED_BY_PREFIX = 'co-authored-by: ' +REVIEWED_BY_PREFIX = 'reviewed-by: ' +REVIEWED_ON_PREFIX = 'reviewed-on: ' class Error: @@ -344,12 +346,19 @@ class GitCommit: else: pr_line = line.lstrip() -if line.lower().startswith(CO_AUTHORED_BY_PREFIX): +lowered_line = line.lower() +if lowered_line.startswith(CO_AUTHORED_BY_PREFIX): name = line[len(CO_AUTHORED_BY_PREFIX):] author = self.format_git_author(name) self.co_authors.append(author) continue +# Skip over review information for now. +# This avoids errors due to missing tabs on these lines below. +if lowered_line.startswith((REVIEWED_BY_PREFIX,\ +REVIEWED_ON_PREFIX)): +continue + # ChangeLog name will be deduced later if not last_entry: if author_tuple: -- 2.17.1
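The skipping logic in the patch relies on `str.startswith` accepting a tuple of prefixes. A stand-alone sketch of that idea, outside of the GitCommit class, could look like this; the names `SKIPPED_PREFIXES`, `is_skipped_trailer` and `changelog_lines` are hypothetical and do not exist in git_commit.py.

```python
# Lower-case prefixes of trailer lines that carry no ChangeLog content
# (hypothetical list; the patch itself defines two separate constants).
SKIPPED_PREFIXES = ('reviewed-by: ', 'reviewed-on: ')


def is_skipped_trailer(line):
    """True if line is a review trailer, compared case-insensitively."""
    return line.lower().startswith(SKIPPED_PREFIXES)


def changelog_lines(body_lines):
    """Drop review trailers and keep every other line unchanged."""
    return [line for line in body_lines if not is_skipped_trailer(line)]
```

Keeping the prefixes in one tuple makes it cheap to add further trailers later without touching the parsing loop.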
Re: [PATCH] contrib/gcc-changelog: Handle Reviewed-{by,on}
Martin Liška writes: Hi Martin, > On 5/19/20 11:45 AM, Frederik Harwath wrote: > Thank you Frederik for the patch. > > Looking at what I grepped: > https://github.com/marxin/gcc-changelog/issues/1#issuecomment-621910248 I get a 404 error when I try to access this URL. The repository also does not seem to be in your list of public repositories. > Can you also add 'Signed-off-by'? And please create a list with these > exceptions at the beginning of the script. Yes, I will add it. > Fine with that. Best regards, Frederik
Re: testsuite: clarify scan-dump file globbing behavior
Frederik Harwath writes: Hi Rainer, hi Mike, ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html Best regards, Frederik
PING Re: testsuite: clarify scan-dump file globbing behavior
Frederik Harwath writes: ping :-) > Hi Rainer, hi Mike, > ping: https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545803.html
[PATCH] Add OpenACC 2.6 `acc_get_property' support
Hi, this patch implements OpenACC 2.6 "acc_get_property" and related functions. I have tested the patch on x86_64-linux-gnu with nvptx-none offloading. There is no AMD GCN support yet. This will be added later on. Can this be committed to trunk? Best regards, Frederik --- 8< --- Add generic support for the OpenACC 2.6 `acc_get_property' and `acc_get_property_string' routines, as well as full handlers for the host and the NVPTX offload targets and minimal handlers for the HSA and Intel MIC offload targets. Included are C/C++ and Fortran tests that, in particular, print the property values for acc_property_vendor, acc_property_memory, acc_property_free_memory, acc_property_name, and acc_property_driver. The output looks as follows: Vendor: GNU Name: GOMP Total memory: 0 Free memory: 0 Driver: 1.0 with the host driver (where the memory related properties are not supported for the host device and yield 0, conforming to the standard) and output like: OpenACC vendor: Nvidia OpenACC total memory: 12651462656 OpenACC free memory: 12202737664 OpenACC name: TITAN V OpenACC driver: CUDA Driver 9.1 with the NVPTX driver. 2019-11-14 Maciej W. Rozycki Frederik Harwath Thomas Schwinge include/ * gomp-constants.h (GOMP_DEVICE_CURRENT, GOMP_DEVICE_PROPERTY_MEMORY, GOMP_DEVICE_PROPERTY_FREE_MEMORY, GOMP_DEVICE_PROPERTY_NAME, GOMP_DEVICE_PROPERTY_VENDOR, GOMP_DEVICE_PROPERTY_DRIVER, GOMP_DEVICE_PROPERTY_STRING_MASK): New Macros. libgomp/ * libgomp.h (gomp_device_descr): Add `get_property_func' member. * libgomp-plugin.h (gomp_device_property_value): New union. (gomp_device_property_value): New prototype. * openacc.h (acc_device_t): Add `acc_device_current' enumeration constant. (acc_device_property_t): New enum. (acc_get_property, acc_get_property_string): New prototypes. * oacc-init.c (acc_get_device_type): Also assert on `!acc_device_current' result. (get_property_any, acc_get_property, acc_get_property_string): New functions. 
* openacc.f90 (openacc_kinds): From `iso_fortran_env' also import `int64'. Add `acc_device_current' and `acc_property_memory', `acc_property_free_memory', `acc_property_name', `acc_property_vendor' and `acc_property_driver' constants. Add `acc_device_property' data type. (openacc_internal): Add `acc_get_property' and `acc_get_property_string' interfaces. Add `acc_get_property_h', `acc_get_property_string_h', `acc_get_property_l' and `acc_get_property_string_l'. (openacc_c_string): New module. * oacc-host.c (host_get_property): New function. (host_dispatch): Wire it. * target.c (gomp_load_plugin_for_device): Handle `get_property'. * libgomp.map (OACC_2.6): Add `acc_get_property', `acc_get_property_h_', `acc_get_property_string' and `acc_get_property_string_h_' symbols. * oacc-init.c (acc_known_device_type): Add function. (unknown_device_type_error): Add function. (name_of_acc_device_t): Change to call unknown_device_type_error on unknown type. (resolve_device): Use acc_known_device_type. (acc_init): Fail if acc_device_t argument is not valid. (acc_shutdown): Likewise. (acc_get_num_devices): Likewise. (acc_set_device_type): Likewise. (acc_get_device_num): Likewise. (acc_set_device_num): Likewise. (get_property_any): Likewise. (acc_get_property): Likewise. (acc_get_property_string): Likewise. (acc_on_device): Likewise. (goacc_save_and_set_bind): Likewise. * libgomp.texi (OpenACC Runtime Library Routines): Add `acc_get_property'. (acc_get_property): New node. * plugin/plugin-hsa.c (GOMP_OFFLOAD_get_property): New function. * plugin/plugin-nvptx.c (CUDA_CALLS): Add `cuDeviceGetName', `cuDeviceTotalMem', `cuDriverGetVersion' and `cuMemGetInfo' calls. (GOMP_OFFLOAD_get_property): New function. (struct ptx_device): Add new field "name" ... (nvptx_open_device): ... and alloc and init from here. (nvptx_close_device): ... and free from here. (cuda_driver_version): Add new static variable ... (nvptx_init): ... and init from here. 
* testsuite/libgomp.oacc-c-c++-common/acc-get-property.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc-get-property-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/acc-get-property-3.c: New test. * testsuite/libgomp.oacc-c-c++
[PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR
Hi, currently, on trunk, the tests gcc.dg/vect/vect-cond-reduc-1.c and gcc.dg/pr68286.c fail when compiling for amdgcn-unknown-amdhsa. The reason seems to lie in the interaction of the changes that have been introduced by revision r276659 ("Allow COND_EXPR and VEC_COND_EXPR conditions to trap" by Ilya Leoshkevich) of trunk and the vectorized code that is generated for amdgcn. If the function maybe_resimplify_conditional_op from gimple-match-head.c gets called on a conditional operation without an "else" part, it makes the operation unconditional, but only if the operation cannot trap. To check this, it uses operation_could_trap_p. This ends up in a violated assertion in the latter function if maybe_resimplify_conditional_op is called on a COND_EXPR or VEC_COND_EXPR: /* This function cannot tell whether or not COND_EXPR and VEC_COND_EXPR could trap, because that depends on the respective condition op. */ gcc_assert (op != COND_EXPR && op != VEC_COND_EXPR); A related issue has been resolved by the patch that was committed as r276915 ("PR middle-end/92063" by Jakub Jelinek). In our case, the error is triggered by the simplification rule at line 3450 of gcc/match.pd: /* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons return all -1 or all 0 results. */ /* ??? We could instead convert all instances of the vec_cond to negate, but that isn't necessarily a win on its own. */ (simplify (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2))) (if (VECTOR_TYPE_P (type) && known_eq (TYPE_VECTOR_SUBPARTS (type), TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))) && (TYPE_MODE (TREE_TYPE (type)) == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1))))) (minus @3 (view_convert (vec_cond @0:0 (negate @1) @2))))) It seems that this rule is not invoked when compiling for x86_64 where the generated code for vect-cond-reduc-1.c does not contain anything that would match this rule. Could it be that there is no test covering this rule for commonly tested architectures? 
I have changed maybe_resimplify_conditional_op to check if a COND_EXPR or VEC_COND_EXPR could trap by checking whether the condition can trap using generic_expr_could_trap_p. Judging from the comment above the assertion and the code changes of r276659, it seems that this is both necessary and sufficient to verify if those expressions can trap. Does that sound reasonable and can the patch be included in trunk? The patch fixes the failing tests for me and does not cause any visible regressions in the results of "make check" which I have executed for targets amdgcn-unknown-amdhsa and x86_64-pc-linux-gnu. Best regards, Frederik 2019-11-28 Frederik Harwath gcc/ * gimple-match-head.c (maybe_resimplify_conditional_op): use generic_expr_could_trap_p to check if the condition of COND_EXPR or VEC_COND_EXPR can trap. --- gcc/gimple-match-head.c | 14 +++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 2996bade301..4da6c4d7458 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -144,9 +144,17 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op, /* Likewise if the operation would not trap. */ bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type) && TYPE_OVERFLOW_TRAPS (res_op->type)); - if (!operation_could_trap_p ((tree_code) res_op->code, - FLOAT_TYPE_P (res_op->type), - honor_trapv, res_op->op_or_null (1))) + tree_code op_code = (tree_code) res_op->code; + /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition +traps and hence we have to check this. For all other operations, we +don't need to consider the operands. */ + bool op_could_trap = op_code == COND_EXPR || op_code == VEC_COND_EXPR ? + generic_expr_could_trap_p (res_op->ops[0]) : + operation_could_trap_p ((tree_code) res_op->code, + FLOAT_TYPE_P (res_op->type), + honor_trapv, res_op->op_or_null (1)); + + if (!op_could_trap) { res_op->cond.cond = NULL_TREE; return false; -- 2.17.1
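The decision the patch makes can be summarised in a few lines. The following is a toy model in Python, not GCC code: the set `TRAPPING_CODES`, the boolean `condition_could_trap` (standing in for the result of generic_expr_could_trap_p on the condition) and both function names are assumptions for illustration only.

```python
COND_CODES = {"COND_EXPR", "VEC_COND_EXPR"}

# Hypothetical stand-in for the operations the generic check treats as trapping.
TRAPPING_CODES = {"TRUNC_DIV_EXPR"}


def operation_could_trap(op_code):
    """Toy version of the generic check; it cannot decide conditional codes."""
    if op_code in COND_CODES:
        # Models the gcc_assert that triggered the ICE.
        raise ValueError("cannot tell whether a conditional expression traps")
    return op_code in TRAPPING_CODES


def op_could_trap(op_code, condition_could_trap):
    """Dispatch as in the patch: conditional codes depend on their condition."""
    if op_code in COND_CODES:
        return condition_could_trap
    return operation_could_trap(op_code)
```

With this dispatch, a VEC_COND_EXPR whose condition cannot trap is treated as non-trapping, so the conditional operation may be made unconditional, while the generic check is never asked about a conditional code.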
Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR
Hi Richard, On 29.11.19 13:37, Richard Biener wrote: > On Fri, Nov 29, 2019 at 1:24 PM Harwath, Frederik > wrote: > [...] >> It seems that this rule is not invoked when compiling for x86_64 where the >> generated code for vect-cond-reduc-1.c does not contain anything that would >> match this rule. Could it be that there is no test covering this rule for >> commonly tested architectures? > > This was all added for aarch64 SVE. So it looks like the outer plus > was conditional and we end up inheriting the I should have mentioned this, it was indeed a COND_ADD. > condition for the inner vec_cond. Your fix looks reasonable but is > very badly formatted. Can you instead do > > if (op_code == COND_EXPR || op_code == VEC_COND_EXPR) > op_could_trap = generic_expr_could_trap_p (..) > else > op_could_trap = operation_could_trap_p (... > Sorry, sure! Thanks, Frederik
[PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)
Hi, On 29.11.19 13:51, Harwath, Frederik wrote: >> condition for the inner vec_cond. Your fix looks reasonable but is >> very badly formatted. Can you instead do I hope the formatting looks better now. I have also removed the [amdgcn] from the subject line since the fact that this has been discovered in the context of amdgcn is not really essential. Best regards, Frederik 2019-11-29 Frederik Harwath gcc/ * gimple-match-head.c (maybe_resimplify_conditional_op): use generic_expr_could_trap_p to check if the condition of COND_EXPR or VEC_COND_EXPR can trap. --- gcc/gimple-match-head.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c index 2996bade301..c763a80a6d1 100644 --- a/gcc/gimple-match-head.c +++ b/gcc/gimple-match-head.c @@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op, /* Likewise if the operation would not trap. */ bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type) && TYPE_OVERFLOW_TRAPS (res_op->type)); - if (!operation_could_trap_p ((tree_code) res_op->code, - FLOAT_TYPE_P (res_op->type), - honor_trapv, res_op->op_or_null (1))) + tree_code op_code = (tree_code) res_op->code; + bool op_could_trap; + + /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition + traps and hence we have to check this. For all other operations, we + don't need to consider the operands. */ + if (op_code == COND_EXPR || op_code == VEC_COND_EXPR) + op_could_trap = generic_expr_could_trap_p (res_op->ops[0]); + else + op_could_trap = operation_could_trap_p ((tree_code) res_op->code, + FLOAT_TYPE_P (res_op->type), + honor_trapv, + res_op->op_or_null (1)); + + if (!op_could_trap) { res_op->cond.cond = NULL_TREE; return false; -- 2.17.1
Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)
Hi Jakub,

On 29.11.19 14:41, Jakub Jelinek wrote:
> s/use/Use/
>
> [...]
>
> s/. /.  /

Right, thanks. Does that look ok for inclusion in trunk now?

Best regards,
Frederik

2019-11-29  Frederik Harwath

    gcc/
    * gimple-match-head.c (maybe_resimplify_conditional_op): Use
    generic_expr_could_trap_p to check if the condition of COND_EXPR or
    VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..9010f11621e 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
                       && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-                               FLOAT_TYPE_P (res_op->type),
-                               honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+     traps and hence we have to check this.  For all other operations, we
+     don't need to consider the operands.  */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+    op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+    op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+                                            FLOAT_TYPE_P (res_op->type),
+                                            honor_trapv,
+                                            res_op->op_or_null (1));
+
+  if (!op_could_trap)
     {
       res_op->cond.cond = NULL_TREE;
       return false;
--
2.17.1
Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR
On 29.11.19 15:46, Richard Sandiford wrote:
> Thanks for doing this, looks good to me FWIW. I was seeing the same
> failure for SVE but hadn't found time to look at it.

Thank you all for the review. Committed as r278853.

Frederik
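For readers who do not have the GCC sources at hand, the dispatch discussed in this thread can be modelled in a few lines of plain C. This is only a toy sketch: the `toy_*` names, the enum and the "only division traps" rule are made up for illustration and are not GCC's tree codes or its real trap predicates.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a few operation codes; not GCC's tree codes.  */
enum toy_code { TOY_PLUS, TOY_TRUNC_DIV, TOY_COND, TOY_VEC_COND };

/* Toy operation: its code, plus a flag saying whether evaluating the first
   operand (the condition, for the two conditional codes) could trap.  */
struct toy_op
{
  enum toy_code code;
  bool first_operand_could_trap;
};

/* Assumed code-based query: in this toy model only division traps.  */
static bool
toy_operation_could_trap_p (enum toy_code code)
{
  return code == TOY_TRUNC_DIV;
}

/* Conditional codes are decided by inspecting the condition operand;
   every other code is decided from the operation code alone.  */
static bool
toy_op_could_trap (const struct toy_op *op)
{
  if (op->code == TOY_COND || op->code == TOY_VEC_COND)
    return op->first_operand_could_trap;
  return toy_operation_could_trap_p (op->code);
}
```

The point of the split is that the two conditional codes never reach the code-based query, which mirrors the shape of the patch without claiming anything about the internals of the real predicates.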
[Patch] Rework OpenACC nested reduction clause consistency checking (was: Re: [PATCH][committed] Warn about inconsistent OpenACC nested reduction clauses)
Hi Jakub,

On 08.11.19 07:41, Harwath, Frederik wrote:
> On 06.11.19 14:00, Jakub Jelinek wrote:
> [...]
>> I'm not sure it is a good idea to use a TREE_LIST in this case, vec would be
>> more natural, wouldn't it.
>
> Yes.
>
> [...]
>> If gimplifier is not the right spot, then use a splay tree + vector instead?
>> splay tree for the outer ones, vector for the local ones, and put into both
>> the clauses, so you can compare reduction code etc.
>
> Sounds like a good idea. I am going to try that.

Below you can find a patch that reimplements the nested reductions check using
more appropriate data structures. As an additional benefit, the quality of the
warnings has also improved (see description below). I have checked the patch
by running the testsuite on x86_64-pc-linux-gnu.

Best regards,
Frederik

From 94ca786172afa7dab7630d75965bf6d6f0dd24e1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath
Date: Tue, 3 Dec 2019 10:38:01 +0100
Subject: [PATCH] Rework OpenACC nested reduction clause consistency checking

Revision 277875 of trunk introduced a consistency check for nested OpenACC
reduction clauses. The implementation has two drawbacks:
1) It uses suboptimal data structures for storing information about
   the reduction clauses.
2) The warnings issued for *repeated* inconsistent use of reduction operators
   are confusing. For instance, on three nested loops that use the reduction
   operators +, -, + on the same variable, we obtain a warning at the switch
   from + to - (as desired) and another warning about the switch from - to +.
   It would be preferable to avoid the second warning since + is consistent
   with the first reduction operator.

This commit attempts to fix both problems by using more appropriate data
structures (splay trees and vectors instead of tree lists) for keeping track
of the information about the reduction clauses.
2019-12-3 Frederik Harwath gcc/ * omp-low.c (omp_context): Removed fields local_reduction_clauses, outer_reduction_clauses; added fields oacc_reduction_clauses, oacc_reductions_stack. (oacc_reduction_clause_location): New struct. (oacc_reduction_var_occ): New struct. (new_omp_context): Adjust omp_context initialization to new fields. (delete_omp_context): Adjust omp_context deletion to new fields. (rewind_oacc_reductions_stack): New function. (check_oacc_reduction_clause): New function. (check_oacc_reduction_clauses): New function. (scan_sharing_clauses): Call check_oacc_reduction_clause for reduction clauses (this handles clauses on compute regions) if a new optional flag is enabled. (scan_omp_for): Remove old nested reduction check, call check_oacc_reduction_clauses instead. (scan_omp_target): Adapt call to scan_sharing_clauses to enable the new flag. gcc/testsuite/ * c-c++-common/goacc/nested-reductions-warn.c: Add dg-prune-output to ignore warnings that are not relevant to the test. (acc_parallel): Stop expecting pruned warnings, adjust expected warnings to changes in omp-low.c, add checks for info messages about the location of clauses. (acc_parallel_loop): Likewise. (acc_parallel_reduction): Likewise. (acc_parallel_loop_reduction): Likewise. (acc_routine): Likewise. (acc_kernels): Likewise. * gfortran.dg/goacc/nested-reductions-warn.f90: Likewise. --- gcc/omp-low.c | 305 -- .../goacc/nested-reductions-warn.c| 81 ++--- .../goacc/nested-reductions-warn.f90 | 83 ++--- 3 files changed, 271 insertions(+), 198 deletions(-) diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 19132f76da2..ba04e7477dc 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -73,6 +73,9 @@ along with GCC; see the file COPYING3. If not see scanned for regions which are then moved to a new function, to be invoked by the thread library, or offloaded. */ + +struct oacc_reduction_var_occ; + /* Context structure. Used to store information about each parallel directive in the code. 
*/ @@ -128,12 +131,6 @@ struct omp_context corresponding tracking loop iteration variables. */ hash_map *lastprivate_conditional_map; - /* A tree_list of the reduction clauses in this context. */ - tree local_reduction_clauses; - - /* A tree_list of the reduction clauses in outer contexts. */ - tree outer_reduction_clauses; - /* Nesting depth of this context. Used to beautify error messages re invalid gotos. The outermost ctx is depth 1, with depth 0 being reserved for the main body of the function. */ @@ -163,8 +160,52 @@ struct omp_context /* True if there is bind clause on the construct (i.e. a loop construct). */ bool loop_p; + + /* A mapping that maps a variable to information about the last OpenACC + reduction clause that used the variable above the current context. + This information is used for checking the nesting restrictions for + reduction clauses by the func
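
As an illustration of the "+, -, +" situation described in the commit message of the rework, a loop nest of the following shape uses three reduction clauses on the same variable. The function name `nested_sum` is made up for this sketch, and the exact diagnostics a given compiler version prints are deliberately not reproduced here.

```c
/* Hypothetical example: three nested OpenACC loops with reduction clauses
   on the same variable, using the operators +, -, + from outside in.
   Without -fopenacc the pragmas are ignored and the loops run serially.  */
int
nested_sum (int n)
{
  int sum = 0;

  #pragma acc parallel loop reduction(+:sum)
  for (int i = 0; i < n; i++)
    {
      /* Switch from + to -: inconsistent with the enclosing clause.  */
      #pragma acc loop reduction(-:sum)
      for (int j = 0; j < n; j++)
        {
          /* Back to +: consistent with the outermost clause again.  */
          #pragma acc loop reduction(+:sum)
          for (int k = 0; k < n; k++)
            sum += 1;
        }
    }

  return sum;
}
```

According to the description above, the switch on the middle loop is the one that should be reported, while a second report on the innermost loop is what the rework tries to avoid.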
Re: [PATCH 2/4] Validate acc_device_t uses
Hi Thomas,

On 03.12.19 13:14, Thomas Schwinge wrote:
> You once had this patch separate, but then merged into the upstream
> submission of 'acc_get_property'; let's please keep this separate.
>
> With changes as indicated below, please commit this to trunk [...]

Ok, I have committed the patch as revision 278937. You can find the committed
version in the attachment. Thank you for the review!

> Generally, does usage of these functions obsolete some existing usage of
> 'acc_dev_num_out_of_range'? (OK to address later.)

I think it does. I am going to verify this.

>> @@ -168,7 +184,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
>>        break;
>>
>>      default:
>> -      if (d > _ACC_device_hwm)
>> +      if (!acc_known_device_type (d))
>>          {
>>            if (fail_is_error)
>>              goto unsupported_device;
>
> Note that this had 'd > _ACC_device_hwm', not '>=' as it now does, that
> is, previously didn't reject 'd == _ACC_device_hwm' but now does -- but I
> suppose this was an (minor) bug that existed before, so OK to change as
> you did?

Right, I do not see any reason why it should accept _ACC_device_hwm, and the
change did not cause any regressions.

Best regards,
Frederik

r278937 | frederik | 2019-12-03 15:38:54 +0100 (Tue, 03 Dec 2019) | 25 lines

Validate acc_device_t uses

Check that function arguments of type acc_device_t
are valid enumeration values in all publicly visible
functions from oacc-init.c.

2019-12-03  Frederik Harwath

    libgomp/
    * oacc-init.c (acc_known_device_type): Add function.
    (unknown_device_type_error): Add function.
    (name_of_acc_device_t): Change to call unknown_device_type_error
    on unknown type.
    (resolve_device): Use acc_known_device_type.
    (acc_init): Fail if acc_device_t argument is not valid.
    (acc_shutdown): Likewise.
    (acc_get_num_devices): Likewise.
    (acc_set_device_type): Likewise.
    (acc_get_device_num): Likewise.
    (acc_set_device_num): Likewise.
    (acc_on_device): Add comment that argument validity is not checked.
Reviewed-by: Thomas Schwinge Index: libgomp/oacc-init.c === --- libgomp/oacc-init.c (revision 278936) +++ libgomp/oacc-init.c (working copy) @@ -82,6 +82,18 @@ gomp_mutex_unlock (&acc_device_lock); } +static bool +known_device_type_p (acc_device_t d) +{ + return d >= 0 && d < _ACC_device_hwm; +} + +static void +unknown_device_type_error (acc_device_t invalid_type) +{ + gomp_fatal ("unknown device type %u", invalid_type); +} + /* OpenACC names some things a little differently. */ static const char * @@ -103,8 +115,9 @@ case acc_device_host: return "host"; case acc_device_not_host: return "not_host"; case acc_device_nvidia: return "nvidia"; -default: gomp_fatal ("unknown device type %u", (unsigned) type); +default: unknown_device_type_error (type); } + __builtin_unreachable (); } /* ACC_DEVICE_LOCK must be held before calling this function. If FAIL_IS_ERROR @@ -123,7 +136,7 @@ if (goacc_device_type) { /* Lookup the named device. */ - while (++d != _ACC_device_hwm) + while (known_device_type_p (++d)) if (dispatchers[d] && !strcasecmp (goacc_device_type, get_openacc_name (dispatchers[d]->name)) @@ -147,7 +160,7 @@ case acc_device_not_host: /* Find the first available device after acc_device_not_host. 
*/ - while (++d != _ACC_device_hwm) + while (known_device_type_p (++d)) if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0) goto found; if (d_arg == acc_device_default) @@ -168,7 +181,7 @@ break; default: - if (d > _ACC_device_hwm) + if (!known_device_type_p (d)) { if (fail_is_error) goto unsupported_device; @@ -505,6 +518,9 @@ void acc_init (acc_device_t d) { + if (!known_device_type_p (d)) +unknown_device_type_error (d); + gomp_init_targets_once (); gomp_mutex_lock (&acc_device_lock); @@ -519,6 +535,9 @@ void acc_shutdown (acc_device_t d) { + if (!known_device_type_p (d)) +unknown_device_type_error (d); + gomp_init_targets_once (); gomp_mutex_lock (&acc_device_lock); @@ -533,6 +552,9 @@ int acc_get_num_devices (acc_device_t d) { + if (!known_device_type_p (d)) +unknown_device_type_error (d); + int n = 0; struct gomp_device_descr *acc_dev; @@ -564,6 +586,9 @@ void acc_set_device_type (acc_device_t d) { + if (!known_device_type_p (d)) +unknown_device_ty
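
The boundary case raised in the review (whether the high-water mark itself is rejected) can be checked with a small stand-alone model. The `toy_*` enumeration below is hypothetical and only mimics the shape of `acc_device_t`; the real values are defined in the OpenACC headers.

```c
#include <stdbool.h>

/* Hypothetical device enumeration; the last enumerator is a high-water
   mark and not a valid device type, as with _ACC_device_hwm.  */
typedef enum toy_device_t
{
  toy_device_none,
  toy_device_default,
  toy_device_host,
  toy_device_not_host,
  toy_device_hwm
} toy_device_t;

/* Range check in the style of the committed known_device_type_p:
   valid values are 0 <= d < high-water mark.  */
static bool
toy_known_device_type_p (toy_device_t d)
{
  int v = (int) d;
  return v >= 0 && v < (int) toy_device_hwm;
}

/* The comparison used before the patch: it rejects only values strictly
   above the high-water mark, so the mark itself slips through.  */
static bool
toy_old_check_rejects (toy_device_t d)
{
  return d > toy_device_hwm;
}
```

In this model the old comparison lets `toy_device_hwm` through while the range check rejects it, which is the behavioural difference discussed in the review.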
[PATCH 00/40] OpenACC "kernels" Improvements
Hi,

this patch series implements the re-work of the OpenACC "kernels"
implementation that has been announced at the GNU Tools Track of this year's
Linux Plumbers Conference; see
https://linuxplumbersconf.org/event/11/contributions/998/.
Versions of the patches have also been committed to the devel/omp/gcc-11
branch recently.

The patch series contains middle-end changes that modify the "kernels" loop
handling to use Graphite for dependence analysis of loops in "kernels"
regions, as well as new optimizations and adjustments to existing
optimizations to support this analysis. A central step is contained in the
commit titled "openacc: Use Graphite for dependence analysis in \"kernels\"
regions" whose commit message also contains further explanations.

There are also front end changes (cf. the patches by Sandra Loosemore) that
prepare the loops in "kernels" regions for the middle-end processing and which
lift various restrictions on "kernels" regions. I have included some
dependences (the patches by Julian Brown) from the devel/omp/gcc-11 branch
which will be re-submitted independently for review.

I have bootstrapped the compiler on x86_64-linux-gnu and performed
comprehensive testing on a powerpc64le-linux-gnu target. The patches should
apply cleanly on commit r12-4865 of the master branch.

I am aware that we cannot incorporate those patches into GCC at the current
development stage. I hope that we can discuss some of the changes before they
can be considered for inclusion in GCC during the next stage 1.
Best regards,
Frederik

Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (20):
  Fortran: Delinearize array accesses
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Enable reduction variable localization for "kernels"
  openacc: Check type for references in reduction lowering
  openacc: Adjust testsuite to new "kernels" handling

Julian Brown (4):
  Reference reduction localization
  Fix tree check failure with reduction localization
  Use more appropriate var in localize_reductions call
  Handle references in OpenACC "private" clauses

Sandra Loosemore (12):
  Kernels loops annotation: C and C++.
  Add -fno-openacc-kernels-annotate-loops option to more testcases.
  Kernels loops annotation: Fortran.
  Additional Fortran testsuite fixes for kernels loops annotation pass.
  Fix bug in processing of array dimensions in data clauses.
  Add a "combined" flag for "acc kernels loop" etc directives.
  Annotate inner loops in "acc kernels loop" directives (C/C++).
  Annotate inner loops in "acc kernels loop" directives (Fortran).
  Permit calls to builtins and intrinsics in kernels loops.
  Fix patterns in Fortran tests for kernels loop annotation.
  Clean up loop variable extraction in OpenACC kernels loop annotation.
  Relax some restrictions on the loop bound in kernels loop annotation.

Tobias Burnus (2):
  Fix for is_gimple_reg vars to 'data kernels'
  openacc: fix privatization of by-reference arrays

 gcc/Makefile.in | 2 +
 gcc/c-family/c-common.h | 1 +
 gcc/c-family/c-omp.c | 915 +++--
 gcc/c-family/c.opt | 8 +
 gcc/c/c-decl.c | 28 +
 gcc/c/c-parser.c | 3 +
 gcc/cfgloop.c | 1 +
 gcc/cfgloop.h | 6 +
 gcc/cfgloopmanip.c | 1 +
 gcc/common.opt | 9 +
 gcc/config/nvptx/nvptx.c | 7 +
 gcc/cp/decl.c | 44 +
 gcc/cp/parser.c | 3 +
 gcc/cp/semantics.c | 9 +
 gcc/doc/gimple.texi | 2 +
 gcc/doc/invoke.texi | 52 +-
 gcc/doc/passes.texi
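
As a rough illustration of the question that dependence analysis of loops in "kernels" regions has to answer, compare the two loops below. This sketch is not taken from the patch series; the function names and data clauses are made up, and nothing is claimed here about how any particular compiler version actually treats these loops.

```c
/* Iterations are independent: each one writes only its own element of C
   and reads only A and B, so they may run in any order or in parallel.  */
void
add_arrays (int n, const int *a, const int *b, int *c)
{
  #pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}

/* Loop-carried dependence: iteration I reads the value that iteration
   I - 1 has just written, so the iterations cannot simply be reordered.  */
void
prefix_sum (int n, int *a)
{
  #pragma acc kernels copy(a[0:n])
  for (int i = 1; i < n; i++)
    a[i] += a[i - 1];
}
```

Compiled without OpenACC support, the pragmas are ignored and both functions run serially, which makes it easy to compare results against an offloaded build.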
[PATCH 02/40] Add -fno-openacc-kernels-annotate-loops option to more testcases.
From: Sandra Loosemore

2020-03-27  Sandra Loosemore

    gcc/testsuite/
    * c-c++-common/goacc/kernels-decompose-2.c: Add
    -fno-openacc-kernels-annotate-loops.
---
 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafae..0f2d2f0a757b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,5 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */

+/* { dg-additional-options "-fno-openacc-kernels-annotate-loops" } */
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* { dg-additional-options "-O2" } for 'parloops'.  */
--
2.33.0

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955
[PATCH 01/40] Kernels loops annotation: C and C++.
From: Sandra Loosemore This patch detects loops in kernels regions that are candidates for parallelization, and adds "#pragma acc loop auto" annotations to them. This annotation is controlled by the -fopenacc-kernels-annotate-loops option, which is enabled by default. -Wopenacc-kernels-annotate-loops can be used to produce diagnostics about loops that cannot be annotated. gcc/c-family/ * c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare. * c-omp.c: Include tree-iterator.h (enum annotation_state): New. (struct annotation_info): New. (do_not_annotate_loop): New. (do_not_annotate_loop_nest): New. (annotation_error): New. (c_finish_omp_for_internal): Split from c_finish_omp_for. Use annotation_error function. Code refactoring to avoid destructive changes that cannot be undone in case of error. (is_local_var): New. (lang_specific_unwrap_initializer): New. (annotate_for_loop): New. (check_and_annotate_for_loop): New. (annotate_loops_in_kernels_regions): New. (c_oacc_annotate_loops_in_kernels_regions): New. * c.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. gcc/c/ * c-decl.c (c_unwrap_for_init): New. (finish_function): Call c_oacc_annotate_loops_in_kernels_regions. gcc/cp/ * decl.c (cp_unwrap_for_init): New. (finish_function): Call c_oacc_annotate_loops_in_kernels_regions. gcc/ * doc/invoke.texi (Option Summary): Add entries for -Wopenacc-kernels-annotate-loops and -fno-openacc-kernels-annotate-loops. (Warning Options): Document -Wopenacc-kernels-annotate-loops. (Optimization Options): Document -fno-openacc-kernels-annotate-loops. gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Add -fno-openacc-kernels-annotate-loops option. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise. * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise. * c-c++-common/goacc/kernels-double-reduction.c: Likewise. 
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise. * c-c++-common/goacc/kernels-loop-2.c: Likewise. * c-c++-common/goacc/kernels-loop-3.c: Likewise. * c-c++-common/goacc/kernels-loop-data-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise. * c-c++-common/goacc/kernels-loop-data-update.c: Likewise. * c-c++-common/goacc/kernels-loop-data.c: Likewise. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise. * c-c++-common/goacc/kernels-loop-n.c: Likewise. * c-c++-common/goacc/kernels-loop-nest.c: Likewise. * c-c++-common/goacc/kernels-loop.c: Likewise. * c-c++-common/goacc/kernels-one-counter-var.c: Likewise. * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise. * c-c++-common/goacc/kernels-reduction.c: Likewise. * c-c++-common/goacc/kernels-loop-annotation-1.c: New. * c-c++-common/goacc/kernels-loop-annotation-2.c: New. * c-c++-common/goacc/kernels-loop-annotation-3.c: New. * c-c++-common/goacc/kernels-loop-annotation-4.c: New. * c-c++-common/goacc/kernels-loop-annotation-5.c: New. * c-c++-common/goacc/kernels-loop-annotation-6.c: New. * c-c++-common/goacc/kernels-loop-annotation-7.c: New. * c-c++-common/goacc/kernels-loop-annotation-8.c: New. * c-c++-common/goacc/kernels-loop-annotation-9.c: New. * c-c++-common/goacc/kernels-loop-annotation-10.c: New. * c-c++-common/goacc/kernels-loop-annotation-11.c: New. * c-c++-common/goacc/kernels-loop-annotation-12.c: New. * c-c++-common/goacc/kernels-loop-annotation-13.c: New. * c-c++-common/goacc/kernels-loop-annotation-14.c: New. * c-c++-common/goacc/kernels-loop-annotation-15.c: New. * c-c++-common/goacc/kernels-loop-annotation-16.c: New. * c-c++-common/goacc/kernels-loop-annotation-17.c: New. 
--- gcc/c-family/c-common.h | 1 + gcc/c-family/c-omp.c | 799 -- gcc/c-family/c.opt| 8 + gcc/c/c-decl.c| 28 + gcc/cp/decl.c | 44 + gcc/doc/invoke.texi | 32 +- .../goacc/classify-kernels-unparallelized.c | 1 + .../c-c++-common/goacc/classify-kernels.c | 3 +- .../kernels-counter-var-redundant-load.c | 1 + .../kernels-counter-vars-function-scope.c | 1 + .../goacc/kernels-double-reduction-n.c| 1 + .../goacc/kernels-doub
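
To make the effect of the annotation described in this patch more concrete, the sketch below shows a loop in a kernels region as a user might write it, next to a hand-annotated version carrying the `#pragma acc loop auto` that the patch description says is added to candidate loops. Both function names are made up, and the second function is written by hand for illustration; it is not output of the compiler.

```c
/* A loop in a kernels region without any loop directive, as written by
   the user.  */
void
scale_plain (int n, float *a, float s)
{
  #pragma acc kernels copy(a[0:n])
  {
    for (int i = 0; i < n; i++)
      a[i] *= s;
  }
}

/* The same loop with an explicit "loop auto" directive, i.e. the form
   that the annotation is meant to be equivalent to (assumption: the loop
   qualifies as a candidate for parallelization).  */
void
scale_annotated (int n, float *a, float s)
{
  #pragma acc kernels copy(a[0:n])
  {
    #pragma acc loop auto
    for (int i = 0; i < n; i++)
      a[i] *= s;
  }
}
```

Both variants compute the same result; the difference is only in what the later passes get told about the loop.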
[PATCH 03/40] Kernels loops annotation: Fortran.
From: Sandra Loosemore This patch implements the Fortran support for adding "#pragma acc loop auto" annotations to loops in OpenACC kernels regions. It implements the same -fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options that were previously added (and documented) for the C/C++ front ends. Co-Authored-By: Gergö Barany gcc/fortran/ * gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare. * lang.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. * openmp.c: Include options.h. (enum annotation_state, enum annotation_result): New. (check_code_for_invalid_calls): New. (check_expr_for_invalid_calls): New. (check_for_invalid_calls): New. (annotate_do_loop): New. (annotate_do_loops_in_kernels): New. (compute_goto_targets): New. (gfc_oacc_annotate_loops_in_kernels_regions): New. * parse.c (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops. gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add -fno-openacc-kernels-annotate-loops option. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data.f95: Likewise. * gfortran.dg/goacc/kernels-loop-n.f95: Likewise. * gfortran.dg/goacc/kernels-loop.f95: Likewise. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-1.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-2.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-3.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-4.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-5.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-6.f95: New. 
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-8.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-9.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-10.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-11.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-12.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-13.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-14.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-15.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-16.f95: New. --- gcc/fortran/gfortran.h| 1 + gcc/fortran/lang.opt | 8 + gcc/fortran/openmp.c | 364 ++ gcc/fortran/parse.c | 9 + .../goacc/classify-kernels-unparallelized.f95 | 1 + .../gfortran.dg/goacc/classify-kernels.f95| 1 + .../gfortran.dg/goacc/common-block-3.f90 | 1 + .../gfortran.dg/goacc/kernels-loop-2.f95 | 1 + .../goacc/kernels-loop-annotation-1.f95 | 33 ++ .../goacc/kernels-loop-annotation-10.f95 | 32 ++ .../goacc/kernels-loop-annotation-11.f95 | 34 ++ .../goacc/kernels-loop-annotation-12.f95 | 39 ++ .../goacc/kernels-loop-annotation-13.f95 | 38 ++ .../goacc/kernels-loop-annotation-14.f95 | 35 ++ .../goacc/kernels-loop-annotation-15.f95 | 35 ++ .../goacc/kernels-loop-annotation-16.f95 | 34 ++ .../goacc/kernels-loop-annotation-2.f95 | 32 ++ .../goacc/kernels-loop-annotation-3.f95 | 33 ++ .../goacc/kernels-loop-annotation-4.f95 | 34 ++ .../goacc/kernels-loop-annotation-5.f95 | 35 ++ .../goacc/kernels-loop-annotation-6.f95 | 34 ++ .../goacc/kernels-loop-annotation-7.f95 | 48 +++ .../goacc/kernels-loop-annotation-8.f95 | 50 +++ .../goacc/kernels-loop-annotation-9.f95 | 34 ++ .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit-2.f95 | 1 + .../goacc/kernels-loop-data-enter-exit.f95| 1 + .../goacc/kernels-loop-data-update.f95| 1 + .../gfortran.dg/goacc/kernels-loop-data.f95 | 1 + .../gfortran.dg/goacc/kernels-loop-n.f95 | 1 + .../gfortran.dg/goacc/kernels-loop.f95| 1 + 
.../kernels-parallel-loop-data-enter-exit.f95 | 1 + 32 files changed, 974 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-10.f95 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-annotation-1