https://gcc.gnu.org/g:027cbe929314ee40bcca49fce3265f3ecbf72ed5
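For orientation, here is a minimal sketch of the kind of combined construct that the '-fopenmp-target=acc' mode introduced by this commit is described as covering (target, teams, distribute, parallel, for). The function name `vec_add` and the `map` clauses are illustrative assumptions and are not taken from the patch or its testsuite. Whether this exact clause set is accepted in OMPACC mode is not confirmed here; per the commit message, unsupported cases produce a warning and fall back to normal OpenMP processing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not part of the patch: element-wise vector add
   written as a combined OpenMP target construct.  Without -fopenmp the
   pragma is ignored and the loop simply runs on the host.  */
void
vec_add (const float *a, const float *b, float *c, int n)
{
  assert (n >= 0);

  /* Assumption: the map clauses below are among those the OMPACC path
     accepts; if not, the compiler is expected to warn and use the
     regular OpenMP offload path for this region.  */
#pragma omp target teams distribute parallel for \
  map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

On a compiler that contains this patch, such a file would presumably be built with `-fopenmp -fopenmp-target=acc`; the opts.cc change in the commit rejects the option unless `-fopenmp` is given and `-fopenacc` is not.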
commit 027cbe929314ee40bcca49fce3265f3ecbf72ed5
Author: Chung-Lin Tang <clt...@codesourcery.com>
Date:   Fri May 19 12:14:04 2023 -0700

    Use OpenACC code to process OpenMP target regions (forward ported from devel/omp/gcc-12)

    This is a backport of:
    https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html

    This patch implements '-fopenmp-target=acc', which enables internally handling
    a subset of OpenMP target regions as OpenACC parallel regions. This basically
    includes target, teams, parallel, distribute, for/do constructs, and atomics.

    Essentially, we adjust the internal kinds to OpenACC type, and let OpenACC code
    paths handle them, with various needed adjustments throughout middle-end and
    nvptx backend. When using this "OMPACC" mode, if there are cases the patch
    doesn't handle, it issues a warning, and reverts to normal processing for that
    target region.

    gcc/ChangeLog:

            * builtins.cc (expand_builtin_omp_builtins): New function.
            (expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
            BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
            BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
            expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
            * cgraphunit.cc (analyze_functions): Add call to
            omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
            * common.opt (fopenmp-target=): Add new option and enums.
            * config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
            * config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
            prototype.
            (nvptx_mem_shared_p): Likewise.
            * config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
            symbol for number of threads in team.
            (omp_num_threads_align): New var for alignment of
            omp_num_threads_sym.
            (need_omp_num_threads): New bool for if any function references
            omp_num_threads_sym.
            (nvptx_option_override): Initialize omp_num_threads_sym/align.
            (write_as_kernel): Disable normal OpenMP kernel entry under OMPACC
            mode.
            (nvptx_declare_function_name): Disable shim function under OMPACC
            mode. Disable soft-stack under OMPACC mode. Add generation of
            neutering init code under OMPACC mode.
            (nvptx_output_set_softstack): Return "" under OMPACC mode.
            (nvptx_expand_call): Set parallelism to vector for function calls
            with "ompacc for" attached.
            (nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC
            mode.
            (nvptx_expand_oacc_join): Likewise.
            (nvptx_expand_omp_get_num_threads): New function.
            (nvptx_mem_shared_p): New function.
            (nvptx_mach_max_workers): Return 1 under OMPACC mode.
            (nvptx_mach_vector_length): Return 32 under OMPACC mode.
            (nvptx_single): Add adjustments for OMPACC mode, which have
            parallel-construct fork/joins, and regions of code where neutering
            is dynamically determined.
            (nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
            attribute is attached to function. Disable uniform-simt when under
            OMPACC mode.
            (nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
            (nvptx_goacc_fork_join): Return true under OMPACC mode.
            * config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
            omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
            * config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
            UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
            UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
            (nvptx_shared_mem_operand): New predicate.
            (gomp_barrier): New expand pattern.
            (omp_get_num_threads): New expand pattern.
            (omp_get_num_teams): New insn pattern.
            (omp_get_thread_num): Likewise.
            (omp_get_team_num): Likewise.
            (get_ntid): Likewise.
            (nvptx_omp_parallel_fork): Likewise.
            (nvptx_omp_parallel_join): Likewise.
            * flag-types.h (omp_target_mode_kind): New flag value enum.
            * gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
            (gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
            (gimplify_adjust_omp_clauses): Likewise.
            (gimplify_omp_ctx_ompacc_p): New function.
            (gimplify_omp_for): Handle combined loops under OMPACC.
            * lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
            * omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
            (BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
            * omp-expand.cc (remove_exit_barrier): Disable addressable-var
            processing for parallel construct child functions under OMPACC mode.
            (expand_oacc_for): Add OMPACC mode handling.
            (get_target_arguments): Force thread_limit clause value to 1 under
            OMPACC mode.
            (expand_omp): Under OMPACC mode, avoid child function expanding of
            GIMPLE_OMP_PARALLEL.
            * omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
            * omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
            (scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
            (ompacc_ctx_p): New function.
            (scan_omp_parallel): Handle OMPACC mode, avoid creating child
            function.
            (scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
            construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
            (lower_oacc_head_mark): Handle OMPACC mode cases.
            (lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds,
            add vector/gang clauses as needed. Add other OMPACC handling.
            (lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC
            case.
            (lower_omp_target): Do OpenACC gang privatization under OMPACC case.
            (lower_omp_teams): Forward OpenACC privatization variables to outer
            target region under OMPACC mode.
            (lower_omp_1): Do OpenACC gang privatization under OMPACC case for
            GIMPLE_BIND.
            * omp-offload.cc (ompacc_supported_clauses_p): New function.
            (struct target_region_data): New struct type for tree walk.
            (scan_fndecl_for_ompacc): New function.
            (scan_omp_target_region_r): New function.
            (scan_omp_target_construct_r): New function.
            (omp_ompacc_attribute_tagging): New function.
            (oacc_dim_call): Add OMPACC case handling.
            (execute_oacc_device_lower): Make parts explicitly only OpenACC
            enabled.
            (pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
            * omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
            * opts.cc (finish_options): Only allow -fopenmp-target= when
            -fopenmp and no -fopenacc.
            * target-insns.def (gomp_barrier): New defined insn pattern.
            (omp_get_thread_num): Likewise.
            (omp_get_num_threads): Likewise.
            (omp_get_team_num): Likewise.
            (omp_get_num_teams): Likewise.
            * tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_
            entry for internal clause.
            * tree-nested.cc (convert_nonlocal_omp_clauses): Handle
            OMP_CLAUSE__OMPACC_.
            * tree-pretty-print.cc (dump_omp_clause): Handle
            OMP_CLAUSE__OMPACC_.
            * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
            (omp_clause_code_name): Likewise.
            * tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for
            OMP_CLAUSE__OMPACC_.
            * tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
            mode cases.

    libgomp/ChangeLog:

            * config/nvptx/team.c (__nvptx_omp_num_threads): New global
            variable in shared memory.

    (cherry picked from commit 5f881613fa9128edae5bbfa4e19f9752809e4bd7)

Diff:
---
 gcc/builtins.cc                                 |  71 ++++++
 gcc/cgraphunit.cc                               |   7 +-
 gcc/common.opt                                  |  13 +
 gcc/config/nvptx/mkoffload.cc                   |  13 +
 gcc/config/nvptx/nvptx-protos.h                 |   2 +
 gcc/config/nvptx/nvptx.cc                       | 269 +++++++++++++++++++--
 gcc/config/nvptx/nvptx.h                        |   3 +
 gcc/config/nvptx/nvptx.md                       |  68 ++++++
 gcc/expr.cc                                     |   3 +-
 gcc/flag-types.h                                |   6 +
 gcc/gimplify.cc                                 |  33 +++
 gcc/lto-wrapper.cc                              |   1 +
 gcc/omp-builtins.def                            |   4 +-
 gcc/omp-expand.cc                               |  67 +++++-
 gcc/omp-general.cc                              |  11 +-
 gcc/omp-low.cc                                  | 145 +++++++++++-
 gcc/omp-offload.cc                              | 302 +++++++++++++++++++++++-
 gcc/omp-offload.h                               |   1 +
 gcc/opts.cc                                     |   8 +
 gcc/target-insns.def                            |   5 +
 gcc/tree-core.h                                 |   4 +
 gcc/tree-nested.cc                              |   2 +
 gcc/tree-pretty-print.cc                        |   6 +
 gcc/tree.cc                                     |   2 +
 gcc/tree.h                                      |   3 +
 libgomp/config/nvptx/team.c                     |   3 +
 libgomp/testsuite/libgomp.c-c++-common/for-17.c |  69 ++++++
 libgomp/testsuite/libgomp.c-c++-common/for-18.c |   5 +
 28 files changed, 1071 insertions(+), 55 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..6b74cc7be60 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@
-7520,6 +7520,62 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore) return target; } +static rtx +expand_builtin_omp_builtins (tree exp, rtx target, int ignore) +{ + rtx ret = NULL; + rtx_insn *(*gen_fn) (rtx) = NULL; + + switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp))) + { + case BUILT_IN_GOMP_BARRIER: + if (targetm.have_gomp_barrier ()) + { + emit_insn (targetm.gen_gomp_barrier ()); + return target; + } + break; + + case BUILT_IN_OMP_GET_THREAD_NUM: + if (targetm.have_omp_get_thread_num ()) + gen_fn = targetm.gen_omp_get_thread_num; + break; + + case BUILT_IN_OMP_GET_NUM_THREADS: + if (targetm.have_omp_get_num_threads ()) + gen_fn = targetm.gen_omp_get_num_threads; + break; + + case BUILT_IN_OMP_GET_TEAM_NUM: + if (targetm.have_omp_get_team_num ()) + gen_fn = targetm.gen_omp_get_team_num; + break; + + case BUILT_IN_OMP_GET_NUM_TEAMS: + if (targetm.have_omp_get_num_teams ()) + gen_fn = targetm.gen_omp_get_num_teams; + break; + + default: + gcc_unreachable (); + } + + if (ignore) + return const0_rtx; + + if (gen_fn) + { + rtx reg = (MEM_P (target) + ? gen_reg_rtx (GET_MODE (target)) + : target); + emit_insn (gen_fn (reg)); + if (reg != target) + emit_move_insn (target, reg); + ret = target; + } + return ret; +} + /* Expand a string compare operation using a sequence of char comparison to get rid of the calling overhead, with result going to TARGET if that's convenient. 
@@ -8917,6 +8973,21 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, case BUILT_IN_GOACC_PARLEVEL_SIZE: return expand_builtin_goacc_parlevel_id_size (exp, target, ignore); + case BUILT_IN_GOMP_BARRIER: + case BUILT_IN_OMP_GET_THREAD_NUM: + case BUILT_IN_OMP_GET_NUM_THREADS: + case BUILT_IN_OMP_GET_TEAM_NUM: + case BUILT_IN_OMP_GET_NUM_TEAMS: + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + { + target = expand_builtin_omp_builtins (exp, target, ignore); + if (target) + return target; + } + break; + case BUILT_IN_SPECULATION_SAFE_VALUE_PTR: return expand_speculation_safe_value (VOIDmode, exp, target, ignore); diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc index 2bd0289ffba..c4d222a4111 100644 --- a/gcc/cgraphunit.cc +++ b/gcc/cgraphunit.cc @@ -1184,7 +1184,12 @@ analyze_functions (bool first_time) build_type_inheritance_graph (); if (flag_openmp && first_time) - omp_discover_implicit_declare_target (); + { + omp_discover_implicit_declare_target (); + + if(flag_openmp_target == OMP_TARGET_MODE_OMPACC) + omp_ompacc_attribute_tagging (); + } /* Analysis adds static variables that in turn adds references to new functions. So we need to iterate the process until it stabilize. */ diff --git a/gcc/common.opt b/gcc/common.opt index e19c4ef1166..98ba02b2f17 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2381,6 +2381,19 @@ Enum(target_simd_clone_device) String(nohost) Value(OMP_TARGET_SIMD_CLONE_NOHOST EnumValue Enum(target_simd_clone_device) String(any) Value(OMP_TARGET_SIMD_CLONE_ANY) +fopenmp-target= +Common Joined RejectNegative Enum(openmp_target) Var(flag_openmp_target) Init(OMP_TARGET_MODE_DEFAULT) +Execution model used for OpenMP target regions. 
+ +Enum +Name(openmp_target) Type(int) + +EnumValue +Enum(openmp_target) String(default) Value(OMP_TARGET_MODE_DEFAULT) + +EnumValue +Enum(openmp_target) String(acc) Value(OMP_TARGET_MODE_OMPACC) + fopt-info Common Var(flag_opt_info) Optimization Enable all optimization info dumps on stderr. diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 503b1abcefd..19064deb622 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -704,6 +704,7 @@ main (int argc, char **argv) /* Scan the argument vector. */ bool fopenmp = false; + bool fopenmp_target = false; bool fopenacc = false; bool fPIC = false; bool fpic = false; @@ -723,6 +724,9 @@ main (int argc, char **argv) #undef STR else if (strcmp (argv[i], "-fopenmp") == 0) fopenmp = true; + else if (strncmp (argv[i], "-fopenmp-target=", + strlen ("-fopenmp-target=")) == 0) + fopenmp_target = true; else if (strcmp (argv[i], "-fopenacc") == 0) fopenacc = true; else if (strcmp (argv[i], "-fPIC") == 0) @@ -752,6 +756,15 @@ main (int argc, char **argv) if (!(fopenacc ^ fopenmp)) fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> " "must be set"); + if (fopenmp_target) + { + if (fopenacc) + fatal_error (input_location, "%<-fopenacc%> not compatible with " + "%<-fopenmp-target=%>"); + if (!fopenmp) + fatal_error (input_location, "%<-fopenmp-target=%> requires " + "%<-fopenmp%>"); + } struct obstack argv_obstack; obstack_init (&argv_obstack); diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index 3fc86c17bad..ed2ec0e3282 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -50,6 +50,7 @@ extern unsigned int ptx_version_to_number (enum ptx_version, bool); extern void nvptx_expand_oacc_fork (unsigned); extern void nvptx_expand_oacc_join (unsigned); extern void nvptx_expand_call (rtx, rtx); +extern void nvptx_expand_omp_get_num_threads (rtx); extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, 
nvptx_shuffle_kind); extern rtx nvptx_expand_compare (rtx); extern const char *nvptx_ptx_type_from_mode (machine_mode, bool); @@ -63,5 +64,6 @@ extern const char *nvptx_output_red_partition (rtx, rtx); extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int); extern bool nvptx_mem_local_p (rtx); extern bool nvptx_mem_maybe_shared_p (const_rtx); +extern bool nvptx_mem_shared_p (const_rtx); #endif #endif diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 3e52354bd12..9f77071a384 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -175,6 +175,9 @@ static unsigned gang_private_shared_align; static GTY(()) rtx gang_private_shared_sym; static hash_map<tree_decl_hash, unsigned int> gang_private_shared_hmap; +static GTY(()) rtx omp_num_threads_sym; +static unsigned omp_num_threads_align; + /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; @@ -184,6 +187,9 @@ static bool need_softstack_decl; /* True if any function references __nvptx_uni. */ static bool need_unisimt_decl; +/* True if any function references __nvptx_omp_num_threads. */ +static bool need_omp_num_threads; + static int nvptx_mach_max_workers (); /* Allocate a new, cleared machine_function structure. 
*/ @@ -391,6 +397,10 @@ nvptx_option_override (void) SET_SYMBOL_DATA_AREA (gang_private_shared_sym, DATA_AREA_SHARED); gang_private_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + omp_num_threads_sym = gen_rtx_SYMBOL_REF (Pmode, "__nvptx_omp_num_threads"); + SET_SYMBOL_DATA_AREA (omp_num_threads_sym, DATA_AREA_SHARED); + omp_num_threads_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + diagnose_openacc_conflict (TARGET_GOMP, "-mgomp"); diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack"); diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt"); @@ -959,7 +969,8 @@ write_as_kernel (tree attrs) { return (lookup_attribute ("kernel", attrs) != NULL_TREE || (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE - && lookup_attribute ("oacc function", attrs) != NULL_TREE)); + && (lookup_attribute ("oacc function", attrs) != NULL_TREE + || lookup_attribute ("ompacc", attrs) != NULL_TREE))); /* For OpenMP target regions, the corresponding kernel entry is emitted from write_omp_entry as a separate function. */ } @@ -1493,6 +1504,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) DECL_ATTRIBUTES (decl))) force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { char *buf = (char *) alloca (strlen (name) + sizeof ("$impl")); @@ -1546,7 +1558,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) HOST_WIDE_INT sz = get_frame_size (); bool need_frameptr = sz || cfun->machine->has_chain; int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT; - if (!TARGET_SOFT_STACK) + if (!TARGET_SOFT_STACK || lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))) { /* Declare a local var for outgoing varargs. 
*/ if (cfun->machine->has_variadic) @@ -1617,6 +1629,45 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) nvptx_init_unisimt_predicate (file); if (cfun->machine->bcast_partition || cfun->machine->sync_bar) nvptx_init_oacc_workers (file); + + if (offloading_function_p ((tree) decl) + && lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl))) + { + int nthr_regno = REGNO (cfun->machine->omp_fn_entry_num_threads_reg); + if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))) + { + fprintf (file, "\t{\n"); + if (cfun->machine->omp_parallel_predicate) + { + /* Borrow num-threads regno as temp register. */ + fprintf (file, "\t\tmov.u32 %%r%d, %%tid.x;\n", nthr_regno); + fprintf (file, "\t\tsetp.ne.u32 %%r%d, %%r%d, 0;\n", + REGNO (cfun->machine->omp_parallel_predicate), nthr_regno); + } + fprintf (file, "\t\tmov.u32 %%r%d, 1;\n", nthr_regno); + fprintf (file, "\t\tst.shared.u32 [__nvptx_omp_num_threads], %%r%d;\n", nthr_regno); + fprintf (file, "\t}\n"); + need_omp_num_threads = true; + } + else + { + fprintf (file, "\t\tld.shared.u32 %%r%d, [__nvptx_omp_num_threads];\n", nthr_regno); + if (cfun->machine->omp_parallel_predicate) + { + fprintf (file, "\t{\n"); + fprintf (file, "\t\t.reg.u32 %%tmp1;\n"); + fprintf (file, "\t\t.reg.pred %%not_parallel_mode, %%v1_lane;\n"); + fprintf (file, "\t\tsetp.eq.u32 %%not_parallel_mode, %%r%d, 1;\n", nthr_regno); + fprintf (file, "\t\tmov.u32 %%tmp1, %%tid.x;\n"); + fprintf (file, "\t\tsetp.ne.u32 %%v1_lane, %%tmp1, 0;\n"); + fprintf (file, "\t\tand.pred %%r%d, %%not_parallel_mode, %%v1_lane;\n", + REGNO (cfun->machine->omp_parallel_predicate)); + fprintf (file, "\t}\n"); + need_omp_num_threads = true; + } + } + } } /* Output code for switching uniform-simt state. 
ENTERING indicates whether @@ -1734,6 +1785,10 @@ nvptx_output_simt_exit (rtx src) const char * nvptx_output_set_softstack (unsigned src_regno) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return ""; if (cfun->machine->has_softstack && !crtl->is_leaf) { fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ", @@ -1852,20 +1907,29 @@ nvptx_expand_call (rtx retval, rtx address) if (DECL_STATIC_CHAIN (decl)) cfun->machine->has_chain = true; - tree attr = oacc_get_fn_attrib (decl); - if (attr) + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) { - tree dims = TREE_VALUE (attr); - - parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1; - for (int ix = 0; ix != GOMP_DIM_MAX; ix++) + if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl))) + parallel = GOMP_DIM_MASK (GOMP_DIM_VECTOR); + } + else + { + tree attr = oacc_get_fn_attrib (decl); + if (attr) { - if (TREE_PURPOSE (dims) - && !integer_zerop (TREE_PURPOSE (dims))) - break; - /* Not on this axis. */ - parallel ^= GOMP_DIM_MASK (ix); - dims = TREE_CHAIN (dims); + tree dims = TREE_VALUE (attr); + + parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1; + for (int ix = 0; ix != GOMP_DIM_MAX; ix++) + { + if (TREE_PURPOSE (dims) + && !integer_zerop (TREE_PURPOSE (dims))) + break; + /* Not on this axis. 
*/ + parallel ^= GOMP_DIM_MASK (ix); + dims = TREE_CHAIN (dims); + } } } } @@ -1928,15 +1992,27 @@ nvptx_expand_compare (rtx compare) void nvptx_expand_oacc_fork (unsigned mode) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + mode = GOMP_DIM_VECTOR; nvptx_emit_forking (GOMP_DIM_MASK (mode), false); } void nvptx_expand_oacc_join (unsigned mode) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + mode = GOMP_DIM_VECTOR; nvptx_emit_joining (GOMP_DIM_MASK (mode), false); } +void +nvptx_expand_omp_get_num_threads (rtx target) +{ + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn (gen_rtx_SET (target, mem)); + need_omp_num_threads = true; +} + /* Generate instruction(s) to unpack a 64 bit object into 2 32 bit objects. */ @@ -2870,6 +2946,13 @@ nvptx_mem_maybe_shared_p (const_rtx x) return area == DATA_AREA_SHARED || area == DATA_AREA_GENERIC; } +bool +nvptx_mem_shared_p (const_rtx x) +{ + nvptx_data_area area = nvptx_mem_data_area (x); + return area == DATA_AREA_SHARED; +} + /* Print an operand, X, to FILE, with an optional modifier in CODE. 
Meaning of CODE: @@ -3474,6 +3557,11 @@ init_axis_dim (void) static int ATTRIBUTE_UNUSED nvptx_mach_max_workers () { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return 1; + if (!cfun->machine->axis_dim_init_p) init_axis_dim (); return cfun->machine->axis_dim[MACH_MAX_WORKERS]; @@ -3482,6 +3570,11 @@ nvptx_mach_max_workers () static int ATTRIBUTE_UNUSED nvptx_mach_vector_length () { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return 32; + if (!cfun->machine->axis_dim_init_p) init_axis_dim (); return cfun->machine->axis_dim[MACH_VECTOR_LENGTH]; @@ -4864,11 +4957,27 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) rtx_insn *tail = BB_END (to); unsigned skip_mask = mask; + rtx_insn *join = NULL; + rtx_insn *fork = NULL; + while (true) { /* Find first insn of from block. */ - while (head != BB_END (from) && !needs_neutering_p (head)) - head = NEXT_INSN (head); + while (true) + { + if (INSN_P (head) + && recog_memoized (head) == CODE_FOR_nvptx_join) + { + /* Record join if we see it. */ + gcc_assert (!join); + join = head; + } + + if (head != BB_END (from) && !needs_neutering_p (head)) + head = NEXT_INSN (head); + else + break; + } if (from == to) break; @@ -4886,8 +4995,46 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) /* Find last insn of to block */ rtx_insn *limit = from == to ? head : BB_HEAD (to); - while (tail != limit && !INSN_P (tail) && !LABEL_P (tail)) - tail = PREV_INSN (tail); + while (true) + { + if (INSN_P (tail) + && recog_memoized (tail) == CODE_FOR_nvptx_fork) + { + /* Record join if we see it. 
*/ + gcc_assert (!fork); + fork = tail; + } + + if (tail != limit && !INSN_P (tail) && !LABEL_P (tail)) + tail = PREV_INSN (tail); + else + break; + } + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + if (join + /* We do not set/restore parallel state across function calls. */ + && !(INTVAL (XVECEXP (PATTERN (join), 0, 0)) & (1 << GOMP_DIM_MAX))) + { + rtx reg = cfun->machine->omp_fn_entry_num_threads_reg; + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn_before (gen_nvptx_omp_parallel_join (mem, reg), head); + need_omp_num_threads = true; + head = PREV_INSN (head); + } + + if (fork + /* We do not set/restore parallel state across function calls. */ + && !(INTVAL (XVECEXP (PATTERN (fork), 0, 0)) & (1 << GOMP_DIM_MAX))) + { + rtx reg = gen_reg_rtx (SImode); + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn_before (gen_get_ntid (reg), tail); + emit_insn_before (gen_nvptx_omp_parallel_fork (mem, reg), tail); + need_omp_num_threads = true; + } + } /* Detect if tail is a branch. */ rtx tail_branch = NULL_RTX; @@ -4934,16 +5081,31 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) if (GOMP_DIM_MASK (mode) & skip_mask) { rtx_code_label *label = gen_label_rtx (); - rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER]; rtx_insn **mode_jump = mode == GOMP_DIM_VECTOR ? &vector_jump : &worker_jump; rtx_insn **mode_label = mode == GOMP_DIM_VECTOR ? 
&vector_label : &worker_label; - if (!pred) + rtx pred; + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && mode == GOMP_DIM_VECTOR) + { + pred = cfun->machine->omp_parallel_predicate; + if (!pred) + { + pred = gen_reg_rtx (BImode); + cfun->machine->omp_parallel_predicate = pred; + } + } + else { - pred = gen_reg_rtx (BImode); - cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred; + pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER]; + if (!pred) + { + pred = gen_reg_rtx (BImode); + cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred; + } } rtx br; @@ -5058,7 +5220,38 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) rtx tmp = gen_reg_rtx (BImode); emit_insn_before (gen_movbi (tmp, const0_rtx), bb_first_real_insn (from)); - emit_insn_before (gen_rtx_SET (tmp, pvar), label); + + if(flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + rtx nthr = cfun->machine->omp_fn_entry_num_threads_reg; + rtx single_p = gen_reg_rtx (BImode); + + rtx_code_label *lbl_copy_tmp_pvar = gen_label_rtx (); + LABEL_NUSES (lbl_copy_tmp_pvar) = 1; + + rtx_insn *lbl_fallthru = NEXT_INSN (tail); + gcc_assert (lbl_fallthru); + if (!LABEL_P (lbl_fallthru)) + { + rtx_code_label *nlbl = gen_label_rtx (); + LABEL_NUSES (nlbl) = 1; + emit_label_before (nlbl, lbl_fallthru); + lbl_fallthru = nlbl; + } + emit_insn_before + (gen_rtx_SET (single_p, + gen_rtx_EQ (BImode, nthr, GEN_INT (1))), + label); + emit_insn_before + (gen_br_true (single_p, lbl_copy_tmp_pvar), label); + emit_jump_insn_before (copy_rtx (tail_branch), label); + emit_insn_before (gen_jump (lbl_fallthru), label); + emit_label_before (lbl_copy_tmp_pvar, label); + emit_insn_before (gen_rtx_SET (tmp, pvar), label); + } + else + emit_insn_before (gen_rtx_SET (tmp, pvar), label); + emit_insn_before (gen_rtx_SET (pvar, tmp), tail); #endif emit_insn_before (nvptx_gen_warp_bcast (pvar), tail); @@ -5817,10 +6010,29 @@ nvptx_reorg (void) delete pars; } + if (flag_openmp_target == 
OMP_TARGET_MODE_OMPACC + && offloading_function_p (current_function_decl) + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl)) + && !lookup_attribute ("ompacc seq", + DECL_ATTRIBUTES (current_function_decl))) + { + cfun->machine->omp_fn_entry_num_threads_reg = gen_reg_rtx (SImode); + + /* Discover & process partitioned regions. */ + parallel *pars = nvptx_discover_pars (&bb_insn_map); + nvptx_process_pars (pars); + nvptx_neuter_pars (pars, GOMP_DIM_MASK (GOMP_DIM_VECTOR), 0); + delete pars; + } + /* Replace subregs. */ nvptx_reorg_subreg (); - if (TARGET_UNIFORM_SIMT) + if (TARGET_UNIFORM_SIMT + && (flag_openmp_target != OMP_TARGET_MODE_OMPACC + || !lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl)))) nvptx_reorg_uniform_simt (); #if WORKAROUND_PTXJIT_BUG_2 @@ -6071,6 +6283,12 @@ nvptx_file_end (void) write_var_marker (asm_out_file, false, true, "__nvptx_uni"); fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n"); } + if (need_omp_num_threads) + { + write_var_marker (asm_out_file, false, true, "__nvptx_omp_num_threads"); + fprintf (asm_out_file, + ".extern .shared .u32 __nvptx_omp_num_threads;\n"); + } } /* Expander for the shuffle builtins. */ @@ -6758,6 +6976,9 @@ nvptx_goacc_fork_join (gcall *call, const int dims[], tree arg = gimple_call_arg (call, 2); unsigned axis = TREE_INT_CST_LOW (arg); + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + return true; + /* We only care about worker and vector partitioning. */ if (axis < GOMP_DIM_WORKER) return false; diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index 74f4a68924c..cadc6fb4ab1 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -235,6 +235,9 @@ struct GTY(()) machine_function for per-lane storage in OpenMP SIMD regions. 
*/ unsigned HOST_WIDE_INT simt_stack_size; unsigned HOST_WIDE_INT simt_stack_align; + + rtx omp_parallel_predicate; + rtx omp_fn_entry_num_threads_reg; }; #endif diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 4c32a20176a..872f4341899 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -78,6 +78,14 @@ UNSPECV_SIMT_EXIT UNSPECV_RED_PART + + UNSPECV_GET_TID + UNSPECV_GET_NTID + UNSPECV_GET_CTAID + UNSPECV_GET_NCTAID + + UNSPECV_OMP_PARALLEL_FORK + UNSPECV_OMP_PARALLEL_JOIN ]) (define_attr "subregs_ok" "false,true" @@ -121,6 +129,12 @@ : immediate_operand (op, mode)); }) +(define_predicate "nvptx_shared_mem_operand" + (match_code "mem") +{ + return nvptx_mem_shared_p (op); +}) + (define_predicate "const0_operand" (and (match_code "const_int") (match_test "op == const0_rtx"))) @@ -1771,6 +1785,60 @@ return asms[INTVAL (operands[1])]; }) +(define_expand "gomp_barrier" + [(const_int 1)] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" +{ + emit_insn (gen_nvptx_barsync (GEN_INT (0), GEN_INT (0))); + DONE; +}) + +(define_expand "omp_get_num_threads" + [(match_operand 0 "nvptx_register_operand" "=R")] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" +{ + nvptx_expand_omp_get_num_threads (operands[0]); + DONE; +}) + +(define_insn "omp_get_num_teams" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NCTAID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%nctaid.x;") + +(define_insn "omp_get_thread_num" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_TID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%tid.x;") + +(define_insn "omp_get_team_num" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_CTAID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%ctaid.x;") + +(define_insn 
"get_ntid" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NTID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%ntid.x;") + +(define_insn "nvptx_omp_parallel_fork" + [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m") + (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")] + UNSPECV_OMP_PARALLEL_FORK))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tst.shared.u32\\t%0, %1; //omp parallel fork") + +(define_insn "nvptx_omp_parallel_join" + [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m") + (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")] + UNSPECV_OMP_PARALLEL_JOIN))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tst.shared.u32\\t%0, %1; //omp parallel join") + (define_insn "nvptx_fork" [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")] UNSPECV_FORK)] diff --git a/gcc/expr.cc b/gcc/expr.cc index d4414e242cb..ec7a4f82137 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -11296,7 +11296,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, /* Allow accel compiler to handle variables that require special treatment, e.g. if they have been modified in some way earlier in compilation by the adjust_private_decl OpenACC hook. */ - if (flag_openacc && targetm.goacc.expand_var_decl) + if ((flag_openacc || flag_openmp_target == OMP_TARGET_MODE_OMPACC) + && targetm.goacc.expand_var_decl) { temp = targetm.goacc.expand_var_decl (exp); if (temp) diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 5062f59bc8f..283b1ddfcba 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -522,6 +522,12 @@ enum omp_target_simd_clone_device_kind OMP_TARGET_SIMD_CLONE_ANY = 3 }; +enum omp_target_mode_kind +{ + OMP_TARGET_MODE_DEFAULT = 0, + OMP_TARGET_MODE_OMPACC = 1 +}; + #endif #endif /* ! 
GCC_FLAG_TYPES_H */ diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 87d83ec39f1..b4b70b43db9 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -260,6 +260,7 @@ struct gimplify_omp_ctx bool order_concurrent; bool has_depend; bool in_for_exprs; + bool ompacc; int defaultmap[5]; }; @@ -13210,6 +13211,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_CLAUSE_FINALIZE: break; + case OMP_CLAUSE__OMPACC_: + ctx->ompacc = true; + break; + case OMP_CLAUSE_ORDER: ctx->order_concurrent = true; break; @@ -14711,6 +14716,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, case OMP_CLAUSE_INCLUSIVE: case OMP_CLAUSE_EXCLUSIVE: case OMP_CLAUSE_USES_ALLOCATORS: + case OMP_CLAUSE__OMPACC_: break; case OMP_CLAUSE_NOHOST: @@ -15434,6 +15440,21 @@ gimplify_omp_loop_xform (tree *expr_p, gimple_seq *pre_p) return GS_ALL_DONE; } +/* Return true if in an omp_context in OMPACC mode. */ +static bool +gimplify_omp_ctx_ompacc_p (void) +{ + if (cgraph_node::get (current_function_decl)->offloadable + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return true; + struct gimplify_omp_ctx *ctx; + for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context) + if (ctx->ompacc) + return true; + return false; +} + /* Gimplify the gross structure of an OMP_FOR statement. 
*/ static enum gimplify_status @@ -15465,6 +15486,18 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) *expr_p = NULL_TREE; return GS_ERROR; } + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimplify_omp_ctx_ompacc_p ()) + { + gcc_assert (inner_for_stmt && TREE_CODE (for_stmt) == OMP_DISTRIBUTE); + *expr_p = OMP_FOR_BODY (for_stmt); + tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_GANG); + OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (inner_for_stmt); + OMP_FOR_CLAUSES (inner_for_stmt) = c; + return GS_OK; + } + gcc_assert (inner_for_stmt == *data[3]); omp_maybe_apply_loop_xforms (data[3], data[2] diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc index 02579951569..c356698a1f9 100644 --- a/gcc/lto-wrapper.cc +++ b/gcc/lto-wrapper.cc @@ -738,6 +738,7 @@ append_compiler_options (obstack *argv_obstack, vec<cl_decoded_option> opts) case OPT_fcommon: case OPT_fgnu_tm: case OPT_fopenmp: + case OPT_fopenmp_target_: case OPT_fopenacc: case OPT_fopenacc_dim_: case OPT_foffload_abi_: diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index f9ce137d0b4..65393ab3210 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -71,9 +71,9 @@ DEF_GOACC_BUILTIN_ONLY (BUILT_IN_GOACC_SINGLE_COPY_END, "GOACC_single_copy_end", DEF_GOMP_BUILTIN (BUILT_IN_OMP_IS_INITIAL_DEVICE, "omp_is_initial_device", BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_THREAD_NUM, "omp_get_thread_num", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) + BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_THREADS, "omp_get_num_threads", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) + BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_TEAM_NUM, "omp_get_team_num", BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_TEAMS, "omp_get_num_teams", diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index 3f5acca95ec..102f1e988d5 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -1047,11 
+1047,16 @@ remove_exit_barrier (struct omp_region *region) from within current function (this would be easy to check) or from some function it calls and gets passed an address of such a variable. */ + gomp_parallel *parallel_stmt + = as_a <gomp_parallel *> (last_nondebug_stmt (region->entry)); + tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && child_fun == NULL_TREE) + any_addressable_vars = 0; + if (any_addressable_vars < 0) { - gomp_parallel *parallel_stmt - = as_a <gomp_parallel *> (last_nondebug_stmt (region->entry)); - tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt); tree local_decls, block, decl; unsigned ix; @@ -7732,6 +7737,17 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) /* The SSA parallelizer does gang parallelism. */ gwv = build_int_cst (integer_type_node, GOMP_DIM_MASK (GOMP_DIM_GANG)); } + else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + tree clauses = gimple_omp_for_clauses (for_stmt); + int omp_mask = 0; + if (omp_find_clause (clauses, OMP_CLAUSE_GANG)) + omp_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG); + if (omp_find_clause (clauses, OMP_CLAUSE_VECTOR)) + omp_mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR); + gcc_assert (omp_mask); + gwv = build_int_cst (integer_type_node, omp_mask); + } if (fd->collapse > 1 || fd->tiling) { @@ -9759,6 +9775,13 @@ get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt) t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c); else t = integer_minus_one_node; + + /* Currently, OMPACC mode has a limitation of only one warp thread. 
*/ + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute + ("ompacc", DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt_stmt)))) + t = integer_one_node; + push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL, GOMP_TARGET_ARG_THREAD_LIMIT, t, &args); @@ -10656,6 +10679,44 @@ expand_omp (struct omp_region *region) switch (region->type) { case GIMPLE_OMP_PARALLEL: + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + struct omp_region *r; + for (r = region->outer; r; r = r->outer) + if (r->type == GIMPLE_OMP_TARGET) + { + gomp_target *tgt + = as_a <gomp_target *> (last_nondebug_stmt (r->entry)); + tree tgtfn_attrs + = DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt)); + if (!lookup_attribute ("ompacc", tgtfn_attrs)) + r = NULL; + break; + } + if (r != NULL + || (lookup_attribute + ("ompacc", DECL_ATTRIBUTES (current_function_decl)))) + { + gimple_stmt_iterator gsi; + gsi = gsi_last_nondebug_bb (region->entry); + gcc_assert (!gsi_end_p (gsi) + && gimple_code + (gsi_stmt (gsi)) == GIMPLE_OMP_PARALLEL); + gsi_remove (&gsi, true); + + if (region->exit) + { + gsi = gsi_last_nondebug_bb (region->exit); + gcc_assert (!gsi_end_p (gsi) + && gimple_code + (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN); + gsi_remove (&gsi, true); + } + break; + } + } + /* Fallthrough. 
*/ + case GIMPLE_OMP_TASK: expand_omp_taskreg (region); break; diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index d7b09eae5ff..190130e16a3 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -213,8 +213,12 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd, struct omp_for_data_loop dummy_loop; location_t loc = gimple_location (for_stmt); bool simd = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_SIMD; - bool distribute = gimple_omp_for_kind (for_stmt) - == GF_OMP_FOR_KIND_DISTRIBUTE; + bool distribute = + (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_DISTRIBUTE + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP + && omp_find_clause (gimple_omp_for_clauses (for_stmt), + OMP_CLAUSE_GANG))); bool taskloop = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_TASKLOOP; bool order_reproducible = false; @@ -453,7 +457,8 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd, loop->n2 = gimple_omp_for_final (for_stmt, i); gcc_assert (loop->cond_code != NE_EXPR || (gimple_omp_for_kind (for_stmt) - != GF_OMP_FOR_KIND_OACC_LOOP)); + != GF_OMP_FOR_KIND_OACC_LOOP) + || flag_openmp_target == OMP_TARGET_MODE_OMPACC); if (TREE_CODE (loop->n2) == TREE_VEC) { if (loop->outer) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index 3f6d97f88f4..6152750b5b8 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -181,6 +181,10 @@ struct omp_context than teams is strictly nested in it. */ bool nonteams_nested_p; + /* Indicates that context is in OMPACC mode, set after _ompacc_ internal + clauses are removed. */ + bool ompacc_p; + /* Candidates for adjusting OpenACC privatization level. 
*/ vec<tree> oacc_privatization_candidates; }; @@ -1957,6 +1961,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_ALLOCATE: case OMP_CLAUSE_USES_ALLOCATORS: + case OMP_CLAUSE__OMPACC_: break; case OMP_CLAUSE_ALIGNED: @@ -2176,6 +2181,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FILTER: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE_USES_ALLOCATORS: + case OMP_CLAUSE__OMPACC_: break; case OMP_CLAUSE__CACHE_: @@ -2245,6 +2251,21 @@ omp_maybe_offloaded_ctx (omp_context *ctx) return false; } +static bool +ompacc_ctx_p (omp_context *ctx) +{ + if (cgraph_node::get (current_function_decl)->offloadable + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return true; + for (; ctx; ctx = ctx->outer) + if (is_gimple_omp_offloaded (ctx->stmt)) + return (ctx->ompacc_p + || omp_find_clause (gimple_omp_target_clauses (ctx->stmt), + OMP_CLAUSE__OMPACC_)); + return false; +} + /* Build a decl for the omp child function. It'll not contain a body yet, just the bare decl. 
*/ @@ -2550,8 +2571,28 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx) DECL_NAMELESS (name) = 1; TYPE_NAME (ctx->record_type) = name; TYPE_ARTIFICIAL (ctx->record_type) = 1; - create_omp_child_function (ctx, false); - gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)) + { + tree data_name = get_identifier (".omp_data_i_par"); + tree t = build_decl (gimple_location (stmt), VAR_DECL, data_name, + ptr_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_NAMELESS (t) = 1; + DECL_CONTEXT (t) = current_function_decl; + DECL_SEEN_IN_BIND_EXPR_P (t) = 1; + DECL_CHAIN (t) = ctx->block_vars; + ctx->block_vars = t; + TREE_USED (t) = 1; + TREE_READONLY (t) = 1; + ctx->receiver_decl = t; + } + else + { + create_omp_child_function (ctx, false); + gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn); + } scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx); scan_omp (gimple_omp_body_ptr (stmt), ctx); @@ -3382,6 +3423,24 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx) scan_sharing_clauses (clauses, ctx); scan_omp (gimple_omp_body_ptr (stmt), ctx); + if (offloaded && flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + for (tree *cp = gimple_omp_target_clauses_ptr (stmt); *cp; + cp = &OMP_CLAUSE_CHAIN (*cp)) + if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE__OMPACC_) + { + DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt)) + = tree_cons (get_identifier ("ompacc"), NULL_TREE, + DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt))); + /* Unlink and remove. */ + *cp = OMP_CLAUSE_CHAIN (*cp); + + /* Set to true. 
*/ + ctx->ompacc_p = true; + break; + } + } + if (TYPE_FIELDS (ctx->record_type) == NULL) ctx->record_type = ctx->receiver_decl = NULL; else @@ -8612,6 +8671,9 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses, gcc_unreachable (); else if (is_oacc_kernels_decomposed_part (tgt)) ; + else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && is_omp_target (tgt->stmt)) + ; else gcc_unreachable (); @@ -8629,7 +8691,13 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses, gcc_assert (!(tag & OLF_AUTO)); } - if (tag & OLF_TILE) + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_code (ctx->stmt) == GIMPLE_OMP_PARALLEL + && tgt + && ompacc_ctx_p (tgt)) + levels = 1; + else + if (tag & OLF_TILE) /* Tiling could use all 3 levels. */ levels = 3; else @@ -11893,6 +11961,23 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) push_gimplify_context (); + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)) + { + enum omp_clause_code code = OMP_CLAUSE_ERROR; + if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR) + code = OMP_CLAUSE_VECTOR; + else if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_DISTRIBUTE) + code = OMP_CLAUSE_GANG; + if (code) + { + /* Adjust into OACC loop kind with vector/gang clause. 
*/ + gimple_omp_for_set_kind (stmt, GF_OMP_FOR_KIND_OACC_LOOP); + tree c = build_omp_clause (UNKNOWN_LOCATION, code); + OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (stmt); + gimple_omp_for_set_clauses (stmt, c); + } + } + if (is_gimple_omp_oacc (ctx->stmt)) oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt)); @@ -11914,7 +11999,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *inner_bind = as_a <gbind *> (gimple_seq_first_stmt (omp_for_body)); tree vars = gimple_bind_vars (inner_bind); - if (is_gimple_omp_oacc (ctx->stmt)) + if (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx))) oacc_privatization_scan_decl_chain (ctx, vars); gimple_bind_append_vars (new_stmt, vars); /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't @@ -12030,7 +12117,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (stmt), ctx); gcall *private_marker = NULL; - if (is_gimple_omp_oacc (ctx->stmt) + if ((is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))) && !gimple_seq_empty_p (omp_for_body)) private_marker = lower_oacc_private_marker (ctx); @@ -12085,11 +12173,13 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) /* Once lowered, extract the bounds and clauses. */ omp_extract_for_data (stmt, &fd, NULL); - if (is_gimple_omp_oacc (ctx->stmt) - && !ctx_in_oacc_kernels_region (ctx)) - lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), private_marker, - &oacc_head, &oacc_tail, ctx); + if (flag_openacc) + { + if (is_gimple_omp_oacc (ctx->stmt) && !ctx_in_oacc_kernels_region (ctx)) + lower_oacc_head_tail (gimple_location (stmt), + gimple_omp_for_clauses (stmt), private_marker, + &oacc_head, &oacc_tail, ctx); + } /* Add OpenACC partitioning and reduction markers just before the loop. 
*/ if (oacc_head) @@ -12873,9 +12963,20 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx) bind = gimple_build_bind (NULL, NULL, make_node (BLOCK)); else bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind)); + + gimple_seq oacc_head = NULL, oacc_tail = NULL; + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_code (stmt) == GIMPLE_OMP_PARALLEL + && ompacc_ctx_p (ctx)) + lower_oacc_head_tail (gimple_location (stmt), clauses, + NULL, &oacc_head, &oacc_tail, + ctx); + gsi_replace (gsi_p, dep_bind ? dep_bind : bind, true); gimple_bind_add_seq (bind, ilist); + gimple_bind_add_seq (bind, oacc_head); gimple_bind_add_stmt (bind, stmt); + gimple_bind_add_seq (bind, oacc_tail); gimple_bind_add_seq (bind, olist); pop_gimplify_context (NULL); @@ -14731,7 +14832,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) gimple_seq fork_seq = NULL; gimple_seq join_seq = NULL; - if (offloaded && is_gimple_omp_oacc (ctx->stmt)) + if (offloaded && (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)))) { /* If there are reductions on the offloaded region itself, treat them as a dummy GANG loop. */ @@ -14854,6 +14957,22 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (teams_stmt), ctx); lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist, NULL, ctx); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)) + { + /* Forward the team/gang-wide variables to outer target region. 
*/ + struct omp_context *tgt = ctx; + while (tgt && !is_gimple_omp_offloaded (tgt->stmt)) + tgt = tgt->outer; + if (tgt) + { + int i; + tree decl; + FOR_EACH_VEC_ELT (ctx->oacc_privatization_candidates, i, decl) + tgt->oacc_privatization_candidates.safe_push (decl); + } + } + gimple_seq_add_stmt (&bind_body, teams_stmt); gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt)); @@ -15021,7 +15140,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) ctx); break; case GIMPLE_BIND: - if (ctx && is_gimple_omp_oacc (ctx->stmt)) + if (ctx && (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)))) { tree vars = gimple_bind_vars (as_a <gbind *> (stmt)); oacc_privatization_scan_decl_chain (ctx, vars); diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc index 6c652387a07..6371177d054 100644 --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc @@ -391,6 +391,268 @@ omp_discover_implicit_declare_target (void) lang_hooks.decls.omp_finish_decl_inits (); } +static bool ompacc_supported_clauses_p (tree clauses) +{ + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_COLLAPSE: + case OMP_CLAUSE_NOWAIT: + continue; + default: + return false; + } + return true; +} + +struct target_region_data +{ + tree func_decl; + bool has_omp_for; + bool has_omp_parallel; + bool ompacc_invalid; + auto_vec<const char *> warning_msgs; + auto_vec<location_t> warning_locs; + target_region_data (void) + : func_decl (NULL_TREE), + has_omp_for (false), has_omp_parallel (false), ompacc_invalid (false), + warning_msgs (), warning_locs () {} +}; + +static tree scan_omp_target_region_r (tree *, int *, void *); + +static void +scan_fndecl_for_ompacc (tree decl, target_region_data *tgtdata) +{ + target_region_data td; + td.func_decl = decl; + walk_tree_without_duplicates (&DECL_SAVED_TREE (decl), + scan_omp_target_region_r, &td); + tree v; + if ((v = lookup_attribute ("omp declare variant 
base", + DECL_ATTRIBUTES (decl))) + || (v = lookup_attribute ("omp declare variant variant", + DECL_ATTRIBUTES (decl)))) + { + td.ompacc_invalid = true; + td.warning_msgs.safe_push ("declare variant not supported for OMPACC"); + td.warning_locs.safe_push (EXPR_LOCATION (v)); + } + if (tgtdata) + { + tgtdata->has_omp_for |= td.has_omp_for; + tgtdata->has_omp_parallel |= td.has_omp_parallel; + tgtdata->ompacc_invalid |= td.ompacc_invalid; + for (unsigned i = 0; i < td.warning_msgs.length (); i++) + tgtdata->warning_msgs.safe_push (td.warning_msgs[i]); + for (unsigned i = 0; i < td.warning_locs.length (); i++) + tgtdata->warning_locs.safe_push (td.warning_locs[i]); + } + + if (!td.ompacc_invalid + && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))) + { + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("ompacc"), NULL_TREE, + DECL_ATTRIBUTES (decl)); + if (!td.has_omp_parallel) + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("ompacc seq"), NULL_TREE, + DECL_ATTRIBUTES (decl)); + } +} + +static tree +scan_omp_target_region_r (tree *tp, int *walk_subtrees, void *data) +{ + target_region_data *tgtdata = (target_region_data *) data; + + if (TREE_CODE (*tp) == FUNCTION_DECL + && !(fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_THREAD_NUM) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_THREADS) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_TEAM_NUM) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_TEAMS) + || id_equal (DECL_NAME (*tp), "omp_get_thread_num") + || id_equal (DECL_NAME (*tp), "omp_get_num_threads") + || id_equal (DECL_NAME (*tp), "omp_get_team_num") + || id_equal (DECL_NAME (*tp), "omp_get_num_teams")) + && *tp != tgtdata->func_decl) + { + tree decl = *tp; + symtab_node *node = symtab_node::get (*tp); + if (node) + { + node = node->ultimate_alias_target (); + decl = node->decl; + } + + if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl)) + { + scan_fndecl_for_ompacc (decl, tgtdata); + } + else + { + tgtdata->warning_msgs.safe_push ("referencing 
external function"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + tgtdata->ompacc_invalid = true; + } + *walk_subtrees = 0; + return NULL_TREE; + } + + switch (TREE_CODE (*tp)) + { + case OMP_FOR: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else if (OMP_FOR_NON_RECTANGULAR (*tp)) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("non-rectangular loops not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else + tgtdata->has_omp_for = true; + break; + + case OMP_PARALLEL: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else + tgtdata->has_omp_parallel = true; + break; + + case OMP_DISTRIBUTE: + case OMP_TEAMS: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + /* Fallthru. 
*/ + + case OMP_ATOMIC: + case OMP_ATOMIC_READ: + case OMP_ATOMIC_CAPTURE_OLD: + case OMP_ATOMIC_CAPTURE_NEW: + break; + + case OMP_SIMD: + case OMP_TASK: + case OMP_LOOP: + case OMP_TASKLOOP: + case OMP_TASKGROUP: + case OMP_SECTION: + case OMP_MASTER: + case OMP_MASKED: + case OMP_ORDERED: + case OMP_CRITICAL: + case OMP_SCAN: + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("construct not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + *walk_subtrees = 0; + break; + + case OMP_TARGET: + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("nested target/reverse offload " + "not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + *walk_subtrees = 0; + break; + + default: + break; + } + return NULL_TREE; +} + +static tree +scan_omp_target_construct_r (tree *tp, int *walk_subtrees, + void *data) +{ + if (TREE_CODE (*tp) == OMP_TARGET) + { + target_region_data td; + td.func_decl = (tree) data; + walk_tree_without_duplicates (&OMP_TARGET_BODY (*tp), + scan_omp_target_region_r, &td); + for (tree c = OMP_TARGET_CLAUSES (*tp); c; c = OMP_CLAUSE_CHAIN (c)) + { + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_MAP: + continue; + default: + td.ompacc_invalid = true; + td.warning_msgs.safe_push ("clause not supported"); + td.warning_locs.safe_push (EXPR_LOCATION (c)); + break; + } + break; + } + if (!td.ompacc_invalid) + { + tree c = build_omp_clause (EXPR_LOCATION (*tp), OMP_CLAUSE__OMPACC_); + if (!td.has_omp_parallel) + OMP_CLAUSE__OMPACC__SEQ (c) = 1; + OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (*tp); + OMP_TARGET_CLAUSES (*tp) = c; + } + else + { + warning_at (EXPR_LOCATION (*tp), 0, "Target region not suitable for " + "OMPACC mode"); + for (unsigned i = 0; i < td.warning_locs.length (); i++) + warning_at (td.warning_locs[i], 0, td.warning_msgs[i]); + } + *walk_subtrees = 0; + } + return NULL_TREE; +} + +void +omp_ompacc_attribute_tagging (void) +{ + cgraph_node *node; + 
FOR_EACH_DEFINED_FUNCTION (node) + if (DECL_SAVED_TREE (node->decl)) + { + if (DECL_STRUCT_FUNCTION (node->decl) + && DECL_STRUCT_FUNCTION (node->decl)->has_omp_target) + walk_tree_without_duplicates (&DECL_SAVED_TREE (node->decl), + scan_omp_target_construct_r, + node->decl); + + for (cgraph_node *cgn = first_nested_function (node); + cgn; cgn = next_nested_function (cgn)) + if (omp_declare_target_fn_p (cgn->decl)) + { + scan_fndecl_for_ompacc (cgn->decl, NULL); + + if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (cgn->decl)) + && !lookup_attribute ("noinline", DECL_ATTRIBUTES (cgn->decl))) + { + DECL_ATTRIBUTES (cgn->decl) + = tree_cons (get_identifier ("noinline"), + NULL, DECL_ATTRIBUTES (cgn->decl)); + DECL_ATTRIBUTES (cgn->decl) + = tree_cons (get_identifier ("noipa"), + NULL, DECL_ATTRIBUTES (cgn->decl)); + } + } + } +} /* Create new symbols containing (address, size) pairs for global variables, marked with "omp declare target" attribute, as well as addresses for the @@ -509,6 +771,22 @@ omp_finish_file (void) static tree oacc_dim_call (bool pos, int dim, gimple_seq *seq) { + if (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + enum built_in_function fn; + if (dim == GOMP_DIM_VECTOR) + fn = pos ? BUILT_IN_OMP_GET_THREAD_NUM : BUILT_IN_OMP_GET_NUM_THREADS; + else if (dim == GOMP_DIM_GANG) + fn = pos ? BUILT_IN_OMP_GET_TEAM_NUM : BUILT_IN_OMP_GET_NUM_TEAMS; + else + gcc_unreachable (); + tree size = create_tmp_var (integer_type_node); + gimple *call = gimple_build_call (builtin_decl_explicit (fn), 0); + gimple_call_set_lhs (call, size); + gimple_seq_add_stmt (seq, call); + return size; + } + tree arg = build_int_cst (unsigned_type_node, dim); tree size = create_tmp_var (integer_type_node); enum internal_fn fn = pos ? 
IFN_GOACC_DIM_POS : IFN_GOACC_DIM_SIZE; @@ -2252,15 +2530,19 @@ execute_oacc_loop_designation () static unsigned int execute_oacc_device_lower () { - tree attrs = oacc_get_fn_attrib (current_function_decl); + tree attrs; + int dims[GOMP_DIM_MAX]; - if (!attrs) - /* Not an offloaded function. */ - return 0; + if (flag_openacc) + { + attrs = oacc_get_fn_attrib (current_function_decl); + if (!attrs) + /* Not an offloaded function. */ + return 0; - int dims[GOMP_DIM_MAX]; - for (unsigned i = 0; i < GOMP_DIM_MAX; i++) - dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + for (unsigned i = 0; i < GOMP_DIM_MAX; i++) + dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + } hash_map<tree, tree> adjusted_vars; @@ -2329,7 +2611,8 @@ execute_oacc_device_lower () case IFN_UNIQUE_OACC_FORK: case IFN_UNIQUE_OACC_JOIN: - if (integer_minus_onep (gimple_call_arg (call, 2))) + if (flag_openacc + && integer_minus_onep (gimple_call_arg (call, 2))) remove = true; else if (!targetm.goacc.fork_join (call, dims, kind == IFN_UNIQUE_OACC_FORK)) @@ -2616,7 +2899,8 @@ public: {} /* opt_pass methods: */ - bool gate (function *) final override { return flag_openacc; }; + bool gate (function *) final override + { return flag_openacc || (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC); }; unsigned int execute (function *) final override { diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h index d972bb7eafd..92d0231d04d 100644 --- a/gcc/omp-offload.h +++ b/gcc/omp-offload.h @@ -32,5 +32,6 @@ extern GTY(()) vec<tree, va_gc> *offload_ind_funcs; extern void omp_finish_file (void); extern void omp_discover_implicit_declare_target (void); +extern void omp_ompacc_attribute_tagging (void); #endif /* GCC_OMP_DEVICE_H */ diff --git a/gcc/opts.cc b/gcc/opts.cc index 3333600e0ea..badd1f3e445 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -1461,6 +1461,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, " %<-fstrict-flex-arrays%> is not present"); } + if 
(opts_set->x_flag_openmp_target) + { + if (opts->x_flag_openacc) + error ("%<-fopenacc%> not compatible with %<-fopenmp-target=%>"); + if (!opts->x_flag_openmp) + error ("%<-fopenmp-target=%> requires %<-fopenmp%> setting"); + } + diagnose_options (opts, opts_set, loc); } diff --git a/gcc/target-insns.def b/gcc/target-insns.def index 74efb0a70c1..2b5f1202a33 100644 --- a/gcc/target-insns.def +++ b/gcc/target-insns.def @@ -68,6 +68,11 @@ DEF_TARGET_INSN (oacc_dim_pos, (rtx x0, rtx x1)) DEF_TARGET_INSN (oacc_dim_size, (rtx x0, rtx x1)) DEF_TARGET_INSN (oacc_fork, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (oacc_join, (rtx x0, rtx x1, rtx x2)) +DEF_TARGET_INSN (gomp_barrier, (void)) +DEF_TARGET_INSN (omp_get_thread_num, (rtx x0)) +DEF_TARGET_INSN (omp_get_num_threads, (rtx x0)) +DEF_TARGET_INSN (omp_get_team_num, (rtx x0)) +DEF_TARGET_INSN (omp_get_num_teams, (rtx x0)) DEF_TARGET_INSN (omp_simt_enter, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (omp_simt_exit, (rtx x0)) DEF_TARGET_INSN (omp_simt_lane, (rtx x0)) diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 749341e7782..1ca18257316 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -515,6 +515,10 @@ enum omp_clause_code { loop or not. */ OMP_CLAUSE__SIMT_, + /* Internally used only clause, flag whether this is an "ompacc" + target region or not. */ + OMP_CLAUSE__OMPACC_, + /* OpenACC clause: independent. 
*/ OMP_CLAUSE_INDEPENDENT, diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc index 4e5f3be7676..b13c036e2b0 100644 --- a/gcc/tree-nested.cc +++ b/gcc/tree-nested.cc @@ -1518,6 +1518,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_BIND: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE__SCANTEMP_: + case OMP_CLAUSE__OMPACC_: break; /* The following clause belongs to the OpenACC cache directive, which @@ -2303,6 +2304,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_BIND: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE__SCANTEMP_: + case OMP_CLAUSE__OMPACC_: break; /* The following clause belongs to the OpenACC cache directive, which diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index f7439e2f597..f7be9347de5 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -1386,6 +1386,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) pp_string (pp, "_simt_"); break; + case OMP_CLAUSE__OMPACC_: + pp_string (pp, "_ompacc_"); + if (OMP_CLAUSE__OMPACC__SEQ (clause)) + pp_string (pp, "(seq)"); + break; + case OMP_CLAUSE_GANG: pp_string (pp, "gang"); if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE) diff --git a/gcc/tree.cc b/gcc/tree.cc index e234d4a936a..c8cf45e3fc1 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -321,6 +321,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_SIZES */ 1, /* OMP_CLAUSE__SIMDUID_ */ 0, /* OMP_CLAUSE__SIMT_ */ + 0, /* OMP_CLAUSE__OMPACC_ */ 0, /* OMP_CLAUSE_INDEPENDENT */ 1, /* OMP_CLAUSE_WORKER */ 1, /* OMP_CLAUSE_VECTOR */ @@ -418,6 +419,7 @@ const char * const omp_clause_code_name[] = "sizes", "_simduid_", "_simt_", + "_ompacc_", "independent", "worker", "vector", diff --git a/gcc/tree.h b/gcc/tree.h index aacdbc8b078..7dfdc289f14 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -2025,6 +2025,9 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE__SIMDUID__DECL(NODE) \ 
OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__SIMDUID_), 0) +#define OMP_CLAUSE__OMPACC__SEQ(NODE) \ + (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__OMPACC_)->base.public_flag) + #define OMP_CLAUSE_SCHEDULE_KIND(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->omp_clause.subcode.schedule_kind) diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c index d5361917a24..82dec209f6e 100644 --- a/libgomp/config/nvptx/team.c +++ b/libgomp/config/nvptx/team.c @@ -34,6 +34,9 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon)); int __gomp_team_num __attribute__((shared,nocommon)); +/* Number of active target threads in team, used in ACC mode. */ +unsigned int __nvptx_omp_num_threads __attribute__((shared,nocommon)); + static void gomp_thread_start (struct gomp_thread_pool *); extern void build_indirect_map (void); diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-17.c b/libgomp/testsuite/libgomp.c-c++-common/for-17.c new file mode 100644 index 00000000000..9771aaf2ab5 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/for-17.c @@ -0,0 +1,69 @@ +/* { dg-options "-fopenmp-target=acc" } */ +/* { dg-additional-options "-std=gnu99" { target c } } */ + +#define M(x, y, z) O(x, y, z) +#define O(x, y, z) x ## _ ## y ## _ ## z + +#define DO_PRAGMA(x) _Pragma (#x) + +#undef OMPFROM +#undef OMPTO +#define OMPFROM(v) DO_PRAGMA (omp target update from(v)) +#define OMPTO(v) DO_PRAGMA (omp target update to(v)) + +#pragma omp declare target + +#define OMPTGT DO_PRAGMA (omp target) +#define F parallel for +#define G pf +#define S +#define N(x) M(x, G, ompacc) +#include "for-2.h" +#undef S +#undef N +#undef F +#undef G +#undef OMPTGT + +#pragma omp end declare target + +#define F target parallel for +#define G tpf +#define S +#define N(x) M(x, G, ompacc) +#include "for-2.h" +#undef S +#undef N +#undef F +#undef G + +#define F target teams distribute +#define G ttd +#define S +#define N(x) M(x, G, ompacc) +#include 
"for-2.h" +#undef S +#undef N +#undef F +#undef G + +#define F target teams distribute parallel for +#define G ttdpf +#define S +#define N(x) M(x, G, ompacc) +#include "for-2.h" +#undef S +#undef N +#undef F +#undef G + +int +main () +{ + if (test_pf_ompacc () + || test_tpf_ompacc () + || test_ttd_ompacc () + || test_ttdpf_ompacc ()) + __builtin_abort (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-18.c b/libgomp/testsuite/libgomp.c-c++-common/for-18.c new file mode 100644 index 00000000000..2486d3aa665 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/for-18.c @@ -0,0 +1,5 @@ +/* { dg-options "-fopenmp-target=acc" } */ +/* { dg-additional-options "-std=gnu99" {target c } } */ + +#define CONDNE +#include "for-17.c"