Hi! Ping.
On Thu, 04 Feb 2016 15:47:25 +0100, I wrote: > Here is the patch re-worked for trunk. Instead of passing > -foffload-force in the affected libgomp test cases, I instead chose to > have them expect the warning. This way, we're testing more in line to > what users will be doing, and we'll notice how the OpenACC kernels > handling improves, when parloops gets able to parallelize more offloaded > code (and the "avoid offloading" handling will no longer trigger). OK to > commit? > > commit acd66946777671486a0f69706b25a3ec5f877306 > Author: Thomas Schwinge <tho...@codesourcery.com> > Date: Tue Feb 2 20:41:42 2016 +0100 > > Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid > offloading" > > gcc/ > * common.opt: Add -foffload-force. > * lto-wrapper.c (merge_and_complain, append_compiler_options): > Handle it. > * doc/invoke.texi: Document it. > * config/nvptx/mkoffload.c (struct id_map): Add "flags" member. > (record_id): Parse, and set it. > (process): Use it. > * config/nvptx/nvptx.c (nvptx_attribute_table): Add "omp avoid > offloading". > (nvptx_record_offload_symbol): Use it. > (nvptx_goacc_validate_dims): Set it. > libgomp/ > * libgomp.h (gomp_offload_target_available_p): New function > declaration. > * target.c (gomp_offload_target_available_p): New function > definition. > (GOMP_offload_register_ver, GOMP_offload_unregister_ver) > (gomp_init_device, gomp_unload_device): Handle and document "avoid > offloading" flag ("host_table == NULL"). > (resolve_device): Document "avoid offloading". > * oacc-init.c (resolve_device): Likewise. > * libgomp.texi (Enabling OpenACC): Likewise. > * testsuite/lib/libgomp.exp > (check_effective_target_nvptx_offloading_configured): New proc > definition. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: New > file. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c: > Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise. > * testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise. > * testsuite/libgomp.oacc-c-c++-common/abort-3.c: Expect warning. > * testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c: > Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c: > Likewise. > * testsuite/libgomp.oacc-fortran/combined-directives-1.f90: > Likewise. > * testsuite/libgomp.oacc-fortran/non-scalar-data.f90: Likewise. > > libgomp/ > * testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c: Set > "-ftree-parallelize-loops=32". > * testsuite/libgomp.oacc-c-c++-common/default-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Likewise. > * testsuite/libgomp.oacc-c-c++-common/nested-2.c: Likewise. > --- > gcc/common.opt | 4 + > gcc/config/nvptx/mkoffload.c | 73 +++++++++++- > gcc/config/nvptx/nvptx.c | 42 ++++++- > gcc/doc/invoke.texi | 12 +- > gcc/lto-wrapper.c | 2 + > libgomp/libgomp.h | 1 + > libgomp/libgomp.texi | 8 ++ > libgomp/oacc-init.c | 19 ++- > libgomp/target.c | 122 > ++++++++++++++++---- > libgomp/testsuite/lib/libgomp.exp | 10 ++ > .../testsuite/libgomp.oacc-c-c++-common/abort-3.c | 4 +- > .../testsuite/libgomp.oacc-c-c++-common/abort-4.c | 4 +- > .../libgomp.oacc-c-c++-common/avoid-offloading-1.c | 28 +++++ > .../libgomp.oacc-c-c++-common/avoid-offloading-2.c | 38 ++++++ > .../libgomp.oacc-c-c++-common/avoid-offloading-3.c | 29 +++++ > .../combined-directives-1.c | 4 +- > .../libgomp.oacc-c-c++-common/default-1.c | 4 +- > .../libgomp.oacc-c-c++-common/deviceptr-1.c | 4 +- > .../libgomp.oacc-c-c++-common/host_data-1.c | 1 + > .../libgomp.oacc-c-c++-common/kernels-1.c | 10 +- > .../kernels-alias-ipa-pta-2.c | 4 +- > .../kernels-alias-ipa-pta-3.c | 4 +- > .../kernels-alias-ipa-pta.c | 4 +- > .../libgomp.oacc-c-c++-common/kernels-empty.c | 2 +- > .../kernels-loop-and-seq-2.c | 3 +- > .../kernels-loop-and-seq-3.c | 4 +- > .../kernels-loop-and-seq-4.c | 3 +- > .../kernels-loop-and-seq-5.c | 3 +- > .../kernels-loop-and-seq-6.c | 3 +- > .../kernels-loop-and-seq.c | 4 +- > .../kernels-loop-collapse.c | 3 +- > .../testsuite/libgomp.oacc-c-c++-common/nested-2.c | 2 +- > .../libgomp.oacc-fortran/avoid-offloading-1.f | 32 +++++ > .../libgomp.oacc-fortran/avoid-offloading-2.f | 41 +++++++ > .../libgomp.oacc-fortran/avoid-offloading-3.f | 31 +++++ > .../libgomp.oacc-fortran/combined-directives-1.f90 | 5 +- > .../libgomp.oacc-fortran/non-scalar-data.f90 | 5 +- > 37 files changed, 494 insertions(+), 78 deletions(-) > > diff --git gcc/common.opt gcc/common.opt > index 520fa9c..2cf798d 100644 > --- gcc/common.opt > +++ gcc/common.opt > @@ -1779,6 +1779,10 @@ Enum(offload_abi) String(ilp32) > Value(OFFLOAD_ABI_ILP32) > EnumValue > Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) > > +foffload-force > +Common Var(flag_offload_force) > +Force offloading if the compiler wanted to avoid it. > + > fomit-frame-pointer > Common Report Var(flag_omit_frame_pointer) Optimization > When possible do not generate stack frames. > diff --git gcc/config/nvptx/mkoffload.c gcc/config/nvptx/mkoffload.c > index c8eed45..586ee8b 100644 > --- gcc/config/nvptx/mkoffload.c > +++ gcc/config/nvptx/mkoffload.c > @@ -41,9 +41,19 @@ const char tool_name[] = "nvptx mkoffload"; > > #define COMMENT_PREFIX "#" > > +enum id_map_flag > + { > + /* All clear. */ > + ID_MAP_FLAG_NONE = 0, > + /* Avoid offloading. For example, because there is no sufficient > + parallelism. */ > + ID_MAP_FLAG_AVOID_OFFLOADING = 1 > + }; > + > struct id_map > { > id_map *next; > + int flags; > char *ptx_name; > }; > > @@ -107,6 +117,38 @@ record_id (const char *p1, id_map ***where) > fatal_error (input_location, "malformed ptx file"); > > id_map *v = XNEW (id_map); > + > + /* Do we have any flags? */ > + v->flags = ID_MAP_FLAG_NONE; > + if (p1[0] == '(') > + { > + /* Current flag. */ > + const char *cur = p1 + 1; > + > + /* Seek to the beginning of ") ". */ > + p1 = strchr (cur, ')'); > + if (!p1 || p1 > end || p1[1] != ' ') > + fatal_error (input_location, "malformed ptx file: " > + "expected \") \" at \"%s\"", cur); > + > + while (cur < p1) > + { > + const char *next = strchr (cur, ','); > + if (!next || next > p1) > + next = p1; > + > + if (strncmp (cur, "avoid offloading", next - cur - 1) == 0) > + v->flags |= ID_MAP_FLAG_AVOID_OFFLOADING; > + else > + fatal_error (input_location, "malformed ptx file: " > + "unknown flag at \"%s\"", cur); > + > + cur = next; > + } > + > + /* Skip past ") ". */ > + p1 += 2; > + } > size_t len = end - p1; > v->ptx_name = XNEWVEC (char, len + 1); > memcpy (v->ptx_name, p1, len); > @@ -296,12 +338,17 @@ process (FILE *in, FILE *out) > fprintf (out, "\n};\n\n"); > > /* Dump out function idents. */ > + bool avoid_offloading_p = false; > fprintf (out, "static const struct nvptx_fn {\n" > " const char *name;\n" > " unsigned short dim[%d];\n" > "} func_mappings[] = {\n", GOMP_DIM_MAX); > for (comma = "", id = func_ids; id; comma = ",", id = id->next) > - fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); > + { > + if (id->flags & ID_MAP_FLAG_AVOID_OFFLOADING) > + avoid_offloading_p = true; > + fprintf (out, "%s\n\t{%s}", comma, id->ptx_name); > + } > fprintf (out, "\n};\n\n"); > > fprintf (out, > @@ -318,7 +365,11 @@ process (FILE *in, FILE *out) > " sizeof (var_mappings) / sizeof (var_mappings[0]),\n" > " func_mappings," > " sizeof (func_mappings) / sizeof (func_mappings[0])\n" > - "};\n\n"); > + "};\n"); > + if (avoid_offloading_p) > + /* Need a unique handle for target_data. */ > + fprintf (out, "static int target_data_avoid_offloading;\n"); > + fprintf (out, "\n"); > > fprintf (out, "#ifdef __cplusplus\n" > "extern \"C\" {\n" > @@ -338,18 +389,28 @@ process (FILE *in, FILE *out) > fprintf (out, "static __attribute__((constructor)) void init (void)\n" > "{\n" > " GOMP_offload_register_ver (%#x, __OFFLOAD_TABLE__," > - "%d/*NVIDIA_PTX*/, &target_data);\n" > - "};\n", > + "%d/*NVIDIA_PTX*/, &target_data);\n", > GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > GOMP_DEVICE_NVIDIA_PTX); > + if (avoid_offloading_p) > + fprintf (out, " GOMP_offload_register_ver (%#x, (void *) 0," > + "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n", > + GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > + GOMP_DEVICE_NVIDIA_PTX); > + fprintf (out, "};\n"); > > fprintf (out, "static __attribute__((destructor)) void fini (void)\n" > "{\n" > " GOMP_offload_unregister_ver (%#x, __OFFLOAD_TABLE__," > - "%d/*NVIDIA_PTX*/, &target_data);\n" > - "};\n", > + "%d/*NVIDIA_PTX*/, &target_data);\n", > GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > GOMP_DEVICE_NVIDIA_PTX); > + if (avoid_offloading_p) > + fprintf (out, " GOMP_offload_unregister_ver (%#x, (void *) 0," > + "%d/*NVIDIA_PTX*/, &target_data_avoid_offloading);\n", > + GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_NVIDIA_PTX), > + GOMP_DEVICE_NVIDIA_PTX); > + fprintf (out, "};\n"); > } > > static void > diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c > index 78614f8..fe28154 100644 > --- gcc/config/nvptx/nvptx.c > +++ gcc/config/nvptx/nvptx.c > @@ -3803,6 +3803,9 @@ static const struct attribute_spec > nvptx_attribute_table[] = > /* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler, > affects_type_identity } */ > { "kernel", 0, 0, true, false, false, nvptx_handle_kernel_attribute, > false }, > + /* Avoid offloading. For example, because there is no sufficient > + parallelism. */ > + { "omp avoid offloading", 0, 0, true, false, false, NULL, false }, > { NULL, 0, 0, false, false, false, NULL, false } > }; > > @@ -3867,7 +3870,10 @@ nvptx_record_offload_symbol (tree decl) > tree dims = TREE_VALUE (attr); > unsigned ix; > > - fprintf (asm_out_file, "//:FUNC_MAP \"%s\"", > + fprintf (asm_out_file, "//:FUNC_MAP %s\"%s\"", > + (lookup_attribute ("omp avoid offloading", > + DECL_ATTRIBUTES (decl)) > + ? "(avoid offloading) " : ""), > IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl))); > > for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims)) > @@ -4124,6 +4130,40 @@ nvptx_expand_builtin (tree exp, rtx target, rtx > ARG_UNUSED (subtarget), > static bool > nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level) > { > + /* Detect if a function is unsuitable for offloading. */ > + if (!flag_offload_force && decl) > + { > + tree oacc_function_attr = get_oacc_fn_attrib (decl); > + if (oacc_function_attr > + && oacc_fn_attrib_kernels_p (oacc_function_attr)) > + { > + bool avoid_offloading_p = true; > + for (unsigned ix = 0; ix != GOMP_DIM_MAX; ix++) > + { > + if (dims[ix] > 1) > + { > + avoid_offloading_p = false; > + break; > + } > + } > + if (avoid_offloading_p) > + { > + /* OpenACC kernels constructs will never be parallelized for > + optimization levels smaller than -O2; avoid the diagnostic in > + this case. */ > + if (optimize >= 2) > + warning_at (DECL_SOURCE_LOCATION (decl), 0, > + "OpenACC kernels construct will be executed " > + "sequentially; will by default avoid offloading " > + "to prevent data copy penalty"); > + DECL_ATTRIBUTES (decl) > + = tree_cons (get_identifier ("omp avoid offloading"), > + NULL_TREE, DECL_ATTRIBUTES (decl)); > + > + } > + } > + } > + > bool changed = false; > > /* The vector size must be 32, unless this is a SEQ routine. */ > diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi > index fcc404e..c09fbc5 100644 > --- gcc/doc/invoke.texi > +++ gcc/doc/invoke.texi > @@ -180,7 +180,8 @@ in the following sections. > @gccoptlist{-ansi -std=@var{standard} -fgnu89-inline @gol > -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol > -fno-asm -fno-builtin -fno-builtin-@var{function} @gol > --fhosted -ffreestanding -fopenacc -fopenmp -fopenmp-simd @gol > +-fhosted -ffreestanding @gol > +-foffload-force -fopenacc -fopenacc-dim=@var{geom} -fopenmp -fopenmp-simd > @gol > -fms-extensions -fplan9-extensions -fsso-struct=@var{endianness} > -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol > -fsigned-bitfields -fsigned-char @gol > @@ -1953,6 +1954,15 @@ This is equivalent to @option{-fno-hosted}. > @xref{Standards,,Language Standards Supported by GCC}, for details of > freestanding and hosted environments. > > +@item -foffload-force > +@opindex -foffload-force > +The option @option{-foffload-force} forces offloading if the compiler > +wanted to avoid it. For example, when there isn't sufficient > +parallelism in certain offloading constructs, the compiler may come to > +the conclusion that offloading incurs too much overhead (for data > +transfers, for example), and unless overridden with this flag, it then > +suggests to the runtime (libgomp) to avoid offloading. > + > @item -fopenacc > @opindex fopenacc > @cindex OpenACC accelerator programming > diff --git gcc/lto-wrapper.c gcc/lto-wrapper.c > index ced6f2f..702ae47 100644 > --- gcc/lto-wrapper.c > +++ gcc/lto-wrapper.c > @@ -275,6 +275,7 @@ merge_and_complain (struct cl_decoded_option > **decoded_options, > case OPT_fsigned_zeros: > case OPT_ftrapping_math: > case OPT_fwrapv: > + case OPT_foffload_force: > case OPT_fopenmp: > case OPT_fopenacc: > case OPT_fcilkplus: > @@ -517,6 +518,7 @@ append_compiler_options (obstack *argv_obstack, struct > cl_decoded_option *opts, > case OPT_fsigned_zeros: > case OPT_ftrapping_math: > case OPT_fwrapv: > + case OPT_foffload_force: > case OPT_fopenmp: > case OPT_fopenacc: > case OPT_fopenacc_dim_: > diff --git libgomp/libgomp.h libgomp/libgomp.h > index 7108a6d..8747b72 100644 > --- libgomp/libgomp.h > +++ libgomp/libgomp.h > @@ -984,6 +984,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, > bool); > extern void gomp_init_device (struct gomp_device_descr *); > extern void gomp_free_memmap (struct splay_tree_s *); > extern void gomp_unload_device (struct gomp_device_descr *); > +extern bool gomp_offload_target_available_p (int); > > /* work.c */ > > diff --git libgomp/libgomp.texi libgomp/libgomp.texi > index 987ee5f..5795c00 100644 > --- libgomp/libgomp.texi > +++ libgomp/libgomp.texi > @@ -1815,6 +1815,14 @@ flag @option{-fopenacc} must be specified. This > enables the OpenACC directive > arranges for automatic linking of the OpenACC runtime library > (@ref{OpenACC Runtime Library Routines}). > > +Offloading is enabled by default. In some cases, the compiler may > +come to the conclusion that offloading incurs too much overhead, and > +suggest to the runtime to avoid it. To counteract that, you can use > +the option @option{-foffload-force} to force offloading in such cases. > +Alternatively, offloading is also enabled if a specific device type is > +requested, in a call to @code{acc_init} or by setting the > +@env{ACC_DEVICE_TYPE} environment variable, for example. > + > A complete description of all OpenACC directives accepted may be found in > the @uref{http://www.openacc.org/, OpenACC} Application Programming > Interface manual, version 2.0. > diff --git libgomp/oacc-init.c libgomp/oacc-init.c > index 42d005d..2f053f3 100644 > --- libgomp/oacc-init.c > +++ libgomp/oacc-init.c > @@ -122,7 +122,10 @@ resolve_device (acc_device_t d, bool fail_is_error) > { > if (goacc_device_type) > { > - /* Lookup the named device. */ > + /* Lookup the device that has been explicitly named, so do not pay > + attention to gomp_offload_target_available_p. (That is, > + enforced usage even with an "avoid offloading" flag set, and > + hard error if not actually available.) */ > while (++d != _ACC_device_hwm) > if (dispatchers[d] > && !strcasecmp (goacc_device_type, > @@ -148,8 +151,15 @@ resolve_device (acc_device_t d, bool fail_is_error) > case acc_device_not_host: > /* Find the first available device after acc_device_not_host. */ > while (++d != _ACC_device_hwm) > - if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0) > + if (dispatchers[d] > + && dispatchers[d]->get_num_devices_func () > 0 > + /* No device has been explicitly named, so pay attention to > + gomp_offload_target_available_p, to not decide on an offload > + target that we don't have offload data available for, or have an > + "avoid offloading" flag set for. */ > + && gomp_offload_target_available_p (dispatchers[d]->type)) > goto found; > + /* No non-host device found. */ > if (d_arg == acc_device_default) > { > d = acc_device_host; > @@ -168,7 +178,7 @@ resolve_device (acc_device_t d, bool fail_is_error) > break; > > default: > - if (d > _ACC_device_hwm) > + if (d >= _ACC_device_hwm) > { > if (fail_is_error) > goto unsupported_device; > @@ -181,7 +191,8 @@ resolve_device (acc_device_t d, bool fail_is_error) > > assert (d != acc_device_none > && d != acc_device_default > - && d != acc_device_not_host); > + && d != acc_device_not_host > + && d < _ACC_device_hwm); > > if (dispatchers[d] == NULL && fail_is_error) > { > diff --git libgomp/target.c libgomp/target.c > index 96fe3d5..afcbedb 100644 > --- libgomp/target.c > +++ libgomp/target.c > @@ -1165,12 +1165,19 @@ gomp_unload_image_from_device (struct > gomp_device_descr *devicep, > > /* This function should be called from every offload image while loading. > It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of > - the target, and TARGET_DATA needed by target plugin. */ > + the target, and TARGET_DATA needed by target plugin. > + > + If HOST_TABLE is NULL, this image (TARGET_DATA) is stored as an "avoid > + offloading" flag, and the TARGET_TYPE will not be considered by default > + until this image gets unregistered. */ > > void > GOMP_offload_register_ver (unsigned version, const void *host_table, > int target_type, const void *target_data) > { > + gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__, > + version, host_table, target_type, target_data); > + > int i; > > if (GOMP_VERSION_LIB (version) > GOMP_VERSION) > @@ -1179,16 +1186,19 @@ GOMP_offload_register_ver (unsigned version, const > void *host_table, > > gomp_mutex_lock (®ister_lock); > > - /* Load image to all initialized devices. */ > - for (i = 0; i < num_devices; i++) > + if (host_table != NULL) > { > - struct gomp_device_descr *devicep = &devices[i]; > - gomp_mutex_lock (&devicep->lock); > - if (devicep->type == target_type > - && devicep->state == GOMP_DEVICE_INITIALIZED) > - gomp_load_image_to_device (devicep, version, > - host_table, target_data, true); > - gomp_mutex_unlock (&devicep->lock); > + /* Load image to all initialized devices. */ > + for (i = 0; i < num_devices; i++) > + { > + struct gomp_device_descr *devicep = &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type == target_type > + && devicep->state == GOMP_DEVICE_INITIALIZED) > + gomp_load_image_to_device (devicep, version, > + host_table, target_data, true); > + gomp_mutex_unlock (&devicep->lock); > + } > } > > /* Insert image to array of pending images. */ > @@ -1214,26 +1224,36 @@ GOMP_offload_register (const void *host_table, int > target_type, > > /* This function should be called from every offload image while unloading. > It gets the descriptor of the host func and var tables HOST_TABLE, TYPE of > - the target, and TARGET_DATA needed by target plugin. */ > + the target, and TARGET_DATA needed by target plugin. > + > + If HOST_TABLE is NULL, the "avoid offloading" flag gets cleared for this > + image (TARGET_DATA), and this TARGET_TYPE may again be considered by > + default. */ > > void > GOMP_offload_unregister_ver (unsigned version, const void *host_table, > int target_type, const void *target_data) > { > + gomp_debug (0, "%s (%u, %p, %d, %p)\n", __FUNCTION__, > + version, host_table, target_type, target_data); > + > int i; > > gomp_mutex_lock (®ister_lock); > > - /* Unload image from all initialized devices. */ > - for (i = 0; i < num_devices; i++) > + if (host_table != NULL) > { > - struct gomp_device_descr *devicep = &devices[i]; > - gomp_mutex_lock (&devicep->lock); > - if (devicep->type == target_type > - && devicep->state == GOMP_DEVICE_INITIALIZED) > - gomp_unload_image_from_device (devicep, version, > - host_table, target_data); > - gomp_mutex_unlock (&devicep->lock); > + /* Unload image from all initialized devices. */ > + for (i = 0; i < num_devices; i++) > + { > + struct gomp_device_descr *devicep = &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type == target_type > + && devicep->state == GOMP_DEVICE_INITIALIZED) > + gomp_unload_image_from_device (devicep, version, > + host_table, target_data); > + gomp_mutex_unlock (&devicep->lock); > + } > } > > /* Remove image from array of pending images. */ > @@ -1267,7 +1287,8 @@ gomp_init_device (struct gomp_device_descr *devicep) > for (i = 0; i < num_offload_images; i++) > { > struct offload_image_descr *image = &offload_images[i]; > - if (image->type == devicep->type) > + if (image->type == devicep->type > + && image->host_table != NULL) > gomp_load_image_to_device (devicep, image->version, > image->host_table, image->target_data, > false); > @@ -1287,7 +1308,8 @@ gomp_unload_device (struct gomp_device_descr *devicep) > for (i = 0; i < num_offload_images; i++) > { > struct offload_image_descr *image = &offload_images[i]; > - if (image->type == devicep->type) > + if (image->type == devicep->type > + && image->host_table != NULL) > gomp_unload_image_from_device (devicep, image->version, > image->host_table, > image->target_data); > @@ -1311,6 +1333,62 @@ gomp_free_memmap (struct splay_tree_s *mem_map) > } > } > > +/* Do we have offload data available for the given offload target type? > + Instead of verifying that *all* offload data is available that could > + possibly be required, we instead just look for *any*. If we later find > any > + offload data missing, that's user error. If any offload data of this > target > + type is tagged with an "avoid offloading" flag, do not consider this > target > + type available unless it has been initialized already. */ > + > +attribute_hidden bool > +gomp_offload_target_available_p (int type) > +{ > + bool available = false; > + > + /* Has the offload target type already been initialized? */ > + for (int i = 0; !available && i < num_devices; i++) > + { > + struct gomp_device_descr *devicep = &devices[i]; > + gomp_mutex_lock (&devicep->lock); > + if (devicep->type == type > + && devicep->state == GOMP_DEVICE_INITIALIZED) > + available = true; > + gomp_mutex_unlock (&devicep->lock); > + } > + > + /* If the offload target type has been initialized already, we ignore > "avoid > + offloading" flags. This is important, because data/state may be present > + on the device, that we must continue to use. */ > + if (!available) > + { > + gomp_mutex_lock (®ister_lock); > + if (num_offload_images == 0) > + /* If there is no offload data available at all, there is no way to > + later fail to find any of it for a specific offload target type. > + This is the case where there are no offloaded code regions in user > + code, but the target type can be initialized successfully, and > + executable directqives be used, or runtime library calls be > + made. */ > + available = true; > + else > + { > + /* Can the offload target be initialized? */ > + for (int i = 0; !available && i < num_offload_images; i++) > + if (offload_images[i].type == type > + && offload_images[i].host_table != NULL) > + available = true; > + /* If yes, is an "avoid offloading" flag set? */ > + for (int i = 0; available && i < num_offload_images; i++) > + if (offload_images[i].type == type > + && offload_images[i].host_table == NULL) > + available = false; > + } > + gomp_mutex_unlock (®ister_lock); > + } > + > + return available; > +} > + > /* Host fallback for GOMP_target{,_ext} routines. */ > > static void > diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp > index a4c9d83..8d2be80 100644 > --- libgomp/testsuite/lib/libgomp.exp > +++ libgomp/testsuite/lib/libgomp.exp > @@ -344,6 +344,16 @@ proc check_effective_target_offload_device_nonshared_as > { } { > } ] > } > > +# Return 1 if the compiler has been configured for nvptx offloading. > + > +proc check_effective_target_nvptx_offloading_configured { } { > + # PR libgomp/65099: Currently, we only support offloading in 64-bit > + # configurations. > + global offload_targets > + return [expr [string match "*,nvptx,*" ",$offload_targets,"] \ > + && [is-effective-target lp64] ] > +} > + > # Return 1 if at least one nvidia board is present. > > proc check_effective_target_openacc_nvidia_accel_present { } { > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > index bca425e..23156d8 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c > @@ -1,5 +1,3 @@ > -/* { dg-do run } */ > - > #include <stdio.h> > #include <stdlib.h> > > @@ -7,7 +5,7 @@ int > main (void) > { > fprintf (stderr, "CheCKpOInT\n"); > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > { > abort (); > } > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > index c29ca3f..f4d6a07 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c > @@ -1,12 +1,10 @@ > -/* { dg-do run } */ > - > #include <stdlib.h> > > int > main (int argc, char **argv) > { > > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > { > if (argc != 1) > abort (); > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c > new file mode 100644 > index 0000000..08745fc > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c > @@ -0,0 +1,28 @@ > +/* Test that the compiler decides to "avoid offloading". */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > +/* The ACC_DEVICE_TYPE environment variable gets set in the testing > + framework, and that overrides the "avoid offloading" flag at run time. > + { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } */ > + > +#include <openacc.h> > + > +int main(void) > +{ > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > + *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); > + > + if (x != 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia > + if (y != 1) > + __builtin_abort(); > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + > + return 0; > +} > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c > libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c > new file mode 100644 > index 0000000..724228a > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c > @@ -0,0 +1,38 @@ > +/* Test that a user can override the compiler's "avoid offloading" > + decision at run time. */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > + > +#include <openacc.h> > + > +int main(void) > +{ > + /* Override the compiler's "avoid offloading" decision. */ > + acc_device_t d; > +#if defined ACC_DEVICE_TYPE_nvidia > + d = acc_device_nvidia; > +#elif defined ACC_DEVICE_TYPE_host > + d = acc_device_host; > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + acc_init (d); > + > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > + *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); > + > + if (x != 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_nvidia > + if (y != 0) > + __builtin_abort(); > +#else > + if (y != 1) > + __builtin_abort(); > +#endif > + > + return 0; > +} > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c > libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c > new file mode 100644 > index 0000000..2fb5196 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c > @@ -0,0 +1,29 @@ > +/* Test that a user can override the compiler's "avoid offloading" > + decision at compile time. */ > + > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > +/* Override the compiler's "avoid offloading" decision. > + { dg-additional-options "-foffload-force" } */ > + > +#include <openacc.h> > + > +int main(void) > +{ > + int x, y; > + > +#pragma acc data copyout(x, y) > +#pragma acc kernels > + *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); > + > + if (x != 33) > + __builtin_abort(); > +#if defined ACC_DEVICE_TYPE_nvidia > + if (y != 0) > + __builtin_abort(); > +#else > + if (y != 1) > + __builtin_abort(); > +#endif > + > + return 0; > +} > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > index dad6d13..87ca378 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c > @@ -1,6 +1,6 @@ > /* This test exercises combined directives. */ > > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > > @@ -33,7 +33,7 @@ main (int argc, char **argv) > abort (); > } > > -#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) > +#pragma acc kernels loop copy (a[0:N]) copy (b[0:N]) /* { dg-warning > "OpenACC kernels construct will be executed sequentially; will by default > avoid offloading to prevent data copy penalty" "" { target > nvptx_offloading_configured } } */ > for (i = 0; i < N; i++) > { > b[i] = 3.0; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > index 1ac0b95..8f0144c 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/default-1.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <openacc.h> > > @@ -51,7 +51,7 @@ int test_kernels () > ary[i] = ~0; > > /* val defaults to copy, ary defaults to copy. */ > -#pragma acc kernels copy(ondev) > +#pragma acc kernels copy(ondev) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > ondev = acc_on_device (acc_device_not_host); > #pragma acc loop > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > index e271a37..9a5f7b1 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c > @@ -1,5 +1,3 @@ > -/* { dg-do run } */ > - > #include <stdlib.h> > > int main (void) > @@ -10,7 +8,7 @@ int main (void) > a = A; > > #pragma acc data copyout (a_1, a_2) > -#pragma acc kernels deviceptr (a) > +#pragma acc kernels deviceptr (a) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a_1 = a; > a_2 = &a; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > index 51745ba..3ef6f9b 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c > @@ -1,4 +1,5 @@ > /* { dg-do run { target openacc_nvidia_accel_selected } } */ > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > /* { dg-additional-options "-lcuda -lcublas -lcudart" } */ > > #include <stdlib.h> > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > index 3acfdf5..614ad33 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > > @@ -73,7 +73,7 @@ int main (void) > i = -1; > j = -2; > v = 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin > (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin > (i, j) /* { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } */ > { > if (i != -1 || j != -2) > abort (); > @@ -96,7 +96,7 @@ int main (void) > i = -1; > j = -2; > v = 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout > (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout > (i, j) /* { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } */ > { > i = 2; > j = 1; > @@ -110,7 +110,7 @@ int main (void) > i = -1; > j = -2; > v = 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, > j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, > j) /* { dg-warning "OpenACC kernels construct will be executed sequentially; > will by default avoid offloading to prevent data copy penalty" "" { target > nvptx_offloading_configured } } */ > { > if (i != -1 || j != -2) > abort (); > @@ -126,7 +126,7 @@ int main (void) > i = -1; > j = -2; > v = 0; > -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create > (i, j) > +#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create > (i, j) /* { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } */ > { > i = 2; > j = 1; > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > index 0f323c8..8d5101d 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ > > #include <stdlib.h> > > @@ -11,7 +11,7 @@ main (void) > unsigned int *b = (unsigned int *)malloc (N * sizeof (unsigned int)); > unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); > > -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) > +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning > "OpenACC kernels construct will be executed sequentially; will by default > avoid offloading to prevent data copy penalty" "" { target > nvptx_offloading_configured } } */ > { > a[0] = 0; > b[0] = 1; > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > index 654e750..3726b0c 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ > > #include <stdlib.h> > > @@ -11,7 +11,7 @@ main (void) > unsigned int *b = a; > unsigned int *c = (unsigned int *)malloc (N * sizeof (unsigned int)); > > -#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) > +#pragma acc kernels pcopyout (a[0:N], b[0:N], c[0:N]) /* { dg-warning > "OpenACC kernels construct will be executed sequentially; will by default > avoid offloading to prevent data copy penalty" "" { target > nvptx_offloading_configured } } */ > { > a[0] = 0; > b[0] = 1; > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > index 44d4fd2..eea4f76 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c > @@ -1,4 +1,4 @@ > -/* { dg-additional-options "-O2 -fipa-pta" } */ > +/* { dg-additional-options "-fipa-pta" } */ > > #include <stdlib.h> > > @@ -11,7 +11,7 @@ main (void) > unsigned int b[N]; > unsigned int c[N]; > > -#pragma acc kernels pcopyout (a, b, c) > +#pragma acc kernels pcopyout (a, b, c) /* { dg-warning "OpenACC kernels > construct will be executed sequentially; will by default avoid offloading to > prevent data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] = 0; > b[0] = 1; > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > index a68a7cd..860b6da 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c > @@ -1,6 +1,6 @@ > int > main (void) > { > -#pragma acc kernels > +#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > ; > } > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > index 2e4100f..5cdc200 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] = a[0] + 1; > > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > index b3e736b..2e4d4d2 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -8,8 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > - > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > for (int i = 0; i < n; i++) > a[i] = 1; > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > index 8b9affa..5bf00db 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > a[0] = 2; > > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > index 83d4e7f..d39b667 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -9,7 +8,7 @@ unsigned int > foo (int n, unsigned int *a) > { > int r; > -#pragma acc kernels copyout(r) copy (a[0:N]) > +#pragma acc kernels copyout(r) copy (a[0:N]) /* { dg-warning "OpenACC > kernels construct will be executed sequentially; will by default avoid > offloading to prevent data copy penalty" "" { target > nvptx_offloading_configured } } */ > { > r = a[0]; > > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > index 01d5e5e..bb2e85b 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -8,7 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > int r = a[0]; > > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > index 61d1283..e513827 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -8,8 +7,7 @@ > unsigned int > foo (int n, unsigned int *a) > { > - > -#pragma acc kernels copy (a[0:N]) > +#pragma acc kernels copy (a[0:N]) /* { dg-warning "OpenACC kernels construct > will be executed sequentially; will by default avoid offloading to prevent > data copy penalty" "" { target nvptx_offloading_configured } } */ > { > for (int i = 0; i < n; i++) > a[i] = 1; > diff --git > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > index f7f04cb..c4791a4 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c > @@ -1,4 +1,3 @@ > -/* { dg-do run } */ > /* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > @@ -11,7 +10,7 @@ void __attribute__((noinline, noclone)) > foo (int m, int n) > { > int i, j; > - #pragma acc kernels > + #pragma acc kernels /* { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } */ > { > #pragma acc loop collapse(2) > for (i = 0; i < m; i++) > diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > index c164598..94a5ae2 100644 > --- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c > @@ -1,4 +1,4 @@ > -/* { dg-do run } */ > +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ > > #include <stdlib.h> > > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f > libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f > new file mode 100644 > index 0000000..5f18b94 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f > @@ -0,0 +1,32 @@ > +! Test that the compiler decides to "avoid offloading". > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } > } > +! The ACC_DEVICE_TYPE environment variable gets set in the testing > +! framework, and that overrides the "avoid offloading" flag at run time. > +! { dg-xfail-run-if "TODO" { openacc_nvidia_accel_selected } } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } > + X = 33 > + Y = ACC_ON_DEVICE (ACC_DEVICE_HOST); > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia > + IF (.NOT. Y) CALL ABORT > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f > libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f > new file mode 100644 > index 0000000..51801ad > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f > @@ -0,0 +1,41 @@ > +! Test that a user can override the compiler's "avoid offloading" > +! decision at run time. > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } > } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER :: D > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +! Override the compiler's "avoid offloading" decision. > +#if defined ACC_DEVICE_TYPE_nvidia > + D = ACC_DEVICE_NVIDIA > +#elif defined ACC_DEVICE_TYPE_host > + D = ACC_DEVICE_HOST > +#else > +# error Not ported to this ACC_DEVICE_TYPE > +#endif > + CALL ACC_INIT (D) > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } > + X = 33 > + Y = ACC_ON_DEVICE (ACC_DEVICE_HOST) > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_nvidia > + IF (Y) CALL ABORT > +#else > + IF (.NOT. Y) CALL ABORT > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f > libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f > new file mode 100644 > index 0000000..bea6ab8 > --- /dev/null > +++ libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f > @@ -0,0 +1,31 @@ > +! Test that a user can override the compiler's "avoid offloading" > +! decision at compile time. > + > +! { dg-do run } > +! { dg-additional-options "-cpp" } > +! { dg-additional-options "-ftree-parallelize-loops=32" } > +! Override the compiler's "avoid offloading" decision. > +! { dg-additional-options "-foffload-force" } > + > + IMPLICIT NONE > + INCLUDE "openacc_lib.h" > + > + INTEGER :: D > + INTEGER, VOLATILE :: X > + LOGICAL :: Y > + > +!$ACC DATA COPYOUT(X, Y) > +!$ACC KERNELS > + X = 33 > + Y = ACC_ON_DEVICE (ACC_DEVICE_HOST) > +!$ACC END KERNELS > +!$ACC END DATA > + > + IF (X .NE. 33) CALL ABORT > +#if defined ACC_DEVICE_TYPE_nvidia > + IF (Y) CALL ABORT > +#else > + IF (.NOT. Y) CALL ABORT > +#endif > + > + END > diff --git libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > index 94100b2..4b52579 100644 > --- libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > +++ libgomp/testsuite/libgomp.oacc-fortran/combined-directives-1.f90 > @@ -1,6 +1,9 @@ > ! This test exercises combined directives. > > ! { dg-do run } > +! { dg-additional-options "-ftree-parallelize-loops=32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } > } > > program main > integer, parameter :: n = 32 > @@ -27,7 +30,7 @@ program main > !$acc kernels loop copy (a(1:n)) copy (b(1:n)) > do i = 1, n > b(i) = 3.0; > - a(i) = a(i) + b(i) > + a(i) = a(i) + b(i) ! { dg-warning "OpenACC kernels construct will be > executed sequentially; will by default avoid offloading to prevent data copy > penalty" "" { target nvptx_offloading_configured } } > end do > > do i = 1, n > diff --git libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > index 4afb562..b9298c7 100644 > --- libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > +++ libgomp/testsuite/libgomp.oacc-fortran/non-scalar-data.f90 > @@ -2,6 +2,9 @@ > ! offloaded regions are properly mapped using present_or_copy. > > ! { dg-do run } > +! { dg-additional-options "-ftree-parallelize-loops=32" } > +! The "avoid offloading" warning is only triggered for -O2 and higher. > +! { dg-xfail-if "n/a" { nvptx_offloading_configured } { "-O0" "-O1" } { "" } > } > > program main > implicit none > @@ -30,7 +33,7 @@ subroutine kernels (array, n) > integer, dimension (n) :: array > integer :: n, i > > - !$acc kernels > + !$acc kernels ! { dg-warning "OpenACC kernels construct will be executed > sequentially; will by default avoid offloading to prevent data copy penalty" > "" { target nvptx_offloading_configured } } > do i = 1, n > array(i) = i > end do Grüße Thomas