On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iver...@gmail.com> wrote:
> > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> >> We had established the use of a boolean flag have_offload in gcc::context
> >> to indicate whether during compilation, we've actually seen any code to
> >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> >> al.). This means that currently, the whole offload machinery will not be
> >> run unless we actually have any offloaded data. This means that the
> >> configured mkoffload programs (-foffload=[...], defaulting to
> >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> >> we actually have any offloaded data. This means that we will not
> >> actually generate constructor code to call libgomp's
> >> GOMP_offload_register unless we actually have any offloaded data.
> >
> > Yes, that was the plan.
> >
> >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> >> targets have been specified during compilation.
> >>
> >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> >> specified during compilation, so that we can, for example, reliably
> >> resort to host fallback execution for -foffload=disable instead of
> >> getting error message that an offloaded function is missing.
> >
> > It's easy to fix:
> >
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index a5fb164..f81d570 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
> >        k.host_end = k.host_start + 1;
> >        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >        gomp_mutex_unlock (&devicep->lock);
> > -      if (tgt_fn == NULL)
> > -        gomp_fatal ("Target function wasn't mapped");
> > -
> >        return (void *) tgt_fn->tgt_offset;
> >      }
> >  }
> > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >      return gomp_target_fallback (fn, hostaddrs);
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> > @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
> >      }
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
> >
> >
> >> other hand, for example, for -foffload=nvptx-none, even if user program
> >> code doesn't contain any offloaded data (and thus the offload machinery
> >> has not been run), the user program might still contain any executable
> >> directives or OpenACC runtime library calls, so we'd still like to use
> >> the libgomp nvptx plugin. However, we currently cannot detect this
> >> situation.
> >>
> >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> >> configuration in the executable (as a string, for example) for libgomp to
> >> look that up, or b) make it a requirement that (if configured via
> >> -foffload=[...]), the offload machinery is run even if there is not
> >> actually any data to be offloaded, so we then reliably get the respective
> >> constructor call to libgomp's GOMP_offload_register. I once began to
> >> implement a), but this got a bit ugly, so then looked into b) instead.
> >> Compared to the status quo, always running the whole offloading machinery
> >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> >> are active, certainly does introduce some overhead when there isn't
> >> actually any code to be offloaded, so I'm not sure whether that is
> >> acceptable?
> >
> > I vote for (a).
>
> What happens for conflicting -foffload=[...] options in different TUs?

If you're asking about what happens now, only the list of offload targets from
the link-time -foffload=tgt1,tgt2 option matters.

I don't like plan (b) because it calls ipa_write_summaries unconditionally for
all OpenMP programs, which creates IR sections, which increases file size and
may cause other problems, e.g.
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868>.
Also, compile time is increased because of the LTO machinery, mkoffloads, etc.

If OpenACC requires some registration in libgomp even without offload, maybe
you can run this machinery only under flag_openacc?

  -- Ilya
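
For reference, the host-fallback idea from the quoted patch can be sketched as
standalone C roughly as follows. This is not libgomp code: fn_mapping,
lookup_target_fn and run_target are made-up names, and a linear table stands
in for the device's splay tree. The point is only that the lookup reports a
missing mapping by returning NULL (checking before it dereferences anything),
and the caller then runs the host version instead of aborting.

```c
#include <stddef.h>

typedef void (*host_fn_t) (void *);

/* Hypothetical stand-in for a device's host-to-target function map.  */
struct fn_mapping
{
  host_fn_t host_fn;
  void *tgt_addr;
};

/* Return the target address registered for HOST_FN, or NULL if no
   device copy of the function was ever registered.  */
static void *
lookup_target_fn (const struct fn_mapping *map, size_t n, host_fn_t host_fn)
{
  for (size_t i = 0; i < n; i++)
    if (map[i].host_fn == host_fn)
      return map[i].tgt_addr;
  return NULL;
}

/* Run FN on the "device" if a mapping exists, otherwise fall back to
   the host.  Returns 1 for the device path, 0 for host fallback.  */
static int
run_target (const struct fn_mapping *map, size_t n, host_fn_t fn, void *data)
{
  void *tgt = lookup_target_fn (map, n, fn);
  if (tgt == NULL)
    {
      fn (data);  /* Host fallback.  */
      return 0;
    }
  /* A real runtime would launch TGT on the device here.  */
  (void) tgt;
  return 1;
}

/* Example "offloaded" function: increments the int DATA points to.  */
static void
kernel (void *data)
{
  ++*(int *) data;
}
```

With an empty table, run_target (NULL, 0, kernel, &counter) takes the host
path and increments counter. With a table that contains kernel, it takes the
(stubbed) device path and leaves counter alone.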