https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116535

--- Comment #1 from Tobias Burnus <burnus at gcc dot gnu.org> ---
I have trouble reproducing it fully reliably, and my impression is that a
global variable is not set atomically.

The difference between -flto=1 and -flto=2 with -flto-partition=max is rather
small: in either case, 36 partitions (LTRANS jobs) exist, and the partitions
are written out in the same way.

* * *

If I add
  #undef HAVE_WORKING_FORK
to gcc/lto/lto.cc, it seems to work; without it, I get the following error for:

~/projects/gcc-trunk-offload/bin/gcc -save-temps -fopenacc -flto=2 \
  -foffload=nvptx-none -flto-partition=max \
  */*/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c

libgomp: Cannot map target functions or variables (expected 20 + 0 + 1, have 11)

* * *

With -flto=<N> (with some reasonable N), stream_out_partitions is called <N>
times; e.g. for the example above with N=3:
  stream_out_partitions_1 ("./a.ltrans35.o", 10, 0, 12)
  stream_out_partitions_1 ("./a.ltrans11.o", 10, 12, 24)
  stream_out_partitions_1 ("./a.ltrans23.o", 10, 24, 36)
while for -flto=1, it's
  stream_out_partitions_1 ("./a.ltrans35.o", 10, 0, 36)

That's the only difference.

* * *

With -flto=3, looking for '.offload_var_table' in the
'a.ltrans<i>.ltrans.{s,o}' files shows that it is always in the first file of
each set of files written per lto_parallelism chunk; i.e., for -flto=3, it is
in file 'a.ltrans<i>.ltrans.{s,o}' for <i> = 0, 12, and 24, out of the files
<i> = 0, ..., 35.

* * *

However, with
#undef 




Recall how the data is created:
--------------------------------------------------

The expected numbers are computed as:
  void **host_func_table = ((void ***) host_table)[0];
  void **host_funcs_end  = ((void ***) host_table)[1];
...
  int num_funcs = host_funcs_end - host_func_table;
  int num_vars  = (host_vars_end - host_var_table) / 2;

where host_table = __OFFLOAD_TABLE__ and libgcc/offloadstuff.c has:

#elif defined CRT_TABLE

const void *const __OFFLOAD_TABLE__[]
  __attribute__ ((__visibility__ ("hidden"))) =
{
  &__offload_func_table, &__offload_funcs_end,
  &__offload_var_table, &__offload_vars_end,
  &__offload_ind_func_table, &__offload_ind_funcs_end,
};


The entries themselves are formed by:

#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"
#define OFFLOAD_IND_FUNC_TABLE_SECTION_NAME ".gnu.offload_ind_funcs"
#define OFFLOAD_VAR_TABLE_SECTION_NAME ".gnu.offload_vars"

#ifdef CRT_BEGIN

const void *const __offload_func_table[0]
  __attribute__ ((__used__, visibility ("hidden"),
                  section (OFFLOAD_FUNC_TABLE_SECTION_NAME))) = { };
...

#elif defined CRT_END
const void *const __offload_funcs_end[0]
  __attribute__ ((__used__, visibility ("hidden"),
                  section (OFFLOAD_FUNC_TABLE_SECTION_NAME))) = { };

offloadstuff.c is compiled into the files
  crtoffload{begin,end,table}$(objext)
and, at link time, GNU_USER_TARGET_STARTFILE_SPEC / GNU_USER_TARGET_ENDFILE_SPEC
ensure that __offload_func_table comes first and __offload_funcs_end comes last
in the list.
