On Mon, May 31, 2021 at 09:36:41PM +0800, Chung-Lin Tang wrote: > this is a v3 version of my OpenMP 5.0 structure element mapping patch, > v2 was here: > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html > > This v3 adds a small bug fix, where the initialization of the refcount didn't > handle all cases, fixed by using gomp_refcount_increment here (more > consistent).
Sorry for the delay. Ok for trunk, but see some nits in the testsuite. > I know you had performance concerns in the last round, compared with your > sorting > approach. I'll try to research on that later. Getting the v3 patch posted > before > backporting to devel/omp/gcc-11. But please have a look at this incrementally. I think the common case is just a couple of mappings (say < 10 or < 20 in 90%+ of cases) and a htab might be too expensive for that. > > libgomp/ > * hashtab.h (htab_clear): New function with initialization code > factored out from... > (htab_create): ...here, adjust to use htab_clear function. > > * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of > special refcount values, add comments. > (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL. > (REFCOUNT_LINK): Likewise. > (REFCOUNT_STRUCTELEM): New special refcount range for structure > element siblings. > (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element > sibling maps. > (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling. > (REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling. > (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag. > (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag. > (struct splay_tree_key_s): Add structelem_refcount and > structelem_refcount_ptr fields into a union with dynamic_refcount. > Add comments. > (gomp_map_vars): Delete declaration. > (gomp_map_vars_async): Likewise. > (gomp_unmap_vars): Likewise. > (gomp_unmap_vars_async): Likewise. > (goacc_map_vars): New declaration. > (goacc_unmap_vars): Likewise. > > * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars. > (goacc_enter_datum): Likewise. > (goacc_enter_data_internal): Likewise. > * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars > and goacc_unmap_vars. > (GOACC_data_start): Adjust to use goacc_map_vars. > (GOACC_data_end): Adjust to use goacc_unmap_vars. > > * target.c (hash_entry_type): New typedef. > (htab_alloc): New function hook for hashtab.h. > (htab_free): Likewise. > (htab_hash): Likewise. > (htab_eq): Likewise. > (hashtab.h): Add file include. > (gomp_increment_refcount): New function. > (gomp_decrement_refcount): Likewise. > (gomp_map_vars_existing): Add refcount_set parameter, adjust to use > gomp_increment_refcount. > (gomp_map_fields_existing): Add refcount_set parameter, adjust calls > to gomp_map_vars_existing. > > (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p > variable to guard OpenMP specific paths, adjust calls to > gomp_map_vars_existing, add structure element sibling splay_tree_key > sequence creation code, adjust Fortran map case to avoid increment > under OpenMP. > (gomp_map_vars): Adjust to static, add refcount_set parameter, manage > local refcount_set if caller passed in NULL, adjust call to > gomp_map_vars_internal. > (gomp_map_vars_async): Adjust and rename into... > (goacc_map_vars): ...this new function, adjust call to > gomp_map_vars_internal. > > (gomp_remove_splay_tree_key): New function with code factored out from > gomp_remove_var_internal. > (gomp_remove_var_internal): Add code to handle removing multiple > splay_tree_key sequence for structure elements, adjust code to use > gomp_remove_splay_tree_key for splay-tree key removal. > (gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use > gomp_decrement_refcount. > (gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage > local refcount_set if caller passed in NULL, adjust call to > gomp_unmap_vars_internal. > (gomp_unmap_vars_async): Adjust and rename into... > (goacc_unmap_vars): ...this new function, adjust call to > gomp_unmap_vars_internal. > (GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and > gomp_unmap_vars. > (GOMP_target_ext): Likewise. > (gomp_target_data_fallback): Adjust call to gomp_map_vars. > (GOMP_target_data): Likewise. > (GOMP_target_data_ext): Likewise. > (GOMP_target_end_data): Adjust call to gomp_unmap_vars. > (gomp_exit_data): Add refcount_set parameter, adjust to use > gomp_decrement_refcount, adjust to queue splay-tree keys for removal > after main loop. > (GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to > gomp_map_vars and gomp_exit_data. > (gomp_target_task_fn): Likewise. > > * testsuite/libgomp.c-c++-common/refcount-1.c: New testcase. > * testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase. > * testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase. > * testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase. > * testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase. > * testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/refcount-1.c > @@ -0,0 +1,52 @@ > +#include <omp.h> > +#include <stdlib.h> > + > +int main (void) > +{ > + int d = omp_get_default_device (); > + int id = omp_get_initial_device (); > + > + if (d < 0 || d >= omp_get_num_devices ()) > + d = id; > + > + unsigned int a = 0xcdcdcdcd; > + #pragma omp target enter data map (to:a) > + > + a = 0xabababab; > + unsigned char *p = (unsigned char *) &a; > + unsigned char *q = p + 2; > + > + #pragma omp target enter data map (alloc:p[:1], q[:1]) > + > + if (!omp_target_is_present (&a, d)) > + abort (); > + if (!omp_target_is_present (&p[0], d)) > + abort (); > + if (!omp_target_is_present (&q[0], d)) > + abort (); > + > + #pragma omp target exit data map (release:a) > + > + if (!omp_target_is_present (&a, d)) > + abort (); > + if (!omp_target_is_present (&p[0], d)) > + abort (); > + if (!omp_target_is_present (&q[0], d)) > + abort (); > + > + #pragma omp target exit data map (from:q[:1]) > + > + if (omp_target_is_present (&a, d)) > + abort (); Has this been tested with offloading not configured? omp_target_is_present will return 1 for the initial device for all the pointers (everything is present). So I wonder if these 3 if (omp_target_is_present (..., d)) shouldn't be if (d != id && omp_target_is_present (..., d)) > + if (omp_target_is_present (&p[0], d)) > + abort (); > + if (omp_target_is_present (&q[0], d)) > + abort (); > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-1.c > @@ -0,0 +1,29 @@ > +#include <omp.h> > +#include <stdlib.h> > + > +struct S > +{ > + int a, b; > +}; > +typedef struct S S; > + > +int main (void) > +{ > + int d = omp_get_default_device (); > + int id = omp_get_initial_device (); > + > + if (d < 0 || d >= omp_get_num_devices ()) > + d = id; > + > + S s; > + #pragma omp target enter data map (alloc: s.a, s.b) > + #pragma omp target exit data map (release: s.b) > + > + /* OpenMP 5.0 structure element mapping rules describe that elements of > same > + structure variable should allocate/deallocate in a uniform fashion, so > + "s.a" should be removed together by above 'exit data'. */ > + if (omp_target_is_present (&s.a, d)) Again. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-2.c > @@ -0,0 +1,44 @@ > +#include <omp.h> > +#include <stdlib.h> > + > +struct S > +{ > + int a, b, c, d; > +}; > +typedef struct S S; > + > +int main (void) > +{ > + int d = omp_get_default_device (); > + int id = omp_get_initial_device (); > + > + if (d < 0 || d >= omp_get_num_devices ()) > + d = id; > + > + S s; > + #pragma omp target enter data map (alloc: s.a, s.b, s.c, s.d) > + #pragma omp target enter data map (alloc: s.c) > + #pragma omp target enter data map (alloc: s.b, s.d) > + #pragma omp target enter data map (alloc: s.a, s.c, s.b) > + > + #pragma omp target exit data map (release: s.a) > + #pragma omp target exit data map (release: s.d) > + #pragma omp target exit data map (release: s.c) > + #pragma omp target exit data map (release: s.b) > + > + /* OpenMP 5.0 structure element mapping rules describe that elements of > same > + structure variable should allocate/deallocate in a uniform fashion, so > + all elements of 's' should be removed together by above 'exit data's. > */ > + if (omp_target_is_present (&s, d)) > + abort (); > + if (omp_target_is_present (&s.a, d)) > + abort (); > + if (omp_target_is_present (&s.b, d)) > + abort (); > + if (omp_target_is_present (&s.c, d)) > + abort (); > + if (omp_target_is_present (&s.d, d)) > + abort (); And again. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-3.c > @@ -0,0 +1,63 @@ > +#include <omp.h> > +#include <stdlib.h> > + > +struct S > +{ > + int a, b, c, d; > +}; > +typedef struct S S; > + > +int main (void) > +{ > + int d = omp_get_default_device (); > + int id = omp_get_initial_device (); > + > + if (d < 0 || d >= omp_get_num_devices ()) > + d = id; > + > + S s; > + > + #pragma omp target enter data map (alloc: s) > + #pragma omp target enter data map (alloc: s) > + > + #pragma omp target exit data map (release: s.a) > + #pragma omp target exit data map (release: s.b) > + > + /* OpenMP 5.0 structure element mapping rules describe that elements of > same > + structure variable should allocate/deallocate in a uniform fashion, so > + all elements of 's' should be removed together by above 'exit data's. > */ > + if (omp_target_is_present (&s, d)) > + abort (); > + if (omp_target_is_present (&s.a, d)) > + abort (); > + if (omp_target_is_present (&s.b, d)) > + abort (); > + if (omp_target_is_present (&s.c, d)) > + abort (); > + if (omp_target_is_present (&s.d, d)) > + abort (); > + > + #pragma omp target enter data map (alloc: s.a, s.b) > + #pragma omp target enter data map (alloc: s.a) > + #pragma omp target enter data map (alloc: s.b) > + > + #pragma omp target exit data map (release: s) > + #pragma omp target exit data map (release: s) > + #pragma omp target exit data map (release: s) > + > + /* OpenMP 5.0 structure element mapping rules describe that elements of > same > + structure variable should allocate/deallocate in a uniform fashion, so > + all elements of 's' should be removed together by above 'exit data's. > */ > + if (omp_target_is_present (&s, d)) > + abort (); > + if (omp_target_is_present (&s.a, d)) > + abort (); > + if (omp_target_is_present (&s.b, d)) > + abort (); > + if (omp_target_is_present (&s.c, d)) > + abort (); > + if (omp_target_is_present (&s.d, d)) > + abort (); And again. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-4.c > @@ -0,0 +1,50 @@ > +#include <omp.h> > +#include <stdlib.h> > + > +struct S > +{ > + int a, b, c, d, e; > +}; > +typedef struct S S; > + > +int main (void) > +{ > + int d = omp_get_default_device (); > + int id = omp_get_initial_device (); > + > + if (d < 0 || d >= omp_get_num_devices ()) > + d = id; > + > + S s = { 1, 2, 3, 4, 5 }; > + #pragma omp target enter data map (to:s) > + > + int *p = &s.b; > + int *q = &s.d; > + #pragma omp target enter data map (alloc: p[:1], q[:1]) > + > + s.b = 88; > + s.d = 99; > + > + #pragma omp target exit data map (release: s) > + if (!omp_target_is_present (&s, d)) > + abort (); > + if (!omp_target_is_present (&p[0], d)) > + abort (); > + if (!omp_target_is_present (&q[0], d)) > + abort (); > + > + #pragma omp target exit data map (from: q[:1]) > + if (omp_target_is_present (&s, d)) > + abort (); > + if (omp_target_is_present (&p[0], d)) > + abort (); > + if (omp_target_is_present (&q[0], d)) > + abort (); And again. Jakub