On Mon, May 31, 2021 at 09:36:41PM +0800, Chung-Lin Tang wrote:
> this is a v3 version of my OpenMP 5.0 structure element mapping patch,
> v2 was here: 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html
> 
> This v3 adds a small bug fix, where the initialization of the refcount didn't
> handle all cases, fixed by using gomp_refcount_increment here (more 
> consistent).

Sorry for the delay.

Ok for trunk, but see some nits in the testsuite.

> I know you had performance concerns in the last round, compared with your 
> sorting
> approach. I'll try to research on that later. Getting the v3 patch posted 
> before
> backporting to devel/omp/gcc-11.

But please have a look at this incrementally.
I think the common case is just a couple of mappings (say < 10 or < 20 in
90%+ of cases) and a htab might be too expensive for that.

> 
>       libgomp/
>       * hashtab.h (htab_clear): New function with initialization code
>       factored out from...
>       (htab_create): ...here, adjust to use htab_clear function.
> 
>       * libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
>       special refcount values, add comments.
>       (REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
>       (REFCOUNT_LINK): Likewise.
>       (REFCOUNT_STRUCTELEM): New special refcount range for structure
>       element siblings.
>       (REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
>       sibling maps.
>       (REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
>       (REFCOUNT_STRUCTELEM_FLAG_LAST):  Flag to indicate last sibling.
>       (REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
>       (REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
>       (struct splay_tree_key_s): Add structelem_refcount and
>       structelem_refcount_ptr fields into a union with dynamic_refcount.
>       Add comments.
>       (gomp_map_vars): Delete declaration.
>       (gomp_map_vars_async): Likewise.
>       (gomp_unmap_vars): Likewise.
>       (gomp_unmap_vars_async): Likewise.
>       (goacc_map_vars): New declaration.
>       (goacc_unmap_vars): Likewise.
> 
>       * oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
>       (goacc_enter_datum): Likewise.
>       (goacc_enter_data_internal): Likewise.
>       * oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
>       and goacc_unmap_vars.
>       (GOACC_data_start): Adjust to use goacc_map_vars.
>       (GOACC_data_end): Adjust to use goacc_unmap_vars.
> 
>       * target.c (hash_entry_type): New typedef.
>       (htab_alloc): New function hook for hashtab.h.
>       (htab_free): Likewise.
>       (htab_hash): Likewise.
>       (htab_eq): Likewise.
>       (hashtab.h): Add file include.
>       (gomp_increment_refcount): New function.
>       (gomp_decrement_refcount): Likewise.
>       (gomp_map_vars_existing): Add refcount_set parameter, adjust to use
>       gomp_increment_refcount.
>       (gomp_map_fields_existing): Add refcount_set parameter, adjust calls
>       to gomp_map_vars_existing.
> 
>       (gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
>       variable to guard OpenMP specific paths, adjust calls to
>       gomp_map_vars_existing, add structure element sibling splay_tree_key
>       sequence creation code, adjust Fortran map case to avoid increment
>       under OpenMP.
>       (gomp_map_vars): Adjust to static, add refcount_set parameter, manage
>       local refcount_set if caller passed in NULL, adjust call to
>       gomp_map_vars_internal.
>       (gomp_map_vars_async): Adjust and rename into...
>       (goacc_map_vars): ...this new function, adjust call to
>       gomp_map_vars_internal.
> 
>       (gomp_remove_splay_tree_key): New function with code factored out from
>       gomp_remove_var_internal.
>       (gomp_remove_var_internal): Add code to handle removing multiple
>       splay_tree_key sequence for structure elements, adjust code to use
>       gomp_remove_splay_tree_key for splay-tree key removal.
>       (gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use
>       gomp_decrement_refcount.
>       (gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage
>       local refcount_set if caller passed in NULL, adjust call to
>       gomp_unmap_vars_internal.
>       (gomp_unmap_vars_async): Adjust and rename into...
>       (goacc_unmap_vars): ...this new function, adjust call to
>       gomp_unmap_vars_internal.
>       (GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and
>       gomp_unmap_vars.
>       (GOMP_target_ext): Likewise.
>       (gomp_target_data_fallback): Adjust call to gomp_map_vars.
>       (GOMP_target_data): Likewise.
>       (GOMP_target_data_ext): Likewise.
>       (GOMP_target_end_data): Adjust call to gomp_unmap_vars.
>       (gomp_exit_data): Add refcount_set parameter, adjust to use
>       gomp_decrement_refcount, adjust to queue splay-tree keys for removal
>       after main loop.
>       (GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to
>       gomp_map_vars and gomp_exit_data.
>       (gomp_target_task_fn): Likewise.
> 
>       * testsuite/libgomp.c-c++-common/refcount-1.c: New testcase.
>       * testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase.
>       * testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase.
>       * testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase.
>       * testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase.
>       * testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/refcount-1.c
> @@ -0,0 +1,52 @@
> +#include <omp.h>
> +#include <stdlib.h>
> +
> +int main (void)
> +{
> +  int d = omp_get_default_device ();
> +  int id = omp_get_initial_device ();
> +
> +  if (d < 0 || d >= omp_get_num_devices ())
> +    d = id;
> +
> +  unsigned int a = 0xcdcdcdcd;
> +  #pragma omp target enter data map (to:a)
> +
> +  a = 0xabababab;
> +  unsigned char *p = (unsigned char *) &a;
> +  unsigned char *q = p + 2;
> +
> +  #pragma omp target enter data map (alloc:p[:1], q[:1])
> +
> +  if (!omp_target_is_present (&a, d))
> +    abort ();
> +  if (!omp_target_is_present (&p[0], d))
> +    abort ();
> +  if (!omp_target_is_present (&q[0], d))
> +    abort ();
> +
> +  #pragma omp target exit data map (release:a)
> +
> +  if (!omp_target_is_present (&a, d))
> +    abort ();
> +  if (!omp_target_is_present (&p[0], d))
> +    abort ();
> +  if (!omp_target_is_present (&q[0], d))
> +    abort ();
> +
> +  #pragma omp target exit data map (from:q[:1])
> +
> +  if (omp_target_is_present (&a, d))
> +    abort ();

Has this been tested with offloading not configured?
omp_target_is_present will return 1 for the initial device
for all the pointers (everything is present).
So I wonder if these 3 if (omp_target_is_present (..., d))
shouldn't be
  if (d != id && omp_target_is_present (..., d))

> +  if (omp_target_is_present (&p[0], d))
> +    abort ();
> +  if (omp_target_is_present (&q[0], d))
> +    abort ();

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-1.c
> @@ -0,0 +1,29 @@
> +#include <omp.h>
> +#include <stdlib.h>
> +
> +struct S
> +{
> +  int a, b;
> +};
> +typedef struct S S;
> +
> +int main (void)
> +{
> +  int d = omp_get_default_device ();
> +  int id = omp_get_initial_device ();
> +
> +  if (d < 0 || d >= omp_get_num_devices ())
> +    d = id;
> +
> +  S s;
> +  #pragma omp target enter data map (alloc: s.a, s.b)
> +  #pragma omp target exit data map (release: s.b)
> +
> +  /* OpenMP 5.0 structure element mapping rules describe that elements of 
> same
> +     structure variable should allocate/deallocate in a uniform fashion, so
> +     "s.a" should be removed together by above 'exit data'.  */
> +  if (omp_target_is_present (&s.a, d))

Again.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-2.c
> @@ -0,0 +1,44 @@
> +#include <omp.h>
> +#include <stdlib.h>
> +
> +struct S
> +{
> +  int a, b, c, d;
> +};
> +typedef struct S S;
> +
> +int main (void)
> +{
> +  int d = omp_get_default_device ();
> +  int id = omp_get_initial_device ();
> +
> +  if (d < 0 || d >= omp_get_num_devices ())
> +    d = id;
> +
> +  S s;
> +  #pragma omp target enter data map (alloc: s.a, s.b, s.c, s.d)
> +  #pragma omp target enter data map (alloc: s.c)
> +  #pragma omp target enter data map (alloc: s.b, s.d)
> +  #pragma omp target enter data map (alloc: s.a, s.c, s.b)
> +
> +  #pragma omp target exit data map (release: s.a)
> +  #pragma omp target exit data map (release: s.d)
> +  #pragma omp target exit data map (release: s.c)
> +  #pragma omp target exit data map (release: s.b)
> +
> +  /* OpenMP 5.0 structure element mapping rules describe that elements of 
> same
> +     structure variable should allocate/deallocate in a uniform fashion, so
> +     all elements of 's' should be removed together by above 'exit data's.  
> */
> +  if (omp_target_is_present (&s, d))
> +    abort ();
> +  if (omp_target_is_present (&s.a, d))
> +    abort ();
> +  if (omp_target_is_present (&s.b, d))
> +    abort ();
> +  if (omp_target_is_present (&s.c, d))
> +    abort ();
> +  if (omp_target_is_present (&s.d, d))
> +    abort ();

And again.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-3.c
> @@ -0,0 +1,63 @@
> +#include <omp.h>
> +#include <stdlib.h>
> +
> +struct S
> +{
> +  int a, b, c, d;
> +};
> +typedef struct S S;
> +
> +int main (void)
> +{
> +  int d = omp_get_default_device ();
> +  int id = omp_get_initial_device ();
> +
> +  if (d < 0 || d >= omp_get_num_devices ())
> +    d = id;
> +
> +  S s;
> +
> +  #pragma omp target enter data map (alloc: s)
> +  #pragma omp target enter data map (alloc: s)
> +
> +  #pragma omp target exit data map (release: s.a)
> +  #pragma omp target exit data map (release: s.b)
> +
> +  /* OpenMP 5.0 structure element mapping rules describe that elements of 
> same
> +     structure variable should allocate/deallocate in a uniform fashion, so
> +     all elements of 's' should be removed together by above 'exit data's.  
> */
> +  if (omp_target_is_present (&s, d))
> +    abort ();
> +  if (omp_target_is_present (&s.a, d))
> +    abort ();
> +  if (omp_target_is_present (&s.b, d))
> +    abort ();
> +  if (omp_target_is_present (&s.c, d))
> +    abort ();
> +  if (omp_target_is_present (&s.d, d))
> +    abort ();
> +
> +  #pragma omp target enter data map (alloc: s.a, s.b)
> +  #pragma omp target enter data map (alloc: s.a)
> +  #pragma omp target enter data map (alloc: s.b)
> +
> +  #pragma omp target exit data map (release: s)
> +  #pragma omp target exit data map (release: s)
> +  #pragma omp target exit data map (release: s)
> +
> +  /* OpenMP 5.0 structure element mapping rules describe that elements of 
> same
> +     structure variable should allocate/deallocate in a uniform fashion, so
> +     all elements of 's' should be removed together by above 'exit data's.  
> */
> +  if (omp_target_is_present (&s, d))
> +    abort ();
> +  if (omp_target_is_present (&s.a, d))
> +    abort ();
> +  if (omp_target_is_present (&s.b, d))
> +    abort ();
> +  if (omp_target_is_present (&s.c, d))
> +    abort ();
> +  if (omp_target_is_present (&s.d, d))
> +    abort ();

And again.

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/struct-elem-4.c
> @@ -0,0 +1,50 @@
> +#include <omp.h>
> +#include <stdlib.h>
> +
> +struct S
> +{
> +  int a, b, c, d, e;
> +};
> +typedef struct S S;
> +
> +int main (void)
> +{
> +  int d = omp_get_default_device ();
> +  int id = omp_get_initial_device ();
> +
> +  if (d < 0 || d >= omp_get_num_devices ())
> +    d = id;
> +
> +  S s = { 1, 2, 3, 4, 5 };
> +  #pragma omp target enter data map (to:s)
> +
> +  int *p = &s.b;
> +  int *q = &s.d;
> +  #pragma omp target enter data map (alloc: p[:1], q[:1])
> +
> +  s.b = 88;
> +  s.d = 99;
> +
> +  #pragma omp target exit data map (release: s)
> +  if (!omp_target_is_present (&s, d))
> +    abort ();
> +  if (!omp_target_is_present (&p[0], d))
> +    abort ();
> +  if (!omp_target_is_present (&q[0], d))
> +    abort ();
> +
> +  #pragma omp target exit data map (from: q[:1])
> +  if (omp_target_is_present (&s, d))
> +    abort ();
> +  if (omp_target_is_present (&p[0], d))
> +    abort ();
> +  if (omp_target_is_present (&q[0], d))
> +    abort ();

And again.

        Jakub

Reply via email to