(This patch serves as a straw man proposal to have something concrete for discussion and further patches)
On PTX, stack memory is private to each thread.  When the master thread
constructs 'omp_data_o' on its own stack and passes it to other threads by
reference via GOMP_parallel, the other threads cannot use the resulting
pointer.  We need to arrange for structures passed between threads to be
placed in global memory, or better, in PTX __shared__ memory (which is
private to each CUDA thread block).

We cannot easily adjust expansion of 'omp parallel' because it is done
before LTO streamout, so I've opted to adjust calls to GOMP_parallel in
pass_late_lower_omp instead.

As I see it, there are two possible approaches: either arrange for the
structure to be placed in shared memory from the compiler, or have
GOMP_parallel perform the copies.  The latter requires passing
sizeof(omp_data_o) to GOMP_parallel, and also to GOMP_OFFLOAD_run (to
reserve the shared memory), so doing it from the compiler seems simpler.

Using static storage may preclude nested parallelism.  I'm not sure we want
to support that for offloading anyway (but there needs to be a clear
decision).  Using separate variables is wasteful: they should go into a
union to reduce shared memory consumption.

A hand-written sketch of the intended transformation follows the patch.

	* omp-low.c (expand_parallel_call): Mark function for
	pass_late_lower_omp transforms.
	(pass_late_lower_omp::execute): Copy omp_data_o to/from 'shared'
	memory on NVPTX.
---
 gcc/omp-low.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 46 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6481163..5b75bf6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5384,7 +5384,10 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
   if (t == NULL)
     t1 = null_pointer_node;
   else
-    t1 = build_fold_addr_expr (t);
+    {
+      t1 = build_fold_addr_expr (t);
+      cfun->curr_properties &= ~PROP_gimple_lompifn;
+    }
   t2 = build_fold_addr_expr (gimple_omp_parallel_child_fn (entry_stmt));
 
   vec_alloc (args, 4 + vec_safe_length (ws_args));
@@ -14703,15 +14706,51 @@ pass_late_lower_omp::execute (function *fun)
     for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next (&i))
       {
         gimple stmt = gsi_stmt (i);
-        if (!(is_gimple_call (stmt)
-              && gimple_call_internal_p (stmt)
-              && gimple_call_internal_fn (stmt) == IFN_GOACC_DATA_END_WITH_ARG))
+
+        if (!is_gimple_call (stmt))
           continue;
 
-        tree fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
-        gimple g = gimple_build_call (fn, 0);
+#ifdef ADDR_SPACE_SHARED
+        /* Transform "GOMP_parallel (fn, &omp_data_o, ...)" call to
+
+           static __shared__ typeof(omp_data_o) omp_data_shared;
+           omp_data_shared = omp_data_o;
+           GOMP_parallel(fn, &omp_data_shared, ...);
+           omp_data_o = omp_data_shared;  */
+        if (gimple_call_builtin_p (stmt, BUILT_IN_GOMP_PARALLEL))
+          {
+            tree omp_data_ptr = gimple_call_arg (stmt, 1);
+            if (TREE_CODE (omp_data_ptr) == ADDR_EXPR)
+              {
+                tree omp_data = TREE_OPERAND (omp_data_ptr, 0);
+                tree type = TREE_TYPE (omp_data);
+                int quals = ENCODE_QUAL_ADDR_SPACE (ADDR_SPACE_SHARED);
+                type = build_qualified_type (type, quals);
+                tree decl = create_tmp_var (type, "omp_data_shared");
+                TREE_STATIC (decl) = 1;
+                TREE_ADDRESSABLE (decl) = 1;
+                varpool_node::finalize_decl (decl);
+
+                gimple g = gimple_build_assign (decl, omp_data);
+                gsi_insert_before (&i, g, GSI_SAME_STMT);
+
+                g = gimple_build_assign (omp_data, decl);
+                gsi_insert_after (&i, g, GSI_NEW_STMT);
+
+                gimple_call_set_arg (stmt, 1, build_fold_addr_expr (decl));
+              }
+            continue;
+          }
+#endif
+
+        if (gimple_call_internal_p (stmt)
+            && gimple_call_internal_fn (stmt) == IFN_GOACC_DATA_END_WITH_ARG)
+          {
+            tree fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+            gimple g = gimple_build_call (fn, 0);
-        gsi_replace (&i, g, false);
+            gsi_replace (&i, g, false);
+          }
       }
 
   return TODO_update_ssa;
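
For concreteness, here is a hand-written, CUDA-flavoured sketch of what the
team master's code amounts to before and after the transform.  This is
illustration only, not output of the patch: 'struct omp_data_s', its 'x'
field and the dummy GOMP_parallel arguments are made-up stand-ins for the
outlined data record and the real call.

  /* Made-up stand-in for the outlined omp_data record.  */
  struct omp_data_s { int x; };

  /* Declaration only; stands for the device-side libgomp entry point.  */
  extern "C" __device__ void GOMP_parallel (void (*fn) (void *), void *data,
                                            unsigned num_threads,
                                            unsigned flags);

  __device__ void master_before (void (*fn) (void *))
  {
    /* Before the transform: omp_data_o sits in the master thread's
       private (.local) stack frame, so the address passed below is
       unusable by the other threads of the team.  */
    struct omp_data_s omp_data_o;
    omp_data_o.x = 42;
    GOMP_parallel (fn, &omp_data_o, 0, 0);
  }

  __device__ void master_after (void (*fn) (void *))
  {
    struct omp_data_s omp_data_o;
    omp_data_o.x = 42;

    /* After the transform: the static __shared__ copy is addressable by
       every thread in the CUDA thread block.  Copy in before the call and
       back out after it, so that updates other threads make through the
       pointer still reach omp_data_o.  */
    static __shared__ struct omp_data_s omp_data_shared;
    omp_data_shared = omp_data_o;
    GOMP_parallel (fn, &omp_data_shared, 0, 0);
    omp_data_o = omp_data_shared;
  }

Each transformed call site gets its own static omp_data_shared, which is
where the shared-memory waste mentioned above comes from; unioning them (or
having GOMP_parallel do the copies into a single reserved buffer) would
avoid that.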