On Wed, 21 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > (This patch serves as a straw man proposal to have something concrete
> > for discussion and further patches)
> >
> > On PTX, stack memory is private to each thread.  When the master thread
> > constructs 'omp_data_o' on its own stack and passes it to other threads
> > via GOMP_parallel by reference, the other threads cannot use the
> > resulting pointer.  We need to arrange for structures passed between
> > threads to be in global, or better, in PTX __shared__ memory (private
> > to each CUDA thread block).
>
> I guess the question is - why is it better? Do you have multiple thread
> blocks active in your execution model,

'#pragma omp teams' should map to spawning multiple thread blocks, so yes,
at least in my plans I do (but honestly I don't see how that affects the
heap-vs-shared-memory decision here).

> and do they require different omp_data_o structures?

Yes: each omp_data_o should be private to a team.

> Are accesses to it performance critical (more so than any other access?)

Not sure how to address the "more so than ..." part, but since omp_data_o
is accessed by all threads after entering a parallel region, potentially
many times throughout the region, it does seem helpful to place it in
shared memory.  I expect there will be other instances like this one,
where some on-stack data will need to be moved to team-shared storage for
nvptx.

> It might be slightly cleaner to have the copy described as a new builtin
> call that is always generated and expanded to nothing on normal targets
> rather than modifying existing calls in the IL. Or maybe:
>
>   p = __builtin_omp_select_location (&stack_local_var, size)
>   ....
>   __builtin_omp_maybe_free (p);
>
> where the select_location could get simplified to a malloc for nvptx,
> hopefully making the stack variable unused and discarded.

Agreed.

> > Using separate variables is wasteful: they should go into a union to
> > reduce shared memory consumption.
>
> Not sure what you mean by separate variables?

If two parallel regions are nested in a target region, there will be two
omp_data_o variables of potentially different types, but they can reuse
the same storage.  The patch does not achieve that, because it simply
emits a static __shared__ declaration for each original variable.

Alexander
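
P.S. To make the shared-memory point concrete, here is a minimal
hand-written CUDA sketch (field and function names are made up for the
example; this is not the code GCC would emit) of why omp_data_o has to
live in __shared__ rather than on the master thread's private stack:

  struct omp_data_o { int *arr; int n; };

  __global__ void
  kernel (int *arr, int n)
  {
    /* Per-block (per-team) slot: visible to every thread of the block,
       unlike a pointer into the master thread's private stack.  */
    __shared__ struct omp_data_o omp_data_o_shared;

    if (threadIdx.x == 0)          /* the "master" thread fills it in */
      {
        omp_data_o_shared.arr = arr;
        omp_data_o_shared.n = n;
      }
    __syncthreads ();              /* publish it to the rest of the team */

    /* Every thread of the block can now read the shared copy; a pointer
       to a stack-local omp_data_o would be unusable here.  */
    if (threadIdx.x < omp_data_o_shared.n)
      omp_data_o_shared.arr[threadIdx.x] += 1;
  }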
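
A rough model of how the builtin pair suggested above could expand (the
builtin names come from the quoted suggestion; the bodies below are only
my reading of the intended semantics, not an implementation):

  #include <stdlib.h>
  #include <string.h>

  /* On a normal (host) target the pair folds away: the data stays on
     the stack and nothing needs to be freed.  */
  static void *
  select_location_host (void *stack_local_var, size_t size)
  {
    (void) size;
    return stack_local_var;        /* no copy, no allocation */
  }

  static void
  maybe_free_host (void *p)        /* expands to nothing */
  {
    (void) p;
  }

  /* On nvptx, where stacks are thread-private, the same pair instead
     moves the data to storage visible to the whole team.  */
  static void *
  select_location_nvptx (void *stack_local_var, size_t size)
  {
    void *p = malloc (size);       /* or a per-team __shared__ buffer */
    memcpy (p, stack_local_var, size);
    return p;
  }

  static void
  maybe_free_nvptx (void *p)
  {
    free (p);
  }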
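
And similarly for the union point: with two parallel regions nested in
one target region, the idea is roughly this (invented member types again),
since the two regions are never live at the same time:

  struct omp_data_o_1 { int *a; int n; };        /* first parallel region */
  struct omp_data_o_2 { double *b; double s; };  /* second parallel region */

  /* One __shared__ slot backs both regions, instead of the patch's two
     separate static __shared__ declarations.  */
  static __shared__ union
  {
    struct omp_data_o_1 r1;
    struct omp_data_o_2 r2;
  } omp_data_o_shared_u;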