Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

Bernd Schmidt Tue, 20 Oct 2015 17:06:08 -0700

On 10/20/2015 08:34 PM, Alexander Monakov wrote:

(This patch serves as a straw man proposal to have something concrete for
discussion and further patches)


On PTX, stack memory is private to each thread.  When the master thread
constructs 'omp_data_o' on its own stack and passes it to other threads via
GOMP_parallel by reference, the other threads cannot use the resulting pointer.
We need to arrange for structures passed between threads to be in global, or
better, in PTX __shared__ memory (private to each CUDA thread block).

I guess the question is - why is it better? Do you have multiple
threadblocks active in your execution model, and do they require
different omp_data_o structures? Are accesses to it performance critical
(more so than any other access)? If the answers are "no", then I think
you probably want to fall back to just normal malloced memory or a
regular static variable, as shared memory is a fairly limited resource.

It might be slightly cleaner to have the copy described as a new builtin
call that is always generated and expanded to nothing on normal targets
rather than modifying existing calls in the IL. Or maybe:


 p = __builtin_omp_select_location (&stack_local_var, size)
 ....
 __builtin_omp_maybe_free (p);

where the select_location could get simplified to a malloc for nvptx,
hopefully making the stack variable unused and discarded.
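
To make the intended semantics a bit more concrete, here is a rough
host-side sketch in plain C. It is only an emulation under my own
assumptions: the sketch_* functions, the SKETCH_TARGET_NVPTX macro and
the omp_data_s layout are all hypothetical, and the real builtins would
be expanded by the compiler rather than called as library functions.
The idea it tries to show is that all stores go through the returned
pointer, so on a target where the location is malloced the stack
variable is never written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch standing in for "are we compiling for nvptx?".
   A real compiler would make this decision when expanding the builtin. */
#ifndef SKETCH_TARGET_NVPTX
#define SKETCH_TARGET_NVPTX 1
#endif

/* Hypothetical emulation of __builtin_omp_select_location: return the
   memory in which the data-sharing structure should be built. */
static void *sketch_select_location(void *stack_local, size_t size)
{
#if SKETCH_TARGET_NVPTX
    (void) stack_local;          /* stack copy is never touched */
    return malloc(size);         /* memory visible to other threads */
#else
    (void) size;
    return stack_local;          /* normal targets keep using the stack */
#endif
}

/* Hypothetical emulation of __builtin_omp_maybe_free: release the memory
   only if select_location allocated it. */
static void sketch_maybe_free(void *p)
{
#if SKETCH_TARGET_NVPTX
    free(p);
#else
    (void) p;
#endif
}

/* Illustrative layout for a data-sharing record; not the real one. */
struct omp_data_s {
    const int *values;
    int count;
};

/* Stand-in for the outlined parallel region body. */
static int sketch_region_body(void *data)
{
    struct omp_data_s *d = data;
    int sum = 0;
    for (int i = 0; i < d->count; i++)
        sum += d->values[i];
    return sum;
}

/* Stand-in for the lowered parent function. Returns -1 on allocation
   failure, otherwise the value computed by the region body. */
static int sketch_run_parallel(void)
{
    static const int values[4] = { 1, 2, 3, 4 };
    struct omp_data_s omp_data_o;   /* the original stack variable */

    struct omp_data_s *p =
        sketch_select_location(&omp_data_o, sizeof omp_data_o);
    if (p == NULL)
        return -1;

    /* All stores go through p, never through omp_data_o directly. */
    p->values = values;
    p->count = 4;

    int result = sketch_region_body(p);
    sketch_maybe_free(p);
    return result;
}
```

With SKETCH_TARGET_NVPTX set to 0 the same caller code just uses the
stack variable and the free becomes a no-op, which is the "expanded to
nothing on normal targets" case.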

Using separate variables is wasteful: they should go into a union to reduce
shared memory consumption.


Not sure what you mean by separate variables?


Bernd
