[PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface

Johannes Doerfert via Phabricator via cfe-commits Thu, 14 Mar 2019 12:50:16 -0700

jdoerfert added inline comments.


================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:27
+
+/// The target region _kernel_ interface for GPUs
+///
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > All exported functions are declared in the `interface.h` file. I don't 
> > > think we need an extra interface file here
> > `interface.h`, or to be more precise for people that do not know, 
> > `deviceRTLs/nvptx/src/interface.h`, is nvptx specific. This file, 
> > `deviceRTLs/common/target_region.h`, is by design target agnostic and not 
> > placed _under_ the nvptx subfolder. If you are willing to move 
> > `interface.h` into a common space and remove the nvptx specific functions 
> > we can merge the two. Otherwise, I have strong reservations agains that and 
> > good reason not to do it.
> I see that currently it is written in Cuda. It means, it targets NVidia GPUs, 
> at least at the moment. I'm fine to put this header file into the common 
> directory, if you're sure that this is really target agnostic. But maybe just 
> for a start we should put it to NVPTX directory? Later, when you or somebody 
> else will add support for other GPUs and he/she will find out that these 
> functions are really target agnostic, they can be moved into the common 
> directory?
> I see that currently it is written in Cuda. It means, it targets NVidia GPUs, 
> at least at the moment

How do you see that? (I hope we both talk about this file, correct?)


> But maybe just for a start we should put it to NVPTX directory?

Why? What is the benefit? If we want it to be agnostic, regardless of the 
current state, it should be developed _outside_ of the target specific 
directories.



================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100
+///
+EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode,
+                                               bool RequiresOMPRuntime,
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > Better to use `ident_loc` for passing info about execution mode and 
> > > full/lightweight runtime.
> > Could you please explain why you think that? Adding indirection through a 
> > structure does not really seem beneficial to me.
> Almost all function from libomp rely on `ident_loc`. The functions, which 
> were added for NVPTX without this parameter had a lot of problems later and 
> most of them were replaced with the functions with this parameter type. Plus, 
> this parameter is used for OMPD/OMPT and it may be important for future 
> OMPD/OMPT support.
> Almost all function from libomp rely on ident_loc.

If you look at the implementation of this interface for NVPTX you will see that 
the called functions do not take `ident_loc` values. When you create the calls 
from the existing NVPTX code generation in clang, the current code **does not 
use** `ident_loc` for similar functions, see:
`___kmpc_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime)`,
`__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`,
`__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime, 
int16_t RequiresDataSharing)`,
`__kmpc_kernel_parallel(void **outlined_function, int16_t 
IsOMPRuntimeInitialized)`,
...



> Plus, this parameter is used for OMPD/OMPT and it may be important for future 
> OMPD/OMPT support.

If we at some point need to make the options permanent in an `ident_loc` we can 
simply pass an `ident_loc` and require it to be initialized by the call. 
Cluttering the user code with stores and indirection is exactly what I do want 
to avoid.


================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:124
+/// unpacking code.
+typedef void (*ParallelWorkFnTy)(char * /* SharedValues */,
+                                 char * /* PrivateValues */);
----------------
ABataev wrote:
> We used `void *` for buffers usually, I think it is better to use `void *` 
> here too instead of `char *`.
Thanks, fixed.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+////////////////////////////////////////////////////////////////////////////////
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
+
----------------
ABataev wrote:
> It would be good to store it the global memory rather than in the shared to 
> save th shared memory. Also, we already are using several shared memory 
> buffers for different purposes, it would be good to merge them somehow to 
> reduce pressure on shared memory.
I would have reused your buffer but it is for reasons unclear to me, not a 
byte-wise buffer but an array of `void *` and also used as such. Using it as a 
byte-wise buffer might cause problems or at least confusion. Changing it to a 
byte-wise buffer would be fine with me. I don't need a separate buffer but just 
one with the functionality implemented in this one.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:64
+
+/// Filter threads into masters and workers. If \p UseStateMachine is true,
+/// required workers will enter a state machine through and be trapped there.
----------------
ABataev wrote:
> What is the criteria for `UseStateMachine`? Under what conditions it can be 
> set to `true` and `false`? Also, what if have several parallel regions in 
> non-SPMD kernel and `UseStateMachine` is `true`?
> What is the criteria for UseStateMachine? Under what conditions it can be set 
> to true and false? 

`UseStateMachine` is an option exposed to the outer world through the 
`__kmpc_target_region_kernel_init` call. The semantics are explained here and 
in the declaration of `__kmpc_target_region_kernel_init`.

> Also, what if have several parallel regions in non-SPMD kernel and 
> UseStateMachine is true?

I don't see the problem, I expect all kernels having threads in their own state 
machine and no interference between them. That is at least what should happen. 
Maybe I miss something. Do you see a problem?



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59319/new/

https://reviews.llvm.org/D59319



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D59319: [OpenMP][Offloading][1/3] A generic and simple target region interface

Reply via email to