ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:27
+
+/// The target region _kernel_ interface for GPUs
+///
----------------
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > All exported functions are declared in the `interface.h` file. I don't 
> > > > think we need an extra interface file here
> > > `interface.h`, or to be more precise for people that do not know, 
> > > `deviceRTLs/nvptx/src/interface.h`, is nvptx specific. This file, 
> > > `deviceRTLs/common/target_region.h`, is by design target agnostic and not 
> > > placed _under_ the nvptx subfolder. If you are willing to move 
> > > `interface.h` into a common space and remove the nvptx specific functions 
> > > we can merge the two. Otherwise, I have strong reservations agains that 
> > > and good reason not to do it.
> > I see that currently it is written in Cuda. It means, it targets NVidia 
> > GPUs, at least at the moment. I'm fine to put this header file into the 
> > common directory, if you're sure that this is really target agnostic. But 
> > maybe just for a start we should put it to NVPTX directory? Later, when you 
> > or somebody else will add support for other GPUs and he/she will find out 
> > that these functions are really target agnostic, they can be moved into the 
> > common directory?
> > I see that currently it is written in Cuda. It means, it targets NVidia 
> > GPUs, at least at the moment
> 
> How do you see that? (I hope we both talk about this file, correct?)
> 
> 
> > But maybe just for a start we should put it to NVPTX directory?
> 
> Why? What is the benefit? If we want it to be agnostic, regardless of the 
> current state, it should be developed _outside_ of the target specific 
> directories.
> 
I'm not talking about this particular file, just like I said we can put it into 
`common` subdirectory. I'm talking about the implementation files. They all are 
written in Cuda, no?
But it is not proved yet that this solution is target agnostic. Did you test it 
for AMD?


================
Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100
+///
+EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode,
+                                               bool RequiresOMPRuntime,
----------------
jdoerfert wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > Better to use `ident_loc` for passing info about execution mode and 
> > > > full/lightweight runtime.
> > > Could you please explain why you think that? Adding indirection through a 
> > > structure does not really seem beneficial to me.
> > Almost all function from libomp rely on `ident_loc`. The functions, which 
> > were added for NVPTX without this parameter had a lot of problems later and 
> > most of them were replaced with the functions with this parameter type. 
> > Plus, this parameter is used for OMPD/OMPT and it may be important for 
> > future OMPD/OMPT support.
> > Almost all function from libomp rely on ident_loc.
> 
> If you look at the implementation of this interface for NVPTX you will see 
> that the called functions do not take `ident_loc` values. When you create the 
> calls from the existing NVPTX code generation in clang, the current code 
> **does not use** `ident_loc` for similar functions, see:
> `___kmpc_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime)`,
> `__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`,
> `__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime, 
> int16_t RequiresDataSharing)`,
> `__kmpc_kernel_parallel(void **outlined_function, int16_t 
> IsOMPRuntimeInitialized)`,
> ...
> 
> 
> 
> > Plus, this parameter is used for OMPD/OMPT and it may be important for 
> > future OMPD/OMPT support.
> 
> If we at some point need to make the options permanent in an `ident_loc` we 
> can simply pass an `ident_loc` and require it to be initialized by the call. 
> Cluttering the user code with stores and indirection is exactly what I do 
> want to avoid.
1. The new functions rely on `ident_loc`. We had to add those new functions 
because the old ones did not use it and it was bad design decision. Now we need 
to fix this. I suggest you do everything right from the very beginning rather 
than fixing this later by adding extra entry points to support OMPT/OMPD or 
something else, for example.
2. No, you cannot simply change the interface of the library to keep the 
compatibility with the previous versions of the compiler/library. You will need 
to add the new entries.  


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70
+////////////////////////////////////////////////////////////////////////////////
+__device__ __shared__ target_region_shared_buffer _target_region_shared_memory;
+
----------------
jdoerfert wrote:
> ABataev wrote:
> > It would be good to store it the global memory rather than in the shared to 
> > save th shared memory. Also, we already are using several shared memory 
> > buffers for different purposes, it would be good to merge them somehow to 
> > reduce pressure on shared memory.
> I would have reused your buffer but it is for reasons unclear to me, not a 
> byte-wise buffer but an array of `void *` and also used as such. Using it as 
> a byte-wise buffer might cause problems or at least confusion. Changing it to 
> a byte-wise buffer would be fine with me. I don't need a separate buffer but 
> just one with the functionality implemented in this one.
I don't know what `my` buffer are talking about. I'm just saying that we 
already using a lot of shared memory and adding another one shared memory 
buffer of ~150 bytes per team increases pressure on the shared memory. It would 
be good to reuse the existing buffers somehow. It was just a suggestion.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:64
+
+/// Filter threads into masters and workers. If \p UseStateMachine is true,
+/// required workers will enter a state machine through and be trapped there.
----------------
jdoerfert wrote:
> ABataev wrote:
> > What is the criteria for `UseStateMachine`? Under what conditions it can be 
> > set to `true` and `false`? Also, what if have several parallel regions in 
> > non-SPMD kernel and `UseStateMachine` is `true`?
> > What is the criteria for UseStateMachine? Under what conditions it can be 
> > set to true and false? 
> 
> `UseStateMachine` is an option exposed to the outer world through the 
> `__kmpc_target_region_kernel_init` call. The semantics are explained here and 
> in the declaration of `__kmpc_target_region_kernel_init`.
> 
> > Also, what if have several parallel regions in non-SPMD kernel and 
> > UseStateMachine is true?
> 
> I don't see the problem, I expect all kernels having threads in their own 
> state machine and no interference between them. That is at least what should 
> happen. Maybe I miss something. Do you see a problem?
> 
1. I see its semantics, I'm asking when it must be set to `true` and when to 
`false`. Maybe I missed something, but currently, it is always set to `true` in 
the compiler patch. Do you really need it?
2. What if you have a single kernel with several consecutive parallel regions? 
Can you handle this?


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:166
+      // Copy the shared and private variables into shared memory.
+      char *SVMemory = __kmpc_target_region_kernel_get_shared_memory();
+      char *PVMemory = __kmpc_target_region_kernel_get_private_memory();
----------------
Use `void *` also, better to keep the same coding style across the whole library


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59319/new/

https://reviews.llvm.org/D59319



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to