ABataev added inline comments.
================ Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:27 + +/// The target region _kernel_ interface for GPUs +/// ---------------- jdoerfert wrote: > ABataev wrote: > > jdoerfert wrote: > > > ABataev wrote: > > > > All exported functions are declared in the `interface.h` file. I don't > > > > think we need an extra interface file here > > > `interface.h`, or to be more precise for people that do not know, > > > `deviceRTLs/nvptx/src/interface.h`, is nvptx specific. This file, > > > `deviceRTLs/common/target_region.h`, is by design target agnostic and not > > > placed _under_ the nvptx subfolder. If you are willing to move > > > `interface.h` into a common space and remove the nvptx specific functions > > > we can merge the two. Otherwise, I have strong reservations agains that > > > and good reason not to do it. > > I see that currently it is written in Cuda. It means, it targets NVidia > > GPUs, at least at the moment. I'm fine to put this header file into the > > common directory, if you're sure that this is really target agnostic. But > > maybe just for a start we should put it to NVPTX directory? Later, when you > > or somebody else will add support for other GPUs and he/she will find out > > that these functions are really target agnostic, they can be moved into the > > common directory? > > I see that currently it is written in Cuda. It means, it targets NVidia > > GPUs, at least at the moment > > How do you see that? (I hope we both talk about this file, correct?) > > > > But maybe just for a start we should put it to NVPTX directory? > > Why? What is the benefit? If we want it to be agnostic, regardless of the > current state, it should be developed _outside_ of the target specific > directories. > I'm not talking about this particular file, just like I said we can put it into `common` subdirectory. I'm talking about the implementation files. They all are written in Cuda, no? But it is not proved yet that this solution is target agnostic. Did you test it for AMD? ================ Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100 +/// +EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode, + bool RequiresOMPRuntime, ---------------- jdoerfert wrote: > ABataev wrote: > > jdoerfert wrote: > > > ABataev wrote: > > > > Better to use `ident_loc` for passing info about execution mode and > > > > full/lightweight runtime. > > > Could you please explain why you think that? Adding indirection through a > > > structure does not really seem beneficial to me. > > Almost all function from libomp rely on `ident_loc`. The functions, which > > were added for NVPTX without this parameter had a lot of problems later and > > most of them were replaced with the functions with this parameter type. > > Plus, this parameter is used for OMPD/OMPT and it may be important for > > future OMPD/OMPT support. > > Almost all function from libomp rely on ident_loc. > > If you look at the implementation of this interface for NVPTX you will see > that the called functions do not take `ident_loc` values. When you create the > calls from the existing NVPTX code generation in clang, the current code > **does not use** `ident_loc` for similar functions, see: > `___kmpc_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime)`, > `__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`, > `__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t RequiresOMPRuntime, > int16_t RequiresDataSharing)`, > `__kmpc_kernel_parallel(void **outlined_function, int16_t > IsOMPRuntimeInitialized)`, > ... > > > > > Plus, this parameter is used for OMPD/OMPT and it may be important for > > future OMPD/OMPT support. > > If we at some point need to make the options permanent in an `ident_loc` we > can simply pass an `ident_loc` and require it to be initialized by the call. > Cluttering the user code with stores and indirection is exactly what I do > want to avoid. 1. The new functions rely on `ident_loc`. We had to add those new functions because the old ones did not use it and it was bad design decision. Now we need to fix this. I suggest you do everything right from the very beginning rather than fixing this later by adding extra entry points to support OMPT/OMPD or something else, for example. 2. No, you cannot simply change the interface of the library to keep the compatibility with the previous versions of the compiler/library. You will need to add the new entries. ================ Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70 +//////////////////////////////////////////////////////////////////////////////// +__device__ __shared__ target_region_shared_buffer _target_region_shared_memory; + ---------------- jdoerfert wrote: > ABataev wrote: > > It would be good to store it the global memory rather than in the shared to > > save th shared memory. Also, we already are using several shared memory > > buffers for different purposes, it would be good to merge them somehow to > > reduce pressure on shared memory. > I would have reused your buffer but it is for reasons unclear to me, not a > byte-wise buffer but an array of `void *` and also used as such. Using it as > a byte-wise buffer might cause problems or at least confusion. Changing it to > a byte-wise buffer would be fine with me. I don't need a separate buffer but > just one with the functionality implemented in this one. I don't know what `my` buffer are talking about. I'm just saying that we already using a lot of shared memory and adding another one shared memory buffer of ~150 bytes per team increases pressure on the shared memory. It would be good to reuse the existing buffers somehow. It was just a suggestion. ================ Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:64 + +/// Filter threads into masters and workers. If \p UseStateMachine is true, +/// required workers will enter a state machine through and be trapped there. ---------------- jdoerfert wrote: > ABataev wrote: > > What is the criteria for `UseStateMachine`? Under what conditions it can be > > set to `true` and `false`? Also, what if have several parallel regions in > > non-SPMD kernel and `UseStateMachine` is `true`? > > What is the criteria for UseStateMachine? Under what conditions it can be > > set to true and false? > > `UseStateMachine` is an option exposed to the outer world through the > `__kmpc_target_region_kernel_init` call. The semantics are explained here and > in the declaration of `__kmpc_target_region_kernel_init`. > > > Also, what if have several parallel regions in non-SPMD kernel and > > UseStateMachine is true? > > I don't see the problem, I expect all kernels having threads in their own > state machine and no interference between them. That is at least what should > happen. Maybe I miss something. Do you see a problem? > 1. I see its semantics, I'm asking when it must be set to `true` and when to `false`. Maybe I missed something, but currently, it is always set to `true` in the compiler patch. Do you really need it? 2. What if you have a single kernel with several consecutive parallel regions? Can you handle this? ================ Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_region.cu:166 + // Copy the shared and private variables into shared memory. + char *SVMemory = __kmpc_target_region_kernel_get_shared_memory(); + char *PVMemory = __kmpc_target_region_kernel_get_private_memory(); ---------------- Use `void *` also, better to keep the same coding style across the whole library Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D59319/new/ https://reviews.llvm.org/D59319 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits