jdoerfert marked 3 inline comments as done. jdoerfert added inline comments.
================ Comment at: openmp/libomptarget/deviceRTLs/common/target_region.h:100 +/// +EXTERN int8_t __kmpc_target_region_kernel_init(bool UseSPMDMode, + bool RequiresOMPRuntime, ---------------- ABataev wrote: > jdoerfert wrote: > > ABataev wrote: > > > jdoerfert wrote: > > > > ABataev wrote: > > > > > jdoerfert wrote: > > > > > > ABataev wrote: > > > > > > > Better to use `ident_loc` for passing info about execution mode > > > > > > > and full/lightweight runtime. > > > > > > Could you please explain why you think that? Adding indirection > > > > > > through a structure does not really seem beneficial to me. > > > > > Almost all function from libomp rely on `ident_loc`. The functions, > > > > > which were added for NVPTX without this parameter had a lot of > > > > > problems later and most of them were replaced with the functions with > > > > > this parameter type. Plus, this parameter is used for OMPD/OMPT and > > > > > it may be important for future OMPD/OMPT support. > > > > > Almost all function from libomp rely on ident_loc. > > > > > > > > If you look at the implementation of this interface for NVPTX you will > > > > see that the called functions do not take `ident_loc` values. When you > > > > create the calls from the existing NVPTX code generation in clang, the > > > > current code **does not use** `ident_loc` for similar functions, see: > > > > `___kmpc_kernel_init(kmp_int32 thread_limit, int16_t > > > > RequiresOMPRuntime)`, > > > > `__kmpc_kernel_deinit(int16_t IsOMPRuntimeInitialized)`, > > > > `__kmpc_spmd_kernel_init(kmp_int32 thread_limit, int16_t > > > > RequiresOMPRuntime, int16_t RequiresDataSharing)`, > > > > `__kmpc_kernel_parallel(void **outlined_function, int16_t > > > > IsOMPRuntimeInitialized)`, > > > > ... > > > > > > > > > > > > > > > > > Plus, this parameter is used for OMPD/OMPT and it may be important > > > > > for future OMPD/OMPT support. > > > > > > > > If we at some point need to make the options permanent in an > > > > `ident_loc` we can simply pass an `ident_loc` and require it to be > > > > initialized by the call. Cluttering the user code with stores and > > > > indirection is exactly what I do want to avoid. > > > 1. The new functions rely on `ident_loc`. We had to add those new > > > functions because the old ones did not use it and it was bad design > > > decision. Now we need to fix this. I suggest you do everything right from > > > the very beginning rather than fixing this later by adding extra entry > > > points to support OMPT/OMPD or something else, for example. > > > 2. No, you cannot simply change the interface of the library to keep the > > > compatibility with the previous versions of the compiler/library. You > > > will need to add the new entries. > > Let's start this one again because I still haven't understood. Why do we > > need to populate the `ident_loc` again? What information has to be in there > > at which point? I want this to be clear because a lot of other "design > > decisions" of the existing code base are in my opinion not necessary and > > consequently missing here. That includes, for example, various global > > variables. If we have a description of the problem you try to solve with > > the `ident_loc` we might be able to find a way that cuts down on state. > > > > > > Regarding the "compatibility", this is not a stable interface people can > > rely on. Whatever is committed in this first patch __is not__ set in stone. > > Also, we can _always_ add a `__kmpc_init_ident_loc(....)` function after > > the fact. > Ident_loc holds the data about current source code location, execution mode > and is full runtime required or not. Also, it is used in OMPT/OMPD support. > Regarding "compatibility" libraries must be most stable part of the compiler, > because the user migbt need to link the old object file/library with the new > one. Because of this the new versions of libraries must be compatible with > old ones. And you need to maintain the deprecated parts to keep the > compatibility with the previous versions. All these libs already have a lot > of old code that because of the initial poor design and we need to maintain > them. I would like to avoid this situation with this patch. > Ident_loc holds the data about current source code location, execution mode > and is full runtime required or not. Also, it is used in OMPT/OMPD support. We can store that information through a `__kmpc_init_ident_loc(....)` call once needed. > Regarding "compatibility" libraries must be most stable part of the compiler, > because the user migbt need to link the old object file/library with the new > one. Because of this the new versions of libraries must be compatible with > old ones. And you need to maintain the deprecated parts to keep the > compatibility with the previous versions. All these libs already have a lot > of old code that because of the initial poor design and we need to maintain > them. I would like to avoid this situation with this patch. The way I understand you now is that you want a way to extend the interface in the future and adding a changeable `ident_loc` pointer is your proposed way. Do I understand your reaonsing for `ident_loc` here correctly or is it (this and) something else? ================ Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omp_data.cu:70 +//////////////////////////////////////////////////////////////////////////////// +__device__ __shared__ target_region_shared_buffer _target_region_shared_memory; + ---------------- ABataev wrote: > jdoerfert wrote: > > ABataev wrote: > > > jdoerfert wrote: > > > > ABataev wrote: > > > > > It would be good to store it the global memory rather than in the > > > > > shared to save th shared memory. Also, we already are using several > > > > > shared memory buffers for different purposes, it would be good to > > > > > merge them somehow to reduce pressure on shared memory. > > > > I would have reused your buffer but it is for reasons unclear to me, > > > > not a byte-wise buffer but an array of `void *` and also used as such. > > > > Using it as a byte-wise buffer might cause problems or at least > > > > confusion. Changing it to a byte-wise buffer would be fine with me. I > > > > don't need a separate buffer but just one with the functionality > > > > implemented in this one. > > > I don't know what `my` buffer are talking about. I'm just saying that we > > > already using a lot of shared memory and adding another one shared memory > > > buffer of ~150 bytes per team increases pressure on the shared memory. It > > > would be good to reuse the existing buffers somehow. It was just a > > > suggestion. > > > I don't know what my buffer are talking about. > > > > Sorry, my bad. The one you see in the (last part of the) implementation > > below in the beginning of the shown lines of > > `openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h`. It is called > > `omptarget_nvptx_SharedArgs` and it does (a subset of) what this new buffer > > does, providing space for shared variables in parallel regions. > > > > > I'm just saying that we already using a lot of shared memory and adding > > > another one shared memory buffer of ~150 bytes per team increases > > > pressure on the shared memory. It would be good to reuse the existing > > > buffers somehow. It was just a suggestion. > > > > I understand and I agree. My comment explained why I didn't do that in the > > first place, hoping that you see the problem and agree we should rewrite > > the users of `omptarget_nvptx_SharedArgs` to use > > `target_region_shared_buffer`[1], thereby reducing the required shared > > memory. > > > > [1] The name is subject to change! I don't care much. > > > This is not `my` buffer. Unfortunately, I did not work on this library since > the very beginning. There are some other buffers, generated by the compiler, > for example, and we can try to reuse them. > This is not my buffer. My "you" was not directed at you but a general one. The wording was bad, my apologies. > There are some other buffers, generated by the compiler, for example, and we > can try to reuse them. I'm not 100% sure which buffers you refer to here but I think that are the ones the new code generation does not emit anymore. I'm all for merging/replacing multiple buffers implemented in the device RTL, I didn't do it because it breaks compatibility or it forces me to inherit design choices I dislike (the void** buffer). From my perspective we could get rid of the existing `omptarget_nvptx_SharedArgs` space by letting it use the `target_region_shared_buffer` internally. That solves the problem for now and once `omptarget_nvptx_SharedArgs` isn't directly needed anymore it is removed. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D59319/new/ https://reviews.llvm.org/D59319 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits