ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+/// Note: Only the team master is allowed to call non-const functions!
+struct shared_bytes_buffer {
+
----------------
jdoerfert wrote:
> > What is this buffer used for? Transferring pointers to the shared variables
> > to the parallel regions? If so, it must be handled by the compiler. There
> > are several reasons to do this:
> > 1) You're using malloc/free functions for large buffers. The size of this
> > buffer is known at compile time, so the compiler can generate a fixed-size
> > buffer in global memory if required. We already have a similar
> > implementation for target regions, globalized variables etc. You can take
> > a look and adapt it for your purpose.
> > 2) Malloc/free are not very fast on the GPU, so preallocated buffers will
> > give an additional performance gain.
> > 3) Another problem with malloc/free is that they use preallocated memory,
> > and the size of this memory is limited to 8 MB (if I recall correctly).
> > This memory is required for the correct support of local variable
> > globalization, and we already ran into a situation where malloc could not
> > allocate enough memory for it with some previous implementations.
> > 4) You can reuse the shared memory buffers already generated by the
> > compiler and save shared memory.
>
> [Quote by ABataev copied from
> https://reviews.llvm.org/D59319?id=190767#inline-525900 after the patch was
> split.]
>
>
> This buffer is supposed to be used to communicate variables in shared and
> firstprivate clauses between threads in a team. In this patch it is simply
> used to implement the old `void**` buffer. How, when, and whether we use it
> is part of the interface implementation. For now, this buffer simply serves
> the users of the `omptarget_nvptx_globalArgs` global.
>
> If you want to provide compiler-allocated memory to avoid using the buffer,
> no problem: the `__kmpc_target_region_kernel_parallel` function allows you
> to do so, see the `SharedMemPointers` flag. I wouldn't want to put the logic
> to generate these buffers in the front-end though.

Why?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59424/new/

https://reviews.llvm.org/D59424



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
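
For readers following the malloc/free versus preallocated-buffer argument in the inline comments, here is a minimal host-side C++ sketch of the general idea: storage whose size is known at compile time is used whenever the request fits, and the heap is only touched for oversized requests. The type `PreallocatedArgBuffer` and its members are hypothetical names made up for this illustration; this is not the `shared_bytes_buffer` from the patch and not how the device runtime or the compiler actually implements it.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical illustration only; not the shared_bytes_buffer from the patch.
template <std::size_t NumSlots>
struct PreallocatedArgBuffer {
  void *Static[NumSlots]; // fixed-size storage, size known at compile time
  void **Heap = nullptr;  // fallback storage for oversized requests

  // Return storage for N pointers. The heap is used only when N exceeds
  // NumSlots. Any earlier fallback allocation is released first.
  void **acquire(std::size_t N) {
    release();
    if (N <= NumSlots)
      return Static;
    Heap = static_cast<void **>(std::malloc(N * sizeof(void *)));
    return Heap;
  }

  // True if the most recent acquire() had to fall back to malloc.
  bool usedHeap() const { return Heap != nullptr; }

  // Free the fallback allocation, if any. Safe to call repeatedly.
  void release() {
    std::free(Heap);
    Heap = nullptr;
  }
};
```

With `PreallocatedArgBuffer<4>`, a call to `acquire(3)` hands back the static slots without allocating, while `acquire(8)` falls back to `malloc`. How large the static part should be, and whether it lives in shared or global memory on the device, is exactly the design question under discussion in this review.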