ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h:73
+/// Note: Only the team master is allowed to call non-const functions!
+struct shared_bytes_buffer {
+
----------------
jdoerfert wrote:
> > What is this buffer used for? Transferring pointers to the shared variables 
> > to the parallel regions? If so, it must be handled by the compiler. There 
> > are several reasons to do this:
> > 1) You're using malloc/free functions for large buffers. The size of this 
> > buffer is known at compile time, so the compiler can generate a 
> > fixed-size buffer in global memory if required. We already have a similar 
> > implementation for target regions, globalized variables, etc. You can 
> > take a look and adapt it for your purpose.
> > 2) Malloc/free are not very fast on the GPU, so preallocated buffers will 
> > also improve performance.
> > 3) Another problem with malloc/free is that they use preallocated memory, 
> > and the size of this memory is limited to 8MB (if I recall correctly). 
> > This memory is required for correct support of local variable 
> > globalization, and we already ran into a situation where malloc could not 
> > allocate enough memory for it with some previous implementations.
> > 4) You can reuse the shared memory buffers already generated by the 
> > compiler and save shared memory.
> 
> [Quote by ABataev copied from 
> https://reviews.llvm.org/D59319?id=190767#inline-525900 after the patch was 
> split.]
> 
> 
> This buffer is supposed to be used to communicate variables in shared and 
> firstprivate clauses between threads in a team. In this patch it is simply 
> used to implement the old `void**` buffer. How, when, and if we use it is part of 
> the interface implementation. For now, this buffer simply serves the users of 
> the `omptarget_nvptx_globalArgs` global.
> 
> If you want to provide compiler-allocated memory to avoid using the buffer, no 
> problem: the `__kmpc_target_region_kernel_parallel` function allows you to do 
> so; see the `SharedMemPointers` flag. I wouldn't want to put the logic to 
> generate these buffers in the front-end, though.
Why?
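To make the trade-off debated above concrete, here is a minimal host-side C++ sketch. It is not the deviceRTL code and models no GPU concurrency. The team master packs variable addresses into a byte buffer that uses fixed-size storage when the request fits. That storage stands in for a size the compiler could compute ahead of time, as point 1 suggests. When the request does not fit, the buffer falls back to `malloc`, the slow path that points 2 and 3 warn about. `SharedBytesBuffer`, `kStaticCapacity`, `pack_args`, and `uses_heap` are hypothetical names; the patch's `shared_bytes_buffer` may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch: a team-wide byte buffer through which the master
// passes addresses of shared/firstprivate variables to worker threads.
// kStaticCapacity stands in for a size a compiler could compute statically.
template <std::size_t kStaticCapacity>
struct SharedBytesBuffer {
  alignas(std::max_align_t) unsigned char static_storage[kStaticCapacity];
  unsigned char *data = static_storage;
  std::size_t size = 0;

  SharedBytesBuffer() = default;
  SharedBytesBuffer(const SharedBytesBuffer &) = delete;  // data may point into *this
  SharedBytesBuffer &operator=(const SharedBytesBuffer &) = delete;
  ~SharedBytesBuffer() { release(); }

  // Non-const: only the team master may call this.
  // Returns false if the heap fallback fails.
  bool resize(std::size_t bytes) {
    release();
    if (bytes > kStaticCapacity) {
      // Slow path: dynamic allocation, analogous to device-side malloc.
      data = static_cast<unsigned char *>(std::malloc(bytes));
      if (!data) {
        data = static_storage;
        return false;
      }
    }
    // Fast path (bytes fit): data already points at static_storage.
    size = bytes;
    return true;
  }

  void release() {
    if (data != static_storage) std::free(data);
    data = static_storage;
    size = 0;
  }

  bool uses_heap() const { return data != static_storage; }
};

// Hypothetical helper: the master packs `count` variable addresses so that
// workers can read them back as a void* array (cf. the old `void**` buffer).
template <std::size_t N>
bool pack_args(SharedBytesBuffer<N> &buf, void *const *args, std::size_t count) {
  if (!buf.resize(count * sizeof(void *))) return false;
  std::memcpy(buf.data, args, count * sizeof(void *));
  return true;
}
```

With `SharedBytesBuffer<2 * sizeof(void *)>`, packing two addresses stays in the fixed storage, while packing three triggers the `malloc` fallback. This is the case a compiler-computed size would avoid entirely.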


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59424/new/

https://reviews.llvm.org/D59424



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
